Evaluate various testing error metrics, given the true feature support and the estimated p-values at pre-specified significance level thresholds. eval_testing_err() evaluates the various testing error metrics for each experimental replicate separately. summarize_testing_err() summarizes the various testing error metrics across experimental replicates.

## Usage

eval_testing_err(
  fit_results,
  vary_params = NULL,
  nested_data = NULL,
  truth_col,
  pval_col = NULL,
  metrics = NULL,
  alphas = 0.05,
  na_rm = FALSE
)

summarize_testing_err(
  fit_results,
  vary_params = NULL,
  nested_data = NULL,
  truth_col,
  pval_col = NULL,
  metrics = NULL,
  alphas = 0.05,
  na_rm = FALSE,
  summary_funs = c("mean", "median", "min", "max", "sd", "raw"),
  custom_summary_funs = NULL,
  eval_id = "testing_err"
)

## Arguments

fit_results

A tibble, as returned by the fit method.

vary_params

A vector of the names of the parameters that are varied across the Experiment.

nested_data

(Optional) Character string. If specified, should be the name of the column in fit_results containing columns that must be unnested before evaluating results. Default is NULL, meaning no columns in fit_results need to be unnested prior to computation.

truth_col

A character string identifying the column in fit_results with the true feature support data. Each element in this column should be an array of length p, where p is the number of features. Elements in this array should be binary with TRUE or 1 meaning the feature (corresponding to that slot) is in the support and FALSE or 0 meaning the feature is not in the support.

pval_col

A character string identifying the column in fit_results with the estimated p-values data. Each element in this column should be an array of length p, where p is the number of features and the feature order aligns with that of truth_col.

metrics

A metric_set object indicating the metrics to evaluate. See yardstick::metric_set() for more details. Default NULL will evaluate the following: number of true positives (tp), number of false positives (fp), sensitivity (sens), specificity (spec), positive predictive value (ppv), number of tests that were rejected (pos), number of tests that were not rejected (neg), AUROC (roc_auc), and AUPRC (pr_auc).

alphas

Vector of significance levels at which to evaluate the various metrics. Default is alphas = 0.05.
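To make the thresholded metrics concrete, the following base-R sketch computes several of them by hand for a single replicate at more than one significance level. It assumes a test counts as rejected when its p-value is less than or equal to alpha; the exact comparison used inside eval_testing_err() is not stated on this page, so that rule and the helper name testing_counts are illustrative assumptions, not the package's implementation.

```r
# Hypothetical helper (not part of the package): testing error counts
# for one replicate at a single significance level.
# Assumption: a feature is "rejected" when its p-value is <= alpha.
testing_counts <- function(truth, pval, alpha) {
  truth <- as.logical(truth)
  reject <- pval <= alpha
  tp <- sum(reject & truth)   # features in the support that were rejected
  fp <- sum(reject & !truth)  # features outside the support that were rejected
  c(
    tp = tp,
    fp = fp,
    pos = sum(reject),        # number of rejected tests
    neg = sum(!reject),       # number of tests not rejected
    sens = tp / sum(truth),
    spec = sum(!reject & !truth) / sum(!truth),
    ppv = tp / sum(reject)
  )
}

# made-up truth and p-values for three features
truth <- c(TRUE, FALSE, TRUE)
pval <- c(0.001, 0.08, 1)

# evaluate at several significance levels, one column per alpha
alphas <- c(0.05, 0.1)
res <- sapply(alphas, function(a) testing_counts(truth, pval, a))
colnames(res) <- paste0("alpha=", alphas)
res
```

Raising alpha from 0.05 to 0.1 rejects the second feature as well, which adds a false positive and lowers both specificity and positive predictive value in this toy input.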

na_rm

A logical value indicating whether NA values should be stripped before the computation proceeds.

summary_funs

Character vector specifying how to summarize evaluation metrics. Each element must be one of the built-in summary functions: "mean", "median", "min", "max", "sd", "raw".

custom_summary_funs

Named list of custom functions to summarize results. Names in the list should correspond to the name of the summary function. Values in the list should be a function that takes in one argument, that being the values of the evaluated metrics.
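As a sketch of what such a list might look like, each function below takes the vector of evaluated metric values and returns a single number. The names range and q90, and the made-up values in vals, are purely illustrative and are not built into the package.

```r
# Hypothetical custom summary functions; each takes one argument,
# the vector of evaluated metric values, and returns a single number.
custom_funs <- list(
  range = function(x) max(x) - min(x),
  q90 = function(x) unname(stats::quantile(x, probs = 0.9))
)

# apply each function to some made-up metric values to check that
# every function returns exactly one numeric value
vals <- c(0.2, 0.5, 0.9, 1.0)
summaries <- vapply(custom_funs, function(f) f(vals), numeric(1))
summaries
```

Checking each function on a plain numeric vector first, as above, is a cheap way to catch functions that return more than one value before passing the list to summarize_testing_err().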

eval_id

Character string. ID to be used as a suffix when naming result columns. Default is "testing_err". If NULL, no ID is added to the column names.

## Value

The output of eval_testing_err() is a tibble with the following columns:

.rep

Replicate ID.

.dgp_name

Name of DGP.

.method_name

Name of Method.

.alpha

Level of significance.

.metric

Name of the evaluation metric.

.estimate

Value of the evaluation metric.

as well as any columns specified by vary_params.

The output of summarize_testing_err() is a grouped tibble containing both identifying information and the evaluation results aggregated over experimental replicates. Specifically, the identifier columns include .dgp_name, .method_name, any columns specified by vary_params, and .metric. In addition, there are results columns corresponding to the requested statistics in summary_funs and custom_summary_funs. With the default eval_id, these columns end in the suffix "_testing_err".
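Conceptually, the summary step groups the per-replicate results by the identifier columns and applies each summary function to .estimate. The base-R sketch below illustrates that idea with made-up values and only the mean and standard deviation. It is not the package's implementation; grouping by .alpha and the exact result column names are assumptions chosen to mirror the description above.

```r
# made-up per-replicate results in the shape described for eval_testing_err()
eval_results <- data.frame(
  .rep = rep(1:2, times = 2),
  .dgp_name = "DGP1",
  .method_name = "Method",
  .alpha = 0.05,
  .metric = rep(c("sens", "spec"), each = 2),
  .estimate = c(0.5, 1.0, 1.0, 0.0)
)

# identifier columns to group by (including .alpha is an assumption)
id_cols <- c(".dgp_name", ".method_name", ".alpha", ".metric")

# helper: apply one summary function to .estimate within each group
summarize_by <- function(f) {
  aggregate(eval_results[".estimate"], by = eval_results[id_cols], FUN = f)
}

# one row per group, one results column per summary function
summary_df <- summarize_by(mean)
names(summary_df)[names(summary_df) == ".estimate"] <- "mean_testing_err"
summary_df$sd_testing_err <- summarize_by(sd)$.estimate
summary_df
```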

## See also

Other inference_funs: eval_reject_prob(), eval_testing_curve_funs, plot_reject_prob(), plot_testing_curve(), plot_testing_err()

## Examples

# generate example fit_results data for an inference problem
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = c("Method"),
  feature_info = lapply(
    1:4,
    FUN = function(i) {
      tibble::tibble(
        # feature names
        feature = c("featureA", "featureB", "featureC"),
        # true feature support
        true_support = c(TRUE, FALSE, TRUE),
        # estimated p-values
        pval = 10^(sample(-3:0, 3, replace = TRUE))
      )
    }
  )
)

# evaluate testing errors (using all default metrics and alpha = 0.05) for each replicate
eval_results <- eval_testing_err(
  fit_results,
  nested_data = "feature_info",
  truth_col = "true_support",
  pval_col = "pval"
)
# summarize testing errors (using all default metrics and alpha = 0.05) across replicates
eval_results_summary <- summarize_testing_err(
  fit_results,
  nested_data = "feature_info",
  truth_col = "true_support",
  pval_col = "pval"
)

# evaluate/summarize testing errors (at alpha = 0.05) using specific yardstick metrics
metrics <- yardstick::metric_set(yardstick::sens, yardstick::spec)
eval_results <- eval_testing_err(
  fit_results,
  nested_data = "feature_info",
  truth_col = "true_support",
  pval_col = "pval",
  metrics = metrics
)
eval_results_summary <- summarize_testing_err(
  fit_results,
  nested_data = "feature_info",
  truth_col = "true_support",
  pval_col = "pval",
  metrics = metrics
)

# can evaluate/summarize testing errors at multiple values of alpha
eval_results <- eval_testing_err(
  fit_results,
  nested_data = "feature_info",
  truth_col = "true_support",
  pval_col = "pval",
  alphas = c(0.05, 0.1)
)
eval_results_summary <- summarize_testing_err(
  fit_results,
  nested_data = "feature_info",
  truth_col = "true_support",
  pval_col = "pval",
  alphas = c(0.05, 0.1)
)

# summarize testing errors (at alpha = 0.05) using a custom summary function
range_fun <- function(x) max(x) - min(x)
eval_results_summary <- summarize_testing_err(
  fit_results,
  nested_data = "feature_info",
  truth_col = "true_support",
  pval_col = "pval",
  custom_summary_funs = list(range_testing_err = range_fun)
)