Evaluate and/or summarize error metrics when conducting multiple hypothesis tests.
Source: R/evaluator-lib-inference.R
Evaluate various testing error metrics, given the true feature support and the estimated p-values, at pre-specified significance level thresholds. eval_testing_err() evaluates the various testing error metrics for each experimental replicate separately. summarize_testing_err() summarizes the various testing error metrics across experimental replicates.
Usage
eval_testing_err(
  fit_results,
  vary_params = NULL,
  nested_cols = NULL,
  truth_col,
  pval_col = NULL,
  group_cols = NULL,
  metrics = NULL,
  alphas = 0.05,
  na_rm = FALSE
)

summarize_testing_err(
  fit_results,
  vary_params = NULL,
  nested_cols = NULL,
  truth_col,
  pval_col = NULL,
  group_cols = NULL,
  metrics = NULL,
  alphas = 0.05,
  na_rm = FALSE,
  summary_funs = c("mean", "median", "min", "max", "sd", "raw"),
  custom_summary_funs = NULL,
  eval_id = "testing_err"
)
Arguments
- fit_results
A tibble, as returned by fit_experiment().
- vary_params
A vector of DGP or Method parameter names that are varied across the Experiment.
- nested_cols
(Optional) A character string or vector specifying the name of the column(s) in fit_results that need to be unnested before evaluating results. Default is NULL, meaning no columns in fit_results need to be unnested prior to computation.
- truth_col
A character string identifying the column in fit_results with the true feature support data. Each element in this column should be an array of length p, where p is the number of features. Elements in this array should be binary, with TRUE or 1 meaning the feature (corresponding to that slot) is in the support and FALSE or 0 meaning the feature is not in the support.
- pval_col
A character string identifying the column in fit_results with the estimated p-values data. Each element in this column should be an array of length p, where p is the number of features and the feature order aligns with that of truth_col. (See the sketch after this argument list for one possible layout.)
- group_cols
(Optional) A character string or vector specifying the column(s) to group rows by before evaluating metrics. This is useful for assessing within-group metrics.
- metrics
A metric_set object indicating the metrics to evaluate. See yardstick::metric_set() for more details. The default NULL evaluates the following: number of true positives (tp), number of false positives (fp), sensitivity (sens), specificity (spec), positive predictive value (ppv), number of tests that were rejected (pos), number of tests that were not rejected (neg), AUROC (roc_auc), and AUPRC (pr_auc).
- alphas
Vector of significance levels at which to evaluate the various metrics. Default is alphas = 0.05.
- na_rm
A logical value indicating whether NA values should be stripped before the computation proceeds.
- summary_funs
Character vector specifying how to summarize evaluation metrics. Must choose from a built-in library of summary functions; each element of the vector must be one of "mean", "median", "min", "max", "sd", or "raw".
- custom_summary_funs
Named list of custom functions to summarize results. Names in the list should correspond to the names of the summary functions. Each value in the list should be a function that takes a single argument: the vector of evaluated metric values.
- eval_id
Character string. ID to be used as a suffix when naming result columns. Per the Usage above, summarize_testing_err() defaults to eval_id = "testing_err"; setting eval_id = NULL does not add any ID to the column names.
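For concreteness, here is a minimal sketch of one possible fit_results layout when no unnesting is needed: truth_col and pval_col point to list-columns whose elements are length-p vectors in matching feature order. The object name and values below are illustrative assumptions, not requirements of the API.

# hypothetical fit_results with no nested columns: true_support and pval
# are list-columns of length-p vectors (values are illustrative)
fit_results_flat <- tibble::tibble(
  .rep = c(1, 1),
  .dgp_name = "DGP1",
  .method_name = c("MethodA", "MethodB"),
  true_support = list(c(TRUE, FALSE, TRUE), c(TRUE, FALSE, TRUE)),
  pval = list(c(0.001, 0.20, 0.03), c(0.04, 0.50, 0.01))
)
eval_testing_err(
  fit_results_flat,
  truth_col = "true_support",
  pval_col = "pval"
)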
Value
The output of eval_testing_err() is a tibble with the following columns:
- .rep
Replicate ID.
- .dgp_name
Name of DGP.
- .method_name
Name of Method.
- .alpha
Level of significance.
- .metric
Name of the evaluation metric.
- .estimate
Value of the evaluation metric.
as well as any columns specified by group_cols and vary_params.
The output of summarize_testing_err() is a grouped tibble containing both identifying information and the evaluation results aggregated over experimental replicates. Specifically, the identifier columns include .dgp_name, .method_name, any columns specified by group_cols and vary_params, and .metric. In addition, there are results columns corresponding to the requested statistics in summary_funs and custom_summary_funs. These columns end in the suffix specified by eval_id.
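As a usage sketch (not part of the documented output helpers), the long-format output of eval_testing_err() can be filtered to a single metric at a chosen level with dplyr; eval_results below refers to the object constructed in the Examples:

# extract sensitivity at alpha = 0.05 from the long-format output
# (eval_results as built in the Examples section)
dplyr::filter(eval_results, .metric == "sens", .alpha == 0.05)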
See also
Other inference_funs: eval_reject_prob(), eval_testing_curve_funs, plot_reject_prob(), plot_testing_curve(), plot_testing_err()
Examples
# generate example fit_results data for an inference problem
set.seed(123) # for reproducible simulated p-values
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = "Method",
  feature_info = lapply(
    1:4,
    FUN = function(i) {
      tibble::tibble(
        # feature names
        feature = c("featureA", "featureB", "featureC"),
        # true feature support
        true_support = c(TRUE, FALSE, TRUE),
        # estimated p-values
        pval = 10^(sample(-3:0, 3, replace = TRUE))
      )
    }
  )
)
# evaluate testing error (using all default metrics and alpha = 0.05) for each replicate
eval_results <- eval_testing_err(
  fit_results,
  nested_cols = "feature_info",
  truth_col = "true_support",
  pval_col = "pval"
)
# summarize testing error (using all default metrics and alpha = 0.05) across replicates
eval_results_summary <- summarize_testing_err(
  fit_results,
  nested_cols = "feature_info",
  truth_col = "true_support",
  pval_col = "pval"
)
# evaluate/summarize testing error (at alpha = 0.05) using specific yardstick metrics
metrics <- yardstick::metric_set(yardstick::sens, yardstick::spec)
eval_results <- eval_testing_err(
  fit_results,
  nested_cols = "feature_info",
  truth_col = "true_support",
  pval_col = "pval",
  metrics = metrics
)
eval_results_summary <- summarize_testing_err(
  fit_results,
  nested_cols = "feature_info",
  truth_col = "true_support",
  pval_col = "pval",
  metrics = metrics
)
# can evaluate/summarize testing error at multiple values of alpha
eval_results <- eval_testing_err(
  fit_results,
  nested_cols = "feature_info",
  truth_col = "true_support",
  pval_col = "pval",
  alphas = c(0.05, 0.1)
)
eval_results_summary <- summarize_testing_err(
  fit_results,
  nested_cols = "feature_info",
  truth_col = "true_support",
  pval_col = "pval",
  alphas = c(0.05, 0.1)
)
# summarize testing error (at alpha = 0.05) using a custom summary function
range_fun <- function(x) max(x) - min(x)
eval_results_summary <- summarize_testing_err(
  fit_results,
  nested_cols = "feature_info",
  truth_col = "true_support",
  pval_col = "pval",
  custom_summary_funs = list(range_testing_err = range_fun)
)
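Not shown above: summary_funs can restrict which built-in summaries are computed. A minimal sketch, reusing fit_results from above:

# summarize using only the mean and standard deviation across replicates
eval_results_summary <- summarize_testing_err(
  fit_results,
  nested_cols = "feature_info",
  truth_col = "true_support",
  pval_col = "pval",
  summary_funs = c("mean", "sd")
)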