Evaluate various prediction error metrics, given the true responses and the predicted (or estimated) responses. eval_pred_err() evaluates these prediction error metrics for each experimental replicate separately, while summarize_pred_err() summarizes them across experimental replicates.

Usage

eval_pred_err(
  fit_results,
  vary_params = NULL,
  nested_data = NULL,
  truth_col,
  estimate_col,
  prob_cols = NULL,
  metrics = NULL,
  groups = NULL,
  options = list(),
  na_rm = FALSE
)

summarize_pred_err(
  fit_results,
  vary_params = NULL,
  nested_data = NULL,
  truth_col,
  estimate_col,
  prob_cols = NULL,
  metrics = NULL,
  groups = NULL,
  options = list(),
  na_rm = FALSE,
  summary_funs = c("mean", "median", "min", "max", "sd", "raw"),
  custom_summary_funs = NULL,
  eval_id = "pred_err"
)

Arguments

fit_results

A tibble, as returned by the fit method.

vary_params

A vector of parameter names that are varied across the Experiment.

nested_data

(Optional) Character string. If specified, should be the name of the column in fit_results containing columns that must be unnested before evaluating results. Default is NULL, meaning no columns in fit_results need to be unnested prior to computation.

truth_col

A character string identifying the column with the true responses. The column should be numeric for a regression problem and a factor for a classification problem.

estimate_col

A character string identifying the column with the estimated or predicted responses. The column should be numeric for a regression problem and a factor (with the predicted classes) for a classification problem.

prob_cols

A character string or vector identifying the column(s) containing class probabilities. If the truth_col column is binary, only one column name should be provided. Otherwise, the length of prob_cols should equal the number of factor levels in the truth_col column. This argument is not used when evaluating numeric metrics.

metrics

A metric_set object indicating the metrics to evaluate. See yardstick::metric_set() for more details. Default NULL will use the default metrics in yardstick::metrics().

groups

(Optional) vector of group IDs to group observations by before evaluating prediction errors. This is useful for assessing within-group prediction errors. Note: the (unstratified) prediction errors, aggregated across the full data set, are computed in addition to these stratified within-group errors.

options

A list of named options to pass to pROC::roc(), such as smooth. These options should not include response, predictor, levels, quiet, or direction. This argument is only used when computing the ROC and is ignored otherwise.

na_rm

A logical value indicating whether NA values should be stripped before the computation proceeds.

summary_funs

Character vector specifying how to summarize evaluation metrics. Each element must be one of the built-in summary functions: "mean", "median", "min", "max", "sd", or "raw".

custom_summary_funs

Named list of custom functions used to summarize results. Names in the list should correspond to the names of the summary functions. Each value in the list should be a function that takes a single argument: the vector of evaluated metric values.

eval_id

Character string. ID used as a suffix when naming result columns. If NULL, no ID is appended to the column names. Defaults to "pred_err".

Value

The output of eval_pred_err() is a tibble with the following columns:

.rep

Replicate ID.

.dgp_name

Name of DGP.

.method_name

Name of Method.

.group

If groups is not NULL, this column specifies the name of the group under evaluation. Otherwise, this column is not returned.

.metric

Name of the evaluation metric.

.estimate

Value of the evaluation metric.

as well as any columns specified by vary_params. The output of summarize_pred_err() is a grouped tibble containing both identifying information and the prediction error results aggregated over experimental replicates. Specifically, the identifier columns include .dgp_name, .method_name, any columns specified by vary_params, and .metric. In addition, there are result columns corresponding to the requested statistics in summary_funs and custom_summary_funs. These columns end in the suffix "_pred_err".

See also

Other prediction_error_funs: eval_pred_curve_funs, plot_pred_curve(), plot_pred_err()

Examples

############################
#### Regression Problem ####
############################

# generate example fit_results data for a regression problem
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = c("Method"),
  # true response
  y = lapply(1:4, FUN = function(x) rnorm(100)),
  # predicted response
  predictions = lapply(1:4, FUN = function(x) rnorm(100))
)

# evaluate prediction error (using all default metrics) for each replicate
eval_results <- eval_pred_err(fit_results, 
                              truth_col = "y", 
                              estimate_col = "predictions")
# summarize prediction error (using all default metrics) across replicates
eval_results_summary <- summarize_pred_err(fit_results, 
                                           truth_col = "y",
                                           estimate_col = "predictions")

# evaluate/summarize prediction error within subgroups
group_ids <- rep(c("a", "b"), length.out = 100)
eval_results <- eval_pred_err(fit_results, 
                              truth_col = "y", 
                              estimate_col = "predictions",
                              groups = group_ids)
eval_results_summary <- summarize_pred_err(fit_results, 
                                           truth_col = "y",
                                           estimate_col = "predictions",
                                           groups = group_ids)

# evaluate/summarize prediction errors using specific yardstick metrics
metrics <- yardstick::metric_set(yardstick::rmse, yardstick::rsq)
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions",
                              metrics = metrics)
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions",
                                           metrics = metrics)

# summarize prediction errors using specific summary metric
range_fun <- function(x) return(max(x) - min(x))
eval_results_summary <- summarize_pred_err(
  fit_results,
  truth_col = "y",
  estimate_col = "predictions",
  custom_summary_funs = list(range_pred_err = range_fun)
)
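
# summarize prediction errors using only a subset of the built-in summary
# functions (an illustrative sketch; per the summary_funs argument, any subset
# of "mean", "median", "min", "max", "sd", "raw" can be requested)
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions",
                                           summary_funs = c("mean", "sd"))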

#######################################
#### Binary Classification Problem ####
#######################################
# generate example fit_results data for a binary classification problem
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = c("Method"),
  # true response
  y = lapply(1:4,
             FUN = function(x) {
               as.factor(sample(0:1, size = 100, replace = TRUE))
             }),
  # predicted class probabilities 
  class_probs = lapply(1:4, FUN = function(x) runif(n = 100, min = 0, max = 1)),
  # predicted class responses
  predictions = lapply(class_probs, 
                       FUN = function(x) as.factor(ifelse(x > 0.5, 1, 0)))
)

# evaluate prediction error (using all default metrics) for each replicate
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions", 
                              prob_cols = "class_probs")
# summarize prediction error (using all default metrics) across replicates
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions",
                                           prob_cols = "class_probs")

# can also evaluate results using only class predictions (without class probs.)
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions")
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions")

############################################
#### Multi-class Classification Problem ####
############################################
# generate example fit_results data for a multi-class classification problem
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = c("Method"),
  # true response
  y = lapply(1:4, 
             FUN = function(x) {
               as.factor(sample(c("a", "b", "c"), size = 100, replace = TRUE))
             }),
  # predicted class probabilities
  class_probs = lapply(1:4, 
                       FUN = function(x) {
                         tibble::tibble(a = runif(n = 100, min = 0, max = 0.5),
                                        b = runif(n = 100, min = 0, max = 0.5),
                                        c = 1 - a - b)
                       }),
  # predicted class responses
  predictions = lapply(class_probs,
                       FUN = function(x) {
                         yhat <- apply(x, 1, 
                                       FUN = function(xi) names(which.max(xi)))
                         return(as.factor(yhat))
                       })
)

# evaluate prediction error (using all default metrics) for each replicate
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions", 
                              prob_cols = c("a", "b", "c"), 
                              nested_data = "class_probs")
# summarize prediction error (using all default metrics) across replicates
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions",
                                           prob_cols = c("a", "b", "c"), 
                                           nested_data = "class_probs")

# can also evaluate results using only class predictions (without class probs.)
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions")
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions")
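
# change the suffix appended to the summary result columns (an illustrative
# sketch; the eval_id value below is arbitrary and defaults to "pred_err")
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions",
                                           eval_id = "multiclass_pred_err")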