
Evaluate the ROC or PR curves, given the true responses and the predicted probabilities for each class. eval_pred_curve() evaluates the ROC or PR curve for each experimental replicate separately. summarize_pred_curve() summarizes the ROC or PR curve across experimental replicates.

Usage

eval_pred_curve(
  fit_results,
  vary_params = NULL,
  nested_cols = NULL,
  truth_col,
  prob_cols,
  group_cols = NULL,
  curve = c("ROC", "PR"),
  na_rm = FALSE
)

summarize_pred_curve(
  fit_results,
  vary_params = NULL,
  nested_cols = NULL,
  truth_col,
  prob_cols,
  group_cols = NULL,
  curve = c("ROC", "PR"),
  na_rm = FALSE,
  x_grid = seq(0, 1, by = 0.01),
  summary_funs = c("mean", "median", "min", "max", "sd", "raw"),
  custom_summary_funs = NULL,
  eval_id = ifelse(curve == "PR", "precision", "TPR")
)

Arguments

fit_results

A tibble, as returned by fit_experiment().

vary_params

A vector of DGP or Method parameter names that are varied across the Experiment.

nested_cols

(Optional) A character string or vector specifying the name of the column(s) in fit_results that need to be unnested before evaluating results. Default is NULL, meaning no columns in fit_results need to be unnested prior to computation.

truth_col

A character string identifying the column with the true responses. The column should be numeric for a regression problem and a factor for a classification problem.

prob_cols

A character string or vector identifying the column(s) containing class probabilities. If the truth_col column is binary, only 1 column name should be provided. Otherwise, the length of prob_cols should equal the number of factor levels of the truth_col column. This argument is not used when evaluating numeric metrics.

group_cols

(Optional) A character string or vector specifying the column(s) to group rows by before evaluating metrics. This is useful for assessing within-group metrics.

curve

Either "ROC" or "PR" indicating whether to evaluate the ROC or Precision-Recall curve.

na_rm

A logical value indicating whether NA values should be stripped before the computation proceeds.

x_grid

Vector of values between 0 and 1 at which to evaluate the ROC or PR curve. If curve = "ROC", the provided vector of values are the FPR values at which to evaluate the TPR, and if curve = "PR", the values are the recall values at which to evaluate the precision.
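To make the role of x_grid concrete, here is a minimal base-R sketch of evaluating an ROC curve on a common FPR grid. The toy vectors truth and prob, and the use of approx() for linear interpolation, are assumptions made for illustration; this is not necessarily how summarize_pred_curve() computes the values internally.

```r
# Toy data (made up): true binary labels and predicted probabilities.
truth <- c(1, 0, 1, 1, 0, 0, 1, 0)
prob  <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)

# Sweep the threshold from high to low, accumulating TPR and FPR.
ord <- order(prob, decreasing = TRUE)
tpr <- c(0, cumsum(truth[ord] == 1) / sum(truth == 1))
fpr <- c(0, cumsum(truth[ord] == 0) / sum(truth == 0))

# Evaluate the TPR at each FPR value on the grid (same grid as the
# documented default for x_grid). Where several TPR values share one
# FPR, keep the largest (an assumption of this sketch).
x_grid <- seq(0, 1, by = 0.01)
tpr_on_grid <- approx(x = fpr, y = tpr, xout = x_grid, ties = max)$y
```

Putting every replicate's curve on the same grid is what allows pointwise summaries (mean, sd, and so on) across replicates, since the raw curves generally have different threshold points.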

summary_funs

Character vector specifying how to summarize evaluation metrics. Each element must be one of the built-in summary functions: "mean", "median", "min", "max", "sd", "raw".

custom_summary_funs

Named list of custom functions to summarize results. Names in the list should correspond to the name of the summary function. Values in the list should be a function that takes in one argument, that being the values of the evaluated metrics.
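As an illustration of the expected shape, the sketch below builds a named list of one-argument functions and applies each to a vector of metric values. The function names q25 and range_width and the vector vals are hypothetical, chosen only for this example.

```r
# Hypothetical custom summaries: each function takes a single argument,
# the vector of evaluated metric values.
custom_summary_funs <- list(
  q25 = function(x) unname(quantile(x, probs = 0.25)),
  range_width = function(x) diff(range(x))
)

# Made-up metric values, e.g. the TPR at one FPR grid point across
# four replicates.
vals <- c(0.62, 0.71, 0.68, 0.75)

# Apply every custom summary to the same values; the result is a named
# numeric vector with one entry per summary function.
custom_summaries <- sapply(custom_summary_funs, function(f) f(vals))
```

A list of this form could then be supplied through the custom_summary_funs argument of summarize_pred_curve(); the list names identify each summary in the output.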

eval_id

Character string. ID to be used as a suffix when naming result columns. As shown in Usage, the default is "precision" if curve = "PR" and "TPR" otherwise. If NULL, no ID is added to the column names.

Value

The output of eval_pred_curve() is a tibble with the following columns:

.rep

Replicate ID.

.dgp_name

Name of DGP.

.method_name

Name of Method.

curve_estimate

A list of tibbles with x and y coordinate values for the ROC/PR curve for the given experimental replicate. If curve = "ROC", the tibble has the columns .threshold, FPR, and TPR for the threshold, false positive rate, and true positive rate, respectively. If curve = "PR", the tibble has the columns .threshold, recall, and precision.

as well as any columns specified by group_cols and vary_params.

The output of summarize_pred_curve() is a grouped tibble containing both identifying information and the prediction curve results aggregated over experimental replicates. Specifically, the identifier columns include .dgp_name, .method_name, and any columns specified by group_cols and vary_params. In addition, there are results columns corresponding to the requested statistics in summary_funs and custom_summary_funs. If curve = "ROC", these results columns include FPR and others that end in the suffix "_TPR". If curve = "PR", the results columns include recall and others that end in the suffix "_precision".

See also

Other prediction_error_funs: eval_pred_err_funs, plot_pred_curve(), plot_pred_err()

Examples

#######################################
#### Binary Classification Problem ####
#######################################
# generate example fit_results data for a binary classification problem
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = c("Method"),
  # true response
  y = lapply(1:4,
             FUN = function(x) {
               as.factor(sample(0:1, size = 100, replace = TRUE))
             }),
  # predicted class probabilities
  class_probs = lapply(1:4, FUN = function(x) runif(n = 100, min = 0, max = 1))
)

# evaluate ROC/PR curve for each replicate
roc_results <- eval_pred_curve(fit_results, curve = "ROC",
                               truth_col = "y", prob_cols = "class_probs")
pr_results <- eval_pred_curve(fit_results, curve = "PR",
                              truth_col = "y", prob_cols = "class_probs")

# summarize ROC/PR curves across replicates
roc_summary <- summarize_pred_curve(fit_results, curve = "ROC",
                                    truth_col = "y", prob_cols = "class_probs")
pr_summary <- summarize_pred_curve(fit_results, curve = "PR",
                                   truth_col = "y", prob_cols = "class_probs")

############################################
#### Multi-class Classification Problem ####
############################################
# generate example fit_results data for a multi-class classification problem
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = c("Method"),
  # true response
  y = lapply(1:4,
             FUN = function(x) {
               as.factor(sample(c("a", "b", "c"), size = 100, replace = TRUE))
             }),
  # predicted class probabilities
  class_probs = lapply(1:4,
                       FUN = function(x) {
                         tibble::tibble(a = runif(n = 100, min = 0, max = 0.5),
                                        b = runif(n = 100, min = 0, max = 0.5),
                                        c = 1 - a - b)
                       })
)

# evaluate ROC/PR curve for each replicate
roc_results <- eval_pred_curve(fit_results, curve = "ROC",
                               nested_cols = c("y", "class_probs"),
                               truth_col = "y",
                               prob_cols = c("a", "b", "c"))
pr_results <- eval_pred_curve(fit_results, curve = "PR",
                              nested_cols = c("y", "class_probs"),
                              truth_col = "y",
                              prob_cols = c("a", "b", "c"))

# summarize ROC/PR curves across replicates
roc_summary <- summarize_pred_curve(fit_results, curve = "ROC",
                                    nested_cols = c("y", "class_probs"),
                                    truth_col = "y",
                                    prob_cols = c("a", "b", "c"))
pr_summary <- summarize_pred_curve(fit_results, curve = "PR",
                                   nested_cols = c("y", "class_probs"),
                                   truth_col = "y",
                                   prob_cols = c("a", "b", "c"))