Evaluate and/or summarize prediction errors.
Source: R/evaluator-lib-prediction.R
Evaluate various prediction error metrics, given the true responses and the predicted (or estimated) responses. eval_pred_err() evaluates the various prediction error metrics for each experimental replicate separately. summarize_pred_err() summarizes the various prediction error metrics across experimental replicates.
Usage
eval_pred_err(
  fit_results,
  vary_params = NULL,
  nested_cols = NULL,
  truth_col,
  estimate_col,
  prob_cols = NULL,
  group_cols = NULL,
  metrics = NULL,
  na_rm = FALSE
)

summarize_pred_err(
  fit_results,
  vary_params = NULL,
  nested_cols = NULL,
  truth_col,
  estimate_col,
  prob_cols = NULL,
  group_cols = NULL,
  metrics = NULL,
  na_rm = FALSE,
  summary_funs = c("mean", "median", "min", "max", "sd", "raw"),
  custom_summary_funs = NULL,
  eval_id = "pred_err"
)
Arguments
- fit_results
A tibble, as returned by fit_experiment().
- vary_params
A vector of DGP or Method parameter names that are varied across in the Experiment.
- nested_cols
(Optional) A character string or vector specifying the name of the column(s) in fit_results that need to be unnested before evaluating results. Default is NULL, meaning no columns in fit_results need to be unnested prior to computation.
- truth_col
A character string identifying the column with the true responses. The column should be numeric for a regression problem and a factor for a classification problem.
- estimate_col
A character string identifying the column with the estimated or predicted responses. The column should be numeric for a regression problem and a factor (with the predicted classes) for a classification problem.
- prob_cols
A character string or vector identifying the column(s) containing the predicted class probabilities. If the truth_col column is binary, only one column name should be provided. Otherwise, the length of prob_cols should equal the number of factor levels of the truth_col column. This argument is not used when evaluating numeric metrics.
- group_cols
(Optional) A character string or vector specifying the column(s) to group rows by before evaluating metrics. This is useful for assessing within-group metrics.
- metrics
A metric_set object indicating the metrics to evaluate. See yardstick::metric_set() for more details. The default NULL uses the default metrics in yardstick::metrics().
- na_rm
A logical value indicating whether NA values should be stripped before the computation proceeds.
- summary_funs
Character vector specifying how to summarize evaluation metrics. Must choose from a built-in library of summary functions: elements of the vector must be one of "mean", "median", "min", "max", "sd", "raw".
- custom_summary_funs
Named list of custom functions to summarize results. Names in the list should correspond to the names of the summary functions. Each value in the list should be a function that takes one argument: the vector of evaluated metric values.
- eval_id
Character string. ID to be used as a suffix when naming result columns. The default NULL does not add any ID to the column names; summarize_pred_err() defaults to "pred_err".
Value
The output of eval_pred_err() is a tibble with the following columns:
- .rep
Replicate ID.
- .dgp_name
Name of DGP.
- .method_name
Name of Method.
- .metric
Name of the evaluation metric.
- .estimate
Value of the evaluation metric.
as well as any columns specified by group_cols and vary_params.
The output of summarize_pred_err() is a grouped tibble containing both identifying information and the prediction error results aggregated over experimental replicates. Specifically, the identifier columns include .dgp_name, .method_name, any columns specified by group_cols and vary_params, and .metric. In addition, there are results columns corresponding to the requested statistics in summary_funs and custom_summary_funs. These columns end in the suffix specified by eval_id.
See also
Other prediction_error_funs: eval_pred_curve_funs, plot_pred_curve(), plot_pred_err()
Examples
############################
#### Regression Problem ####
############################
# generate example fit_results data for a regression problem
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = c("Method"),
  # true response
  y = lapply(1:4, FUN = function(x) rnorm(100)),
  # predicted response
  predictions = lapply(1:4, FUN = function(x) rnorm(100)),
  group = lapply(1:4, FUN = function(x) rep(c("a", "b"), length.out = 100))
)
# evaluate prediction error (using all default metrics) for each replicate
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions")
# summarize prediction error (using all default metrics) across replicates
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions")
# evaluate/summarize prediction error within subgroups
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions",
                              group_cols = "group")
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions",
                                           group_cols = "group")
# evaluate/summarize prediction errors using specific yardstick metrics
metrics <- yardstick::metric_set(yardstick::rmse, yardstick::rsq)
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions",
                              metrics = metrics)
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions",
                                           metrics = metrics)
# summarize prediction errors using a custom summary function
range_fun <- function(x) return(max(x) - min(x))
eval_results_summary <- summarize_pred_err(
  fit_results,
  truth_col = "y",
  estimate_col = "predictions",
  custom_summary_funs = list(range_pred_err = range_fun)
)
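# restrict the summary statistics and change the suffix of the result columns
# (a minimal sketch using the documented summary_funs and eval_id arguments;
# "err" is an arbitrary suffix chosen for illustration)
eval_results_summary <- summarize_pred_err(
  fit_results,
  truth_col = "y",
  estimate_col = "predictions",
  summary_funs = c("mean", "sd"),
  eval_id = "err"
)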
#######################################
#### Binary Classification Problem ####
#######################################
# generate example fit_results data for a binary classification problem
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = c("Method"),
  # true response
  y = lapply(1:4,
             FUN = function(x) {
               as.factor(sample(0:1, size = 100, replace = TRUE))
             }),
  # predicted class probabilities
  class_probs = lapply(1:4, FUN = function(x) runif(n = 100, min = 0, max = 1)),
  # predicted class responses
  predictions = lapply(class_probs,
                       FUN = function(x) as.factor(ifelse(x > 0.5, 1, 0)))
)
# evaluate prediction error (using all default metrics) for each replicate
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions",
                              prob_cols = "class_probs")
# summarize prediction error (using all default metrics) across replicates
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions",
                                           prob_cols = "class_probs")
# can also evaluate results using only class predictions (without class probs.)
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions")
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions")
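# evaluate using a specific set of classification metrics (a sketch; accuracy
# uses the class predictions while roc_auc uses the probabilities supplied
# via prob_cols)
metrics <- yardstick::metric_set(yardstick::accuracy, yardstick::roc_auc)
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions",
                              prob_cols = "class_probs",
                              metrics = metrics)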
############################################
#### Multi-class Classification Problem ####
############################################
# generate example fit_results data for a multi-class classification problem
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = c("Method"),
  # true response
  y = lapply(1:4,
             FUN = function(x) {
               as.factor(sample(c("a", "b", "c"), size = 100, replace = TRUE))
             }),
  # predicted class probabilities
  class_probs = lapply(1:4,
                       FUN = function(x) {
                         tibble::tibble(a = runif(n = 100, min = 0, max = 0.5),
                                        b = runif(n = 100, min = 0, max = 0.5),
                                        c = 1 - a - b)
                       }),
  # predicted class responses
  predictions = lapply(class_probs,
                       FUN = function(x) {
                         yhat <- apply(x, 1,
                                       FUN = function(xi) names(which.max(xi)))
                         return(as.factor(yhat))
                       })
)
# evaluate prediction error (using all default metrics) for each replicate
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions",
                              prob_cols = c("a", "b", "c"),
                              nested_cols = c("y", "class_probs", "predictions"))
# summarize prediction error (using all default metrics) across replicates
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions",
                                           prob_cols = c("a", "b", "c"),
                                           nested_cols = c("y", "class_probs", "predictions"))
# can also evaluate results using only class predictions (without class probs.)
eval_results <- eval_pred_err(fit_results,
                              truth_col = "y",
                              estimate_col = "predictions")
eval_results_summary <- summarize_pred_err(fit_results,
                                           truth_col = "y",
                                           estimate_col = "predictions")
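# built-in and custom summaries can be combined; the quantile helpers below
# are hypothetical examples (not part of the package) that summarize each
# metric by its 10th and 90th percentiles across replicates
q10_fun <- function(x) unname(quantile(x, probs = 0.1))
q90_fun <- function(x) unname(quantile(x, probs = 0.9))
eval_results_summary <- summarize_pred_err(
  fit_results,
  truth_col = "y",
  estimate_col = "predictions",
  summary_funs = c("mean", "sd"),
  custom_summary_funs = list(q10_pred_err = q10_fun, q90_pred_err = q90_fun)
)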