A helper function for developing new Evaluator functions that summarize results over pre-specified groups in a grouped data.frame (e.g., over multiple experimental replicates). This is often used in conjunction with eval_constructor().

Usage

eval_summarizer(
  eval_data,
  eval_id = NULL,
  value_col,
  summary_funs = c("mean", "median", "min", "max", "sd", "raw"),
  custom_summary_funs = NULL,
  na_rm = FALSE
)

Arguments

eval_data

A grouped data.frame of evaluation results to summarize.

eval_id

Character string. ID to be used as a suffix when naming result columns. Default NULL does not add any ID to the column names.

value_col

Character string. Name of column in eval_data with values to summarize.

summary_funs

Character vector specifying how to summarize evaluation metrics. Each element must be one of the built-in summary functions: "mean", "median", "min", "max", "sd", or "raw".

custom_summary_funs

Named list of custom functions to summarize results. Each name in the list is the name of the corresponding summary function. Each value should be a function that takes one argument: the values of the evaluated metrics.

na_rm

A logical value indicating whether NA values should be stripped before the computation proceeds.
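The shape expected for custom_summary_funs and the effect of removing NA values can be illustrated in base R, without calling eval_summarizer() itself. This is a minimal sketch under stated assumptions: the names vals, custom_funs, with_na, and without_na are hypothetical, and dropping NA values by hand is only a stand-in for what na_rm = TRUE is described as doing, not the function's actual implementation.

```r
# Hypothetical metric values for a single group, including one missing value
vals <- c(0.2, NA, 0.5, 0.9)

# A named list of one-argument functions, in the shape described for
# custom_summary_funs (the names become the names of the summaries)
custom_funs <- list(
  range = function(x) max(x) - min(x),
  mid   = function(x) (max(x) + min(x)) / 2
)

# With the NA left in place, max() and min() return NA, so both summaries are NA
with_na <- vapply(custom_funs, function(f) f(vals), numeric(1))

# Dropping NA values first (assumed stand-in for na_rm = TRUE) gives finite summaries
without_na <- vapply(custom_funs, function(f) f(vals[!is.na(vals)]), numeric(1))

print(with_na)
print(without_na)
```

If a custom function should tolerate missing values regardless of na_rm, one option is to handle them inside the function, for example function(x) max(x, na.rm = TRUE) - min(x, na.rm = TRUE).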

Value

A tibble containing the summarized results, aggregated over the given groups. Its columns correspond to the statistics requested in summary_funs and custom_summary_funs, and their names end with the suffix specified by eval_id. The group ID columns are also retained in the returned tibble.
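To make the description of the return value concrete, the following is a rough hand-rolled dplyr equivalent for two of the built-in statistics. It is a sketch, not the function's implementation: the "<statistic>_<eval_id>" column naming is an assumption based on the suffix description above, the object name sketch is hypothetical, and the grouping is dropped at the end for simplicity.

```r
library(dplyr)

# Toy data in the same shape as the Examples section: two replicates per DGP
eval_data <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = "Method",
  result = 1:4
) %>%
  group_by(.dgp_name, .method_name)

value_col <- "result"
eval_id <- "res"

# Summarize value_col within each group; the group columns are kept as
# ordinary columns and each statistic gets the (assumed) "_<eval_id>" suffix
sketch <- eval_data %>%
  summarise(
    across(all_of(value_col),
           list(mean = mean, sd = sd),
           .names = paste0("{.fn}_", eval_id)),
    .groups = "drop"
  )

print(sketch)
```

The result has one row per (.dgp_name, .method_name) group, with the group columns followed by one column per requested statistic.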

Examples

# attach dplyr so that the `%>%` pipe used below is available
library(dplyr)

# create example eval_data to summarize
eval_data <- tibble::tibble(.rep = rep(1:2, times = 2),
                            .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
                            .method_name = "Method",
                            result = 1:4) %>%
  dplyr::group_by(.dgp_name, .method_name)

# summarize `result` column in eval_data
results <- eval_summarizer(eval_data = eval_data, eval_id = "res",
                           value_col = "result")

# only compute mean and sd of `result` column in eval_data over given groups
results <- eval_summarizer(eval_data = eval_data, eval_id = "res",
                           value_col = "result",
                           summary_funs = c("mean", "sd"))

# summarize `result` column using custom summary function
range_fun <- function(x) return(max(x) - min(x))
results <- eval_summarizer(eval_data = eval_data, value_col = "result",
                           custom_summary_funs = list(range = range_fun))