Evaluate and/or summarize feature importance scores.
Source: R/evaluator-lib-feature-selection.R
eval_feature_importance_funs.Rd
Evaluate the estimated feature importance scores against the true feature support. eval_feature_importance() evaluates the feature importances for each experimental replicate separately. summarize_feature_importance() summarizes the feature importances across experimental replicates.
Usage
eval_feature_importance(
  fit_results,
  vary_params = NULL,
  nested_cols = NULL,
  feature_col,
  imp_col,
  group_cols = NULL
)

summarize_feature_importance(
  fit_results,
  vary_params = NULL,
  nested_cols = NULL,
  feature_col,
  imp_col,
  group_cols = NULL,
  na_rm = FALSE,
  summary_funs = c("mean", "median", "min", "max", "sd", "raw"),
  custom_summary_funs = NULL,
  eval_id = "feature_importance"
)
Arguments
- fit_results
A tibble, as returned by fit_experiment().
- vary_params
A vector of DGP or Method parameter names that are varied in the Experiment.
- nested_cols
(Optional) A character string or vector specifying the name of the column(s) in fit_results that need to be unnested before evaluating results. Default is NULL, meaning no columns in fit_results need to be unnested prior to computation.
- feature_col
A character string identifying the column in fit_results with the feature names or IDs.
- imp_col
A character string identifying the column in fit_results with the estimated feature importance data. Each element in this column should be an array of length p, where p is the number of features and the feature order aligns with that of truth_col. Elements in this array should be numeric, where a higher magnitude indicates a more important feature.
- group_cols
(Optional) A character string or vector specifying the column(s) to group rows by before evaluating metrics. This is useful for assessing within-group metrics.
- na_rm
A logical value indicating whether NA values should be stripped before the computation proceeds.
- summary_funs
Character vector specifying how to summarize evaluation metrics. Each element must be one of the built-in summary functions: "mean", "median", "min", "max", "sd", "raw".
- custom_summary_funs
Named list of custom functions to summarize results. Each name in the list is the name of a summary function, and each value is a function that takes one argument: the values of the evaluated metrics.
- eval_id
Character string. ID to be used as a suffix when naming result columns. The default shown in Usage is "feature_importance". NULL does not add any ID to the column names.
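As a minimal sketch of what a custom_summary_funs list might look like, the snippet below defines two hypothetical summary functions and applies them to a made-up vector of importance scores. The names q90 and range_width and the scores themselves are illustrative assumptions, not part of the package. Each function takes a single argument, the values to be summarized.

```r
# Hypothetical custom summary functions; each takes one argument,
# the vector of values to summarize
custom_funs <- list(
  q90 = function(x) unname(stats::quantile(x, probs = 0.9)),
  range_width = function(x) diff(range(x))
)

# Made-up importance scores for one feature across five replicates
scores <- c(10, 8, 12, 9, 11)

# Apply each function to the scores, as a stand-in for the summarization step
custom_summaries <- vapply(custom_funs, function(f) f(scores), numeric(1))
custom_summaries
```

A list of this shape could then be supplied as custom_summary_funs = custom_funs when calling summarize_feature_importance().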
Value
The output of eval_feature_importance() is a tibble with the columns .rep, .dgp_name, and .method_name in addition to the columns specified by group_cols, vary_params, feature_col, and imp_col.
The output of summarize_feature_importance() is a grouped tibble containing both identifying information and the feature importance results aggregated over experimental replicates. Specifically, the identifier columns include .dgp_name, .method_name, any columns specified by group_cols and vary_params, and the column specified by feature_col. In addition, there are results columns corresponding to the requested statistics in summary_funs and custom_summary_funs. These columns end in the suffix specified by eval_id.
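To make the difference between the two outputs concrete, here is a rough sketch, using dplyr and tidyr, of an unnest-then-aggregate computation of the kind described above. It is not the package's implementation. The data are simulated, and the result column names (mean_feature_importance, sd_feature_importance) are assumed from the description of the eval_id suffix.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # fixed seed so the simulated scores are reproducible

# Simulated fit_results: one nested tibble of scores per replicate
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = "Method",
  feature_info = lapply(1:4, function(i) {
    tibble::tibble(
      feature = c("featureA", "featureB", "featureC"),
      est_importance = c(10, runif(2, min = -2, max = 2))
    )
  })
)

# Per-replicate view: one row per (replicate, DGP, method, feature)
per_rep <- fit_results |>
  unnest(cols = feature_info)

# Across-replicate view: aggregate scores within each DGP/method/feature
across_reps <- per_rep |>
  group_by(.dgp_name, .method_name, feature) |>
  summarise(
    mean_feature_importance = mean(est_importance),
    sd_feature_importance = sd(est_importance),
    .groups = "drop"  # this sketch drops grouping for simplicity
  )
```

In this sketch, per_rep has one row per replicate and feature, while across_reps collapses the replicates into one row per DGP, method, and feature.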
See also
Other feature_selection_funs: eval_feature_selection_curve_funs, eval_feature_selection_err_funs, plot_feature_importance(), plot_feature_selection_curve(), plot_feature_selection_err()
Examples
# generate example fit_results data for a feature selection problem
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = c("Method"),
  feature_info = lapply(
    1:4,
    FUN = function(i) {
      tibble::tibble(
        # feature names
        feature = c("featureA", "featureB", "featureC"),
        # estimated feature importance scores
        est_importance = c(10, runif(2, min = -2, max = 2))
      )
    }
  )
)
# evaluate feature importances (using all default metrics) for each replicate
eval_results <- eval_feature_importance(
  fit_results,
  nested_cols = "feature_info",
  feature_col = "feature",
  imp_col = "est_importance"
)
# summarize feature importances (using all default metrics) across replicates
eval_results_summary <- summarize_feature_importance(
  fit_results,
  nested_cols = "feature_info",
  feature_col = "feature",
  imp_col = "est_importance"
)