Given data X, filters out columns in X according to various data preprocessing/cleaning procedures. `filter_cols_by_var` reduces the number of features in the data by keeping only the columns with the largest variances.

filter_cols_by_var(X, min_var = NULL, max_p = NULL)

Arguments

X

A data matrix or data frame.

min_var

(Optional) minimum variance threshold. All columns with variance lower than `min_var` are removed. If NULL (default), no variance threshold is applied.

max_p

(Optional) maximum number of features to keep. Only the `max_p` features with the highest variances are kept. If NULL (default), there is no limit on the number of features kept.

Value

A cleaned data matrix or data frame.
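
Examples

The behavior described above can be illustrated with a minimal base R sketch. The function `filter_cols_by_var_sketch` below is a hypothetical stand-in written for this example, not the package's implementation; it assumes that all columns are numeric, that there are no missing values, and that `min_var` is applied before `max_p` when both are given. The actual function may differ in these details.

```r
# Hypothetical re-implementation for illustration only; the real
# filter_cols_by_var may handle NAs, ties, and edge cases differently.
filter_cols_by_var_sketch <- function(X, min_var = NULL, max_p = NULL) {
  # Sample variance of each column (assumes numeric columns, no NAs).
  col_vars <- apply(X, 2, var)
  keep <- rep(TRUE, ncol(X))

  # Drop columns whose variance is lower than min_var.
  if (!is.null(min_var)) {
    keep <- keep & (col_vars >= min_var)
  }

  # Among the remaining columns, keep at most max_p with the largest variance.
  if (!is.null(max_p) && sum(keep) > max_p) {
    ranked <- order(col_vars, decreasing = TRUE)
    top <- ranked[keep[ranked]][seq_len(max_p)]
    keep <- seq_len(ncol(X)) %in% top
  }

  # drop = FALSE preserves the matrix / data frame structure.
  X[, keep, drop = FALSE]
}

# Toy data: column variances are 0 (a), 5/3 (b), 500/3 (c), and 1/3 (d).
X <- data.frame(
  a = c(1, 1, 1, 1),
  b = c(1, 2, 3, 4),
  c = c(0, 10, 20, 30),
  d = c(5, 5, 6, 6)
)

names(filter_cols_by_var_sketch(X, min_var = 0.5))             # "b" "c"
names(filter_cols_by_var_sketch(X, max_p = 1))                 # "c"
names(filter_cols_by_var_sketch(X, min_var = 0.1, max_p = 2))  # "b" "c"
```

In this sketch the retained columns stay in their original order rather than being sorted by variance; whether `filter_cols_by_var` does the same is not stated above.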