Summarizes the given (X, y) data in table form. This function is a wrapper around skimr::skim(), which skims a data frame and returns a broad overview of useful summary statistics. The wrapper currently handles columns of type "factor", "numeric", "character", "logical", "complex", "Date", and "POSIXct"; columns of any other type are ignored.

get_data_summary(
  X,
  y = NULL,
  skim_out = NULL,
  digits = 2,
  sigfig = FALSE,
  features = NULL,
  max_features = 1000,
  html = knitr::is_html_output(),
  ...
)

Arguments

X

Data matrix or data frame.

y

(Optional) response vector. Default is NULL.

skim_out

(Optional) cached output of `skimr::skim()`. Specify if the skim output has been pre-computed in order to reduce computation.

digits

Number of digits to display for numeric values. Default is 2.

sigfig

Logical. If TRUE, digits refers to the number of significant figures. If FALSE, digits refers to the number of decimal places.
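The two interpretations of digits can be illustrated with base R's round() and signif(). This is only a sketch of the two rounding conventions; that get_data_summary() formats values in an equivalent way internally is an assumption, not something stated on this page.

```r
x <- c(123.456, 0.012345)

# Analogue of sigfig = FALSE: digits counts decimal places
# (values: 123.46 and 0.01)
decimal_places <- round(x, digits = 2)

# Analogue of sigfig = TRUE: digits counts significant figures
# (values: 120 and 0.012)
significant_figures <- signif(x, digits = 2)

print(decimal_places)
print(significant_figures)
```

With sigfig = FALSE, small values can lose most of their precision, so sigfig = TRUE may be the better choice when columns span very different scales.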

features

(Optional) vector of features to include in summary. Default (NULL) is to include all features.

max_features

(Optional) maximum number of features to include in summary. Only used if features = NULL. Default is 1000. If the number of features in X exceeds `max_features`, the features kept in the summary are chosen randomly.
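The interaction between features and max_features described above can be sketched in base R. The helper name select_features is hypothetical, and the use of sample() is an assumption; the function itself may draw the random subset differently.

```r
# Hypothetical helper mirroring the documented behavior; not part of the package.
select_features <- function(X, features = NULL, max_features = 1000) {
  if (!is.null(features)) {
    # An explicit feature list takes precedence; max_features is not used
    return(features)
  }
  all_features <- colnames(X)
  if (length(all_features) > max_features) {
    # Too many columns: keep a random subset of size max_features
    return(sample(all_features, size = max_features))
  }
  # Otherwise keep every feature
  all_features
}

# Toy data with 10 columns named V1, ..., V10
X <- as.data.frame(matrix(rnorm(50), nrow = 5, ncol = 10))

set.seed(1)  # fix the seed so the random subset is reproducible
kept <- select_features(X, max_features = 3)
print(kept)
```

Because the subset is random, setting a seed before the call is one way to obtain the same features across runs; passing features explicitly avoids the randomness altogether.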

html

Logical. If TRUE, the output is an HTML table; if FALSE, the output is a LaTeX table. Default is knitr::is_html_output().

...

Additional arguments to pass to vthemes::pretty_DT() if html = TRUE or vthemes::pretty_kable() if html = FALSE.

Value

Returns an HTML table (i.e., the output of vthemes::pretty_DT()) if html = TRUE, or a LaTeX table (i.e., the output of vthemes::pretty_kable()) if html = FALSE, containing a broad overview of summary statistics for each data column.
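A minimal usage sketch follows, using the built-in iris data. It assumes that get_data_summary() is available in the session (that is, the package providing it, together with skimr and vthemes, is installed and loaded); the guard simply skips the call otherwise. The choice of iris and of the argument values is illustrative only.

```r
# Assumption: get_data_summary() has been made available by loading the
# package that provides it; the guard skips the example otherwise.
if (exists("get_data_summary", mode = "function")) {
  X <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
  y <- iris$Species

  # Request a LaTeX table with values shown to 3 significant figures
  summary_tab <- get_data_summary(
    X = X,
    y = y,
    digits = 3,
    sigfig = TRUE,
    html = FALSE
  )
  summary_tab
}
```

Leaving html at its default lets the output format follow the document being knitted, which is usually what is wanted inside an R Markdown report.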