Plots a summary of the feature distributions in the data, either pooled across all features or separately for each feature. Only continuous (i.e., numeric) and categorical (i.e., character or factor) features are plotted.

plot_data_distribution(
  data,
  by_feature = NULL,
  plot_type = "auto",
  xlab = "Value",
  title = NULL,
  plot_heights = 1,
  theme_options = NULL,
  ...
)

Arguments

data

A data matrix, data frame, or vector.

by_feature

Logical. If TRUE, plots the distribution of each feature separately. If FALSE, plots the distribution of all features together. If NULL (the default), this is set to TRUE when there are fewer than 10 features and FALSE otherwise.
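
The default rule described above can be sketched in base R. This is only an illustration: `resolve_by_feature()` is a hypothetical helper, not part of the package, and it assumes that leaving `by_feature = NULL` triggers the rule and that features are counted as columns after coercing the input to a data frame. The function may count features differently internally (for example, after dropping unsupported column types).

```r
# Hypothetical helper (not part of the package): mirrors the documented
# default rule for `by_feature`.
resolve_by_feature <- function(data, by_feature = NULL) {
  # An explicit TRUE/FALSE from the caller is used as-is.
  if (!is.null(by_feature)) {
    return(isTRUE(by_feature))
  }
  # Assumption: features are the columns once vectors and matrices
  # have been coerced to a data frame.
  n_features <- ncol(as.data.frame(data))
  n_features < 10
}

resolve_by_feature(iris)                            # TRUE  (5 features)
resolve_by_feature(matrix(0, nrow = 3, ncol = 12))  # FALSE (12 features)
resolve_by_feature(iris, by_feature = FALSE)        # FALSE (explicit value)
```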

plot_type

Type of plot. Default is "auto", which uses a kernel density plot for continuous features and a bar plot for categorical features. Otherwise, `plot_type` should be a list with two named elements, `continuous` and `categorical`, giving the type of plot to use for continuous and categorical features, respectively. The `continuous` element must be one of "density", "histogram", or "boxplot"; the `categorical` element must be "bar" (with more options to come).
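
As an illustration of the expected shape of a non-"auto" `plot_type`, the following base R sketch builds such a list and checks it against the constraints stated above. `check_plot_type()` is a hypothetical helper written for this example; the function's own validation may differ.

```r
# A non-"auto" specification: one entry per feature type.
plot_type <- list(continuous = "histogram", categorical = "bar")

# Hypothetical check (not part of the package) that mirrors the
# documented constraints on `plot_type`.
check_plot_type <- function(plot_type) {
  if (identical(plot_type, "auto")) {
    return(invisible(TRUE))
  }
  stopifnot(
    is.list(plot_type),
    length(plot_type) == 2,
    setequal(names(plot_type), c("continuous", "categorical")),
    isTRUE(plot_type$continuous %in% c("density", "histogram", "boxplot")),
    isTRUE(plot_type$categorical %in% "bar")
  )
  invisible(TRUE)
}

check_plot_type(plot_type)  # passes silently
check_plot_type("auto")     # passes silently
```

A value such as `list(continuous = "violin", categorical = "bar")` would fail this check, because "violin" is not among the documented options.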

xlab

X-axis label.

title

Plot title.

plot_heights

(Optional) numeric vector of relative row heights of subplots. Only used if both continuous and categorical features are found in the data. For example, plot_heights = c(2, 1) makes the first row twice as tall as the second row.

theme_options

(Optional) list of arguments to pass to vthemes::theme_vmodern().

...

Additional arguments to pass to ggplot2::geom_*().

Value

A ggplot object.
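
Examples

A minimal usage sketch, using the built-in `iris` data (four numeric features and one factor). The package name is not stated on this page; the code assumes the function is provided by an installed package named `vdocs` and skips the calls otherwise. The argument values are illustrative only.

```r
# Assumption: plot_data_distribution() is exported by a package named
# "vdocs"; adjust the name if the function lives elsewhere.
if (requireNamespace("vdocs", quietly = TRUE)) {
  # Default settings: "auto" plot type, by_feature resolved from the
  # number of features.
  p_default <- vdocs::plot_data_distribution(iris)

  # All features together, histograms for the continuous features, and a
  # continuous row twice as tall as the categorical row.
  p_custom <- vdocs::plot_data_distribution(
    iris,
    by_feature = FALSE,
    plot_type = list(continuous = "histogram", categorical = "bar"),
    plot_heights = c(2, 1),
    title = "Iris feature distributions"
  )

  print(p_default)
  print(p_custom)
}
```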