
Generate a design matrix X by sampling from a real-world data matrix under the specified sampling scheme.

Usage

generate_X_rwd(X, n = nrow(X), p = ncol(X), clusters = NULL, replace = FALSE)

Arguments

X

Data matrix or data frame.

n

Number of samples if clusters is NULL. If clusters is not NULL, this is the number of clusters to sample.

p

Number of features. If p < ncol(X), the p features are sampled uniformly at random from the full feature set.

clusters

(Optional) Vector of cluster IDs. If provided, block or clustered sampling will be performed according to these clusters so that each cluster will be entirely in or entirely out of the retrieved sample.

replace

Logical. If TRUE, sample observations with replacement; if FALSE, sample observations without replacement.

Value

A matrix of size n x p.

Examples

# get bootstrap samples from iris data set
X <- generate_X_rwd(X = iris, replace = TRUE)

# leave one batch out from iris data set
batch_ids <- rep(1:3, length.out = nrow(iris))
X <- generate_X_rwd(X = iris, n = 2, clusters = batch_ids)
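
The sampling scheme described under Arguments can be sketched in base R. This is only an illustration of the documented behavior, not the package's implementation: the helper name sketch_sample_rwd is hypothetical, and details such as the order of the returned rows and columns or the class of the return value are assumptions.

```r
# Hypothetical helper (not part of the package): a base-R sketch of the
# documented sampling scheme. It returns an object of the same class as X.
sketch_sample_rwd <- function(X, n = nrow(X), p = ncol(X),
                              clusters = NULL, replace = FALSE) {
  if (is.null(clusters)) {
    # Plain sampling: draw n row indices.
    row_idx <- sample.int(nrow(X), size = n, replace = replace)
  } else {
    # Clustered sampling: draw n cluster IDs, then keep every row that
    # belongs to a drawn cluster, so clusters are entirely in or out.
    ids <- unique(clusters)
    drawn <- ids[sample.int(length(ids), size = n, replace = replace)]
    row_idx <- unlist(lapply(drawn, function(id) which(clusters == id)))
  }
  # Feature sampling: subsample columns uniformly only when p < ncol(X).
  col_idx <- if (p < ncol(X)) sample.int(ncol(X), size = p) else seq_len(ncol(X))
  X[row_idx, col_idx, drop = FALSE]
}

set.seed(1)
batch_ids <- rep(1:3, length.out = nrow(iris))

# Keep 2 of the 3 batches (50 rows each) and 3 of the 5 columns.
X_sub <- sketch_sample_rwd(iris, n = 2, p = 3, clusters = batch_ids)
dim(X_sub)
```

In this sketch, when clusters is supplied and replace = FALSE, n must not exceed the number of distinct cluster IDs, so n should be set explicitly rather than left at its default of nrow(X).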