
Takes a data-generating process (DGP) and induces bias by omitting one or more variables. The function generates a design matrix X and response vector y according to the supplied DGP function, but returns a reduced design matrix from which the specified variable/feature columns have been omitted.

Usage

omitted_var_dgp(dgp_fun, omitted_vars = 1, ...)

Arguments

dgp_fun

A function that generates data according to some known data-generating process. This function should return an object of the same format as the output of return_DGP_output().

omitted_vars

A vector of indices or column names identifying the columns of X that should be omitted. Defaults to 1.

...

Additional arguments to pass to dgp_fun().

Value

The returned object has the same format as the output of dgp_fun(), except that the variables specified by omitted_vars have been removed from the X component and from the support (if applicable).
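The behavior described above can be sketched in base R. This is a minimal illustration, not the package's implementation: toy_dgp() and omit_vars_sketch() are hypothetical helpers, the list(X, y, support) return format is an assumption standing in for the output of return_DGP_output(), and re-indexing the support to positions in the reduced X is a choice made for this sketch.

```r
# Hypothetical toy DGP; the list(X, y, support) format is assumed.
toy_dgp <- function(n, p, s) {
  X <- matrix(rnorm(n * p), nrow = n, ncol = p,
              dimnames = list(NULL, paste0("X", seq_len(p))))
  support <- seq_len(s)  # the first s columns drive the response
  y <- rowSums(X[, support, drop = FALSE]) + rnorm(n)
  list(X = X, y = y, support = support)
}

# Hypothetical wrapper: generate the full data, then drop columns from X.
omit_vars_sketch <- function(dgp_fun, omitted_vars = 1, ...) {
  out <- dgp_fun(...)
  # Accept either column names or integer indices.
  if (is.character(omitted_vars)) {
    omitted_idx <- match(omitted_vars, colnames(out$X))
  } else {
    omitted_idx <- omitted_vars
  }
  keep_idx <- setdiff(seq_len(ncol(out$X)), omitted_idx)
  if (!is.null(out$support)) {
    # Drop omitted variables from the support and re-index the rest
    # to their positions in the reduced design matrix (sketch choice).
    out$support <- match(intersect(out$support, keep_idx), keep_idx)
  }
  out$X <- out$X[, keep_idx, drop = FALSE]
  out  # y is left untouched, so it still depends on the omitted columns
}

set.seed(42)
res <- omit_vars_sketch(toy_dgp, omitted_vars = 1, n = 100, p = 10, s = 2)
dim(res$X)      # 100 rows, 9 columns
res$support     # 1: the remaining support variable, X2, is now column 1
```

Because y is generated before any column is dropped, the response still carries the signal of the omitted variable, which is what makes the returned data set misspecified for any model fit on the reduced X.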

Examples

# generate data from a linear Gaussian DGP with the first variable omitted
dgp_out <- omitted_var_dgp(dgp_fun = linear_gaussian_dgp,
                           n = 100, p_obs = 10, s_obs = 2,
                           omitted_vars = 1)
# or, equivalently (apart from the column names)
dgp_out <- linear_gaussian_dgp(n = 100, p_obs = 9, p_unobs = 1,
                               s_obs = 1, s_unobs = 1)
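To see why omitting a variable can induce bias, the following base R sketch may help. It does not use omitted_var_dgp(); the variable names, the correlation of 0.8, and the coefficients are assumptions chosen for illustration. Note that the bias in the remaining coefficient appears here because the omitted feature is correlated with the observed one.

```r
set.seed(1)
n <- 5000
rho <- 0.8                                    # assumed correlation between x1 and x2
x1 <- rnorm(n)
x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)   # unit variance, correlated with x1
y <- 1 * x1 + 2 * x2 + rnorm(n)               # true coefficients: 1 and 2

fit_full <- lm(y ~ x1 + x2)   # both variables observed
fit_omitted <- lm(y ~ x1)     # x2 omitted from the design

coef_full <- coef(fit_full)["x1"]        # close to the true value 1
coef_omitted <- coef(fit_omitted)["x1"]  # close to 1 + 2 * rho = 2.6
```

In the reduced model, the coefficient on x1 absorbs the part of x2's effect that is explained by x1, so it converges to 1 + 2 * rho rather than 1.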