Generate data from a model with omitted variable bias.
omitted_var_dgp.Rd
Takes a data-generating process (DGP) and induces bias due to omitted variable(s). In other words, this function generates a design matrix X and response vector y according to the supplied DGP function, but returns a partially missing design matrix in which some variable/feature columns have been omitted.
Arguments
- dgp_fun
A function that generates data according to some known data-generating process. This function should return an object of the same format as the output of return_DGP_output().
- omitted_vars
A vector of indices or column names corresponding to the columns in X that should be omitted.
- ...
Additional arguments to pass to dgp_fun().
Value
The returned object has the same format as the output of dgp_fun(), except that the variables specified by omitted_vars have been omitted from the X component and from the support (if applicable).
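The effect on the returned object can be illustrated with a minimal base R sketch. It does not use the package: drop_vars_sketch() is a hypothetical helper written for illustration only, and it assumes the DGP output is a list with components X, y, and support, where support holds the column indices of the signal features. How the package itself re-indexes support is not stated here, so the remapping below is an assumption.

```r
# Hypothetical helper, NOT part of the package: drops columns from a
# DGP-style list with components X, y, and (optionally) support.
drop_vars_sketch <- function(dgp_out, omitted_vars) {
  X <- dgp_out$X
  # Resolve column names to integer positions; indices pass through.
  if (is.character(omitted_vars)) {
    omit_idx <- match(omitted_vars, colnames(X))
  } else {
    omit_idx <- as.integer(omitted_vars)
  }
  keep_idx <- setdiff(seq_len(ncol(X)), omit_idx)
  # drop = FALSE keeps X a matrix even if a single column remains.
  dgp_out$X <- X[, keep_idx, drop = FALSE]
  if (!is.null(dgp_out$support)) {
    # Assumption: support stores column indices of the signal features.
    # Keep the surviving ones and remap them to their new positions.
    dgp_out$support <- which(keep_idx %in% dgp_out$support)
  }
  dgp_out
}

# Toy data: 5 observations, 4 features, with the first two in the support.
set.seed(1)
X <- matrix(rnorm(5 * 4), nrow = 5, dimnames = list(NULL, paste0("X", 1:4)))
y <- as.numeric(X[, 1] + X[, 2] + rnorm(5))
full <- list(X = X, y = y, support = c(1L, 2L))

reduced <- drop_vars_sketch(full, omitted_vars = 1)
dim(reduced$X)       # one fewer column than full$X
reduced$support      # only the surviving signal feature remains
```

The response y is left untouched, so it still depends on the omitted column. That is the source of the omitted variable bias when a model is fit to the reduced X.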
Examples
# generate data from a linear Gaussian DGP with the first variable omitted
dgp_out <- omitted_var_dgp(dgp_fun = linear_gaussian_dgp,
                           n = 100, p_obs = 10, s_obs = 2,
                           omitted_vars = 1)
# or equivalently (up to differences in column names)
dgp_out <- linear_gaussian_dgp(n = 100, p_obs = 9, p_unobs = 1,
                               s_obs = 1, s_unobs = 1)