Simulate linear response data.
Generate linear response data with a specified error distribution given the observed and unobserved design matrices.
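The model described here can be sketched in base R as y = intercept + X %*% betas + U %*% betas_unobs + error. The function linear_dgp_sketch() is hypothetical and is not part of the package. It takes fully specified coefficient vectors. It also assumes that err accepts the number of draws as its first argument, as rnorm() does.

```r
# Hypothetical base-R sketch of the linear DGP; not the package's implementation.
linear_dgp_sketch <- function(X, U = NULL, betas, betas_unobs = NULL,
                              intercept = 0, err = NULL) {
  X <- as.matrix(X)
  # Observed signal: intercept + X %*% betas
  y <- intercept + drop(X %*% betas)
  # Omitted-variable signal: U %*% betas_unobs (skipped when U is NULL)
  if (!is.null(U)) {
    y <- y + drop(as.matrix(U) %*% betas_unobs)
  }
  # Additive noise; assumes err(n) returns n draws, like rnorm(n)
  if (!is.null(err)) {
    y <- y + err(nrow(X))
  }
  y
}

set.seed(1)
X <- matrix(rnorm(200), nrow = 100, ncol = 2)
U <- matrix(rnorm(200), nrow = 100, ncol = 2)
# y = 3*x_1 - x_2 + u_1 + 2*u_2 (no error term)
y <- linear_dgp_sketch(X, U, betas = c(3, -1), betas_unobs = c(1, 2))
# y = 3*x_1 - x_2 + N(0, 1) errors
y_noisy <- linear_dgp_sketch(X, betas = c(3, -1), err = rnorm)
```

With err = NULL the response is a deterministic function of X and U. This makes the noise-free case convenient for checking the coefficients.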
Usage
generate_y_linear(
X,
U,
betas = NULL,
betas_unobs = NULL,
intercept = 0,
err = NULL,
return_support = FALSE,
...
)
Arguments
- X
Design data matrix of observed variables.
- U
Design data matrix of unobserved (omitted) variables.
- betas
Coefficient vector for the observed design matrix. If a scalar is provided, the coefficient vector is constant. If NULL (default), entries in the coefficient vector are drawn iid from N(0, betas_sd^2). Can also be a function that generates the coefficient vector; see generate_coef().
- betas_unobs
Coefficient vector for the unobserved design matrix. If a scalar is provided, the coefficient vector is constant. If NULL (default), entries in the coefficient vector are drawn iid from N(0, betas_unobs_sd^2). Can also be a function that generates the coefficient vector; see generate_coef().
- intercept
Scalar intercept term.
- err
Function from which to generate the simulated error vector. Default is NULL, which adds no error to the DGP.
- return_support
Logical specifying whether to return a vector of the support column names. If X has no column names, the indices of the support are returned instead.
- ...
Additional arguments to pass to the functions that generate betas, betas_unobs, and err. If an argument does not exist in one of these functions, that function ignores it. If two or more of the functions have an argument with the same name but need different values, prefix the argument name (passed via ...) with .betas_, .betas_unobs_, or .err_ to differentiate it. For additional details, see generate_coef() and generate_errors().
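The coefficient and `...` conventions above can be sketched as follows. Both helpers are hypothetical illustrations and are not the package internals. resolve_coef_sketch() uses a single coef_sd parameter in place of betas_sd or betas_unobs_sd. It also assumes a generator function takes the vector length as its first argument. route_dots_sketch() assumes that unprefixed arguments go to every function that accepts them, and that prefixed ones override them for one function only.

```r
# Hypothetical sketches of the argument conventions; not the package internals.

# Turn a betas-style argument into a length-p coefficient vector.
resolve_coef_sketch <- function(coef, p, coef_sd = 1, ...) {
  if (is.null(coef)) {
    rnorm(p, mean = 0, sd = coef_sd)   # NULL: iid N(0, coef_sd^2) draws
  } else if (is.function(coef)) {
    coef(p, ...)                       # function: assumed to take p first
  } else if (length(coef) == 1) {
    rep(coef, p)                       # scalar: constant vector
  } else {
    stopifnot(length(coef) == p)       # vector: used as given
    coef
  }
}

# Return the prefix (if any) that marks an argument name for one function.
# ".betas_unobs_" is checked before ".betas_" because it is the longer match.
prefix_of <- function(nm) {
  for (pre in c(".betas_unobs_", ".betas_", ".err_")) {
    if (startsWith(nm, pre)) return(pre)
  }
  ""
}

# Select the `...` arguments meant for function f: unprefixed ones, overridden
# by ones carrying f's prefix, then filtered to arguments f actually accepts.
route_dots_sketch <- function(f, prefix, dots) {
  pre <- vapply(names(dots), prefix_of, character(1), USE.NAMES = FALSE)
  own <- dots[pre == prefix]
  names(own) <- substring(names(own), nchar(prefix) + 1)
  args <- utils::modifyList(dots[pre == ""], own)
  args[names(args) %in% names(formals(f))]
}

dots <- list(sd = 2, .err_sd = 0.5, .betas_unobs_sd = 3, foo = 1)
route_dots_sketch(rnorm, ".err_", dots)          # list(sd = 0.5)
route_dots_sketch(rnorm, ".betas_unobs_", dots)  # list(sd = 3)
route_dots_sketch(rnorm, ".betas_", dots)        # list(sd = 2)
```

In this sketch, foo is dropped for rnorm() because rnorm() has no such argument. This mirrors the rule that unknown arguments are ignored.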
Value
If return_support = TRUE, returns a list with two elements:
- y
A response vector of length nrow(X).
- support
A vector identifying all features in the true support of the DGP (column names of X, or column indices if X has no column names).
If return_support = FALSE, returns only the response vector y.
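A return_support-style result can be sketched as follows. The sketch assumes that the true support is the set of observed features with nonzero coefficients; this page does not confirm that definition. support_sketch() is a hypothetical helper.

```r
# Hypothetical sketch of a return_support-style result; assumes the support is
# the set of observed features (columns of X) with nonzero coefficients.
support_sketch <- function(X, betas) {
  idx <- which(betas != 0)
  # Column names when X has them, column indices otherwise
  if (is.null(colnames(X))) idx else colnames(X)[idx]
}

set.seed(2)
X <- matrix(rnorm(30), nrow = 10, ncol = 3,
            dimnames = list(NULL, c("a", "b", "c")))
betas <- c(1.5, 0, -2)
out <- list(y = drop(X %*% betas), support = support_sketch(X, betas))
out$support                       # "a" "c"
support_sketch(unname(X), betas)  # 1 3
```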
Examples
X <- generate_X_gaussian(.n = 100, .p = 2)
U <- generate_X_gaussian(.n = 100, .p = 2)
# generate the response from: y = 3*x_1 - x_2 + N(0, 1) errors
y <- generate_y_linear(X = X, betas = c(3, -1), err = rnorm)
# generate the response from: y = 3*x_1 - x_2 + u_1 + 2*u_2
y <- generate_y_linear(X = X, U = U, betas = c(3, -1), betas_unobs = c(1, 2))