flowchart TD; W((W))-->d((d)); W((W))-->y((y)); d((d))-->|"τ0"|y((y)); linkStyle 0,1 stroke:black,stroke-width:2px; linkStyle 1,2 stroke:black,stroke-width:2px
make_partially_linear_dataset_constant
make_partially_linear_dataset_constant(
    n_obs=1000,
    ate=4.0,
    n_confounders=10,
    dgp='make_plr_CCDDHNR2018',
    seed=None,
    **doubleml_kwargs,
)
Simulate a data generating process from a partially linear model with a constant treatment effect (ATE only).
The outcome and treatment are both continuous. The dataset is generated using the make_plr_CCDDHNR2018 or make_plr_turrell2018 function from the doubleml package.
The general form of the data generating process is:
\[ y_i= \tau_0 d_i + g(\mathbf{W_i})+\epsilon_i \] \[ d_i=f(\mathbf{W_i})+\eta_i \]
where \(y_i\) is the outcome, \(d_i\) is the treatment, \(\mathbf{W_i}\) are the confounders, \(\epsilon_i\) and \(\eta_i\) are the error terms, \(\tau_0\) is the ATE parameter, \(g\) is the outcome function, and \(f\) is the treatment function.
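The two equations above can be sketched directly in NumPy. This is only an illustration of the general form: the functions used for \(f\) and \(g\) below are hypothetical choices made for this sketch and are not the functional forms used by doubleml, and simulate_plr is not part of caml.

```python
import numpy as np


def simulate_plr(n_obs=1000, ate=4.0, n_confounders=10, seed=None):
    """Toy partially linear DGP with a constant effect (illustrative only)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_obs, n_confounders))

    # Hypothetical nuisance functions, NOT the ones doubleml uses.
    f_W = np.tanh(W[:, 0]) + 0.25 * W[:, 1]     # treatment function f(W)
    g_W = np.sin(W[:, 0]) + 0.5 * W[:, 2] ** 2  # outcome function g(W)

    eta = rng.normal(size=n_obs)  # treatment noise
    eps = rng.normal(size=n_obs)  # outcome noise

    d = f_W + eta              # d_i = f(W_i) + eta_i
    y = ate * d + g_W + eps    # y_i = tau_0 d_i + g(W_i) + eps_i
    return W, d, y, g_W


W, d, y, g_W = simulate_plr(seed=0)

# Naive slope of y on d: ignores W, so it absorbs part of g(W).
naive = float(np.cov(d, y)[0, 1] / np.var(d, ddof=1))

# Oracle slope: subtract the (normally unknown) g(W) before regressing on d.
oracle = float(d @ (y - g_W) / (d @ d))

print(f"naive: {naive:.3f}, oracle: {oracle:.3f}")
```

Because both hypothetical functions increase in the first confounder, the naive slope is expected to overshoot \(\tau_0\), while the oracle slope should land close to it. In practice \(g\) is unknown and has to be estimated.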
See the doubleml documentation for more details on the specific functional forms of the data generating process.
As a DAG, the data generating process can be roughly represented as:
Parameters
Name | Type | Description | Default |
---|---|---|---|
n_obs | int | The number of observations to generate. | 1000 |
ate | float | The average treatment effect \(\tau_0\). | 4.0 |
n_confounders | int | The number of confounders \(\mathbf{W_i}\) to generate. | 10 |
dgp | str | The data generating process to use. Can be "make_plr_CCDDHNR2018" or "make_plr_turrell2018". | 'make_plr_CCDDHNR2018' |
seed | int \| None | The seed to use for the random number generator. | None |
**doubleml_kwargs | | Additional keyword arguments to pass to the data generating process. | {} |
Returns
Name | Type | Description |
---|---|---|
df | pandas.DataFrame | The generated dataset where y is the outcome, d is the treatment, and W are the confounders. |
true_cates | numpy.ndarray | The true conditional average treatment effects, which are all equal to the ATE here. |
true_ate | float | The true average treatment effect. |
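Since the treatment effect is constant, the relationship between true_cates and true_ate can be illustrated in a few lines. This is a sketch of what the returned values look like, assuming one CATE entry per observation; it is not necessarily how caml constructs them internally.

```python
import numpy as np

n_obs, ate = 1000, 4.0

# With a constant effect, every unit's CATE equals the ATE.
true_cates = np.full(n_obs, ate)

# The ATE is the mean of the CATEs, which here is just the constant itself.
true_ate = float(true_cates.mean())

print(true_cates[:5])  # [4. 4. 4. 4. 4.]
print(true_ate)        # 4.0
```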
Examples
from caml.extensions.synthetic_data import make_partially_linear_dataset_constant

df, true_cates, true_ate = make_partially_linear_dataset_constant(
    n_obs=1000,
    ate=4.0,
    n_confounders=10,
    dgp="make_plr_CCDDHNR2018",
    seed=1,
)
print(f"True CATES: {true_cates[:5]}")
print(f"True ATE: {true_ate}")
print(df.head())
True CATES: [4. 4. 4. 4. 4.]
True ATE: 4.0
W1 W2 W3 W4 W5 W6 W7 \
0 -1.799808 -0.830362 -0.775800 -2.430475 -1.759428 -0.196538 -0.392579
1 -2.238925 -2.107779 -1.619264 -1.816121 -2.084809 -0.456936 0.118781
2 1.069028 1.616054 1.959420 1.398880 0.058545 0.370891 0.161045
3 0.497020 -0.399126 -0.019305 0.230080 0.640361 1.233185 0.906313
4 -1.749809 -0.315699 -0.283176 0.439451 0.819941 0.156514 0.059722
W8 W9 W10 y d
0 -0.827537 -0.735652 -1.127103 -6.074658 -1.843476
1 0.270647 0.199401 0.049088 -8.534573 -1.969429
2 0.118180 0.438721 0.280880 4.915427 0.935840
3 1.031123 -0.373092 0.442367 -0.037117 -0.209740
4 0.472781 0.030157 1.174463 -7.922597 -1.903480
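A dataset like this is typically used to check whether an estimator recovers the known ATE. A minimal sketch of one such check is partialling out with ordinary least squares: residualize both y and d on W, then regress the residuals on each other. The helper partial_out_ate is hypothetical (it is not part of caml or doubleml), and the toy data below uses linear \(f\) and \(g\) chosen for this sketch so that linear residualization is appropriate.

```python
import numpy as np


def partial_out_ate(W, d, y):
    """Estimate tau_0 by residualizing y and d on W with OLS (hypothetical helper)."""
    X = np.column_stack([np.ones(len(d)), W])  # design matrix with intercept
    d_res = d - X @ np.linalg.lstsq(X, d, rcond=None)[0]
    y_res = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    # Slope of residualized outcome on residualized treatment.
    return float(d_res @ y_res / (d_res @ d_res))


# Toy data with linear f and g (assumed here only for illustration).
rng = np.random.default_rng(1)
W = rng.normal(size=(1000, 10))
d = W[:, 0] + rng.normal(size=1000)
y = 4.0 * d + W[:, 0] + rng.normal(size=1000)

tau_hat = partial_out_ate(W, d, y)
print(f"Estimated ATE: {tau_hat:.3f}")
```

The same helper could be applied to the generated df, for example with df.filter(like="W").to_numpy(), df["d"].to_numpy(), and df["y"].to_numpy(). If the chosen data generating process is nonlinear in W, a linear fit may not recover the true ATE exactly, and more flexible learners for the nuisance functions would be the usual choice.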