CamlCATE API Usage

Here we’ll walk through an example of generating synthetic data, running CamlCATE, and visualizing results using the ground truth as a reference.

Generate Synthetic Data

Here we’ll leverage the CamlSyntheticDataGenerator class to generate a linear synthetic data generating process, with a binary treatment, continuous outcome, and a mix of confounding/mediating continuous covariates.

from caml.extensions.synthetic_data import CamlSyntheticDataGenerator

data = CamlSyntheticDataGenerator(n_obs=10_000,
                                  n_cont_outcomes=1,
                                  n_binary_treatments=1,
                                  n_cont_confounders=2,
                                  n_cont_modifiers=2,
                                  n_confounding_modifiers=1,
                                  causal_model_functional_form="linear",
                                  n_nonlinear_transformations=5,
                                  n_nonlinear_interactions=2,
                                  seed=1)
We can print our simulated data via:
data.df
|      | W1_continuous | W2_continuous | X1_continuous | X2_continuous | T1_binary | Y1_continuous |
|------|---------------|---------------|---------------|---------------|-----------|---------------|
| 0    | 0.212951 | 2.427782 | 4.855579 | 1.899164 | 1 | -15.626184 |
| 1    | 15.593752 | 7.556136 | 4.087682 | -0.574265 | 1 | -6.393739 |
| 2    | 1.062978 | 3.644116 | 4.970670 | 0.091263 | 1 | -13.628719 |
| 3    | 0.334657 | 4.581727 | 3.524831 | 0.235195 | 1 | -9.546798 |
| 4    | 5.221081 | 3.886017 | 0.487610 | -0.677476 | 1 | 0.070117 |
| ...  | ... | ... | ... | ... | ... | ... |
| 9995 | 0.408012 | 4.600472 | 1.411209 | 1.882209 | 1 | -3.247035 |
| 9996 | 2.639800 | 6.699876 | 0.001684 | 2.976786 | 1 | 2.582407 |
| 9997 | 1.151358 | 7.453297 | 1.133279 | 2.428612 | 1 | -0.491604 |
| 9998 | 1.073735 | 8.631265 | 1.431656 | 2.058397 | 1 | -0.964414 |
| 9999 | 0.662806 | 5.282546 | 0.214802 | 2.999912 | 1 | 1.087057 |
10000 rows × 6 columns
To inspect our true data generating process, we can call data.dgp. Furthermore, we have our true CATEs and ATEs at our disposal via data.cates and data.ates, respectively. We’ll use these as our source of truth when evaluating the performance of our CATE estimator.
for t, df in data.dgp.items():
    print(f"\nDGP for {t}:")
    print(df)
DGP for T1_binary:
covariates params global_transformation
0 W1_continuous 1.003484 Sigmoid
1 W2_continuous 2.968150 Sigmoid
2 X1_continuous 1.551445 Sigmoid
DGP for Y1_continuous:
covariates params global_transformation
0 W1_continuous 0.431376 None
1 W2_continuous 0.287855 None
2 X1_continuous -2.663734 None
3 X2_continuous 1.842291 None
4 T1_binary -0.656965 None
5 int_T1_binary_X1_continuous -0.549627 None
6 int_T1_binary_X2_continuous -1.580467 None
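Since the DGP is linear, each observation’s CATE can be read off directly from the treatment and interaction coefficients above. As a quick sanity check, we can reconstruct the CATE for row 0 by hand (coefficients and covariate values copied from the printed tables):

```python
# CATE(x) = coef on T1_binary + interaction coefs * modifier values (from data.dgp)
theta_T    = -0.656965   # T1_binary
theta_T_X1 = -0.549627   # int_T1_binary_X1_continuous
theta_T_X2 = -1.580467   # int_T1_binary_X2_continuous

x1, x2 = 4.855579, 1.899164          # X1, X2 values for row 0 of data.df

cate_row0 = theta_T + theta_T_X1 * x1 + theta_T_X2 * x2
print(cate_row0)  # ≈ -6.327288, matching row 0 of data.cates
```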
data.cates
|      | CATE_of_T1_binary_on_Y1_continuous |
|------|------------------------------------|
| 0    | -6.327288 |
| 1    | -1.996057 |
| 2    | -3.533216 |
| 3    | -2.966024 |
| 4    | 0.145760 |
| ...  | ... |
| 9995 | -4.407371 |
| 9996 | -5.362603 |
| 9997 | -5.118186 |
| 9998 | -4.697070 |
| 9999 | -5.516288 |
10000 rows × 1 columns
data.ates
|   | Treatment | ATE |
|---|-----------|-----|
| 0 | T1_binary_on_Y1_continuous | -3.593735 |
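As a reminder of how the two quantities relate, the ATE is just the sample average of the individual CATEs. A tiny illustration using only the five CATE values printed above (the full-sample average, data.cates.mean(), reproduces data.ates):

```python
# Averaging CATEs yields the ATE; here over just the five printed rows,
# so this is illustrative and will not equal the full-sample ATE.
cates_head = [-6.327288, -1.996057, -3.533216, -2.966024, 0.145760]
ate_head = sum(cates_head) / len(cates_head)
print(ate_head)
```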
Running CamlCATE
Class Instantiation
We can instantiate and observe our CamlCATE object via:
💡 Tip: W can be leveraged if we want certain covariates to be used only in the nuisance functions to control for confounding, and not in the final CATE estimator. This is useful when a confounder must be included for identification but, for compliance reasons, we don’t want our CATE model to leverage that feature (e.g., gender). However, this will restrict our available CATE estimators to orthogonal learners, since metalearners necessarily include all covariates. If you don’t mind W being in the final CATE estimator, pass it as X, as done below.
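To make the X/W split concrete, here is the column-selection idiom used in the instantiation below, shown on a plain list of the simulated column names (passing the W-columns via X keeps them available to the CATE model; passing them via W would restrict them to the nuisance functions):

```python
# Columns of data.df from the synthetic generator
columns = ["W1_continuous", "W2_continuous", "X1_continuous",
           "X2_continuous", "T1_binary", "Y1_continuous"]

X_cols = [c for c in columns if "X" in c]  # effect modifiers
W_cols = [c for c in columns if "W" in c]  # nuisance-only confounders

print(X_cols)  # ['X1_continuous', 'X2_continuous']
print(W_cols)  # ['W1_continuous', 'W2_continuous']
```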
from caml import CamlCATE

caml_obj = CamlCATE(df=data.df,
                    Y="Y1_continuous",
                    T="T1_binary",
                    X=[c for c in data.df.columns if 'X' in c]
                      + [c for c in data.df.columns if 'W' in c],
                    discrete_treatment=True,
                    discrete_outcome=False,
                    verbose=1)
[03/31/25 18:55:16] INFO Logging has been set up. logging.py:50
print(caml_obj)
================== CamlCATE Object ==================
Data Backend: pandas
No. of Observations: 10,000
Outcome Variable: Y1_continuous
Discrete Outcome: False
Treatment Variable: T1_binary
Discrete Treatment: True
Features/Confounders for Heterogeneity (X): ['X1_continuous', 'X2_continuous', 'W1_continuous', 'W2_continuous']
Features/Confounders as Controls (W): []
Random Seed: None
Nuisance Function AutoML
We can then obtain our nuisance functions (the outcome regression and propensity models) via FLAML AutoML:
caml_obj.auto_nuisance_functions(flaml_Y_kwargs={"time_budget": 30,
                                                 "verbose": 0,
                                                 "estimator_list": ["rf", "extra_tree", "xgb_limitdepth"]},
                                 flaml_T_kwargs={"time_budget": 30,
                                                 "verbose": 0,
                                                 "estimator_list": ["rf", "extra_tree", "xgb_limitdepth"]},
                                 )
print(caml_obj.model_Y_X_W)
print(caml_obj.model_Y_X_W_T)
print(caml_obj.model_T_X_W)
ExtraTreesRegressor(max_features=0.8120734525770129, max_leaf_nodes=877,
n_estimators=344, n_jobs=-1, random_state=12032022)
ExtraTreesRegressor(max_features=0.8120734525770129, max_leaf_nodes=877,
n_estimators=344, n_jobs=-1, random_state=12032022)
ExtraTreesClassifier(max_features=0.48476586538023114, max_leaf_nodes=4,
n_estimators=5, n_jobs=-1, random_state=12032022)
Fit CATE Estimators
Now that we have obtained our first-stage models, we can fit our CATE estimators via:
📝 Note: The selected model defaults to the one with the highest RScore. All fitted models remain accessible via the cate_estimators attribute, and if you want to change the default estimator, you can run caml_obj._validation_estimator = {different_model}.
🚀 Forthcoming: Additional scoring techniques & AutoML for CATE estimators are on our roadmap.
caml_obj.fit_validator(cate_estimators=[
                           "LinearDML",
                           "CausalForestDML",
                           "ForestDRLearner",
                           "LinearDRLearner",
                           "DomainAdaptationLearner",
                           "SLearner",
                           "TLearner",
                           "XLearner",
                       ],
                       validation_size=0.2,
                       test_size=0.2,
                       n_jobs=-1,
                       )
INFO Estimator RScores: {'LinearDML': 0.13605192393186216, 'CausalForestDML': cate.py:868 0.13401438074400696, 'ForestDRLearner': 0.13334836796315974, 'LinearDRLearner': 0.13348360299921525, 'DomainAdaptationLearner': 0.13902672426789076, 'SLearner': 0.13392781656169428, 'TLearner': 0.1286775854604263, 'XLearner': 0.1362410478945144}
caml_obj.validation_estimator
<econml.metalearners._metalearners.DomainAdaptationLearner at 0x7efd6ad9a470>
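As expected, the default validation estimator is simply the model with the highest RScore. We can verify that against the (rounded) scores from the log above:

```python
# RScores copied (rounded) from the fit_validator log above
rscores = {
    "LinearDML": 0.13605, "CausalForestDML": 0.13401,
    "ForestDRLearner": 0.13335, "LinearDRLearner": 0.13348,
    "DomainAdaptationLearner": 0.13903, "SLearner": 0.13393,
    "TLearner": 0.12868, "XLearner": 0.13624,
}

best = max(rscores, key=rscores.get)  # argmax over the score dict
print(best)  # DomainAdaptationLearner
```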
caml_obj.cate_estimators
[('LinearDML', <econml.dml.dml.LinearDML at 0x7efd6aff0280>),
('CausalForestDML',
<econml.dml.causal_forest.CausalForestDML at 0x7efd6aff6ad0>),
('ForestDRLearner', <econml.dr._drlearner.ForestDRLearner at 0x7efd6aff6e60>),
('LinearDRLearner', <econml.dr._drlearner.LinearDRLearner at 0x7efd6a9788b0>),
('DomainAdaptationLearner',
<econml.metalearners._metalearners.DomainAdaptationLearner at 0x7efd6ad9e740>),
('SLearner', <econml.metalearners._metalearners.SLearner at 0x7efd6ad9e3e0>),
('TLearner', <econml.metalearners._metalearners.TLearner at 0x7efd6ad9f790>),
('XLearner', <econml.metalearners._metalearners.XLearner at 0x7efd6ad9c6d0>)]
Validate model on test hold-out set
Here we can validate our model on the test hold-out set. Currently, this is only supported for continuous outcomes with binary treatments.
caml_obj.validate()
[03/31/25 18:57:17] INFO All validation results suggest that the model has found statistically cate.py:513 significant heterogeneity.
treatment blp_est blp_se blp_pval qini_est qini_se qini_pval autoc_est autoc_se autoc_pval cal_r_squared
0 1 0.621 0.074 0.0 0.551 0.064 0.0 1.474 0.153 0.0 0.773
Refit our selected model on the entire dataset
Now that we have selected our top performer and validated results on the test set, we can fit our final model on the entire dataset.
caml_obj.fit_final()
caml_obj.final_estimator
<econml.metalearners._metalearners.DomainAdaptationLearner at 0x7efdbcedaa70>
Validating Results with Ground Truth
First, we will obtain our predictions.
cate_predictions = caml_obj.predict()
Average Treatment Effect (ATE)
We’ll use the summarize() method after obtaining our predictions above, where the displayed mean represents our Average Treatment Effect (ATE).
caml_obj.summarize()
|       | cate_predictions_0_1 |
|-------|----------------------|
| count | 10000.000000 |
| mean  | -3.488162 |
| std   | 2.452214 |
| min   | -51.386351 |
| 25%   | -5.075448 |
| 50%   | -3.487215 |
| 75%   | -1.615648 |
| max   | 7.114962 |
Now comparing this to our ground truth, we see the model recovered the true ATE well:
data.ates
|   | Treatment | ATE |
|---|-----------|-----|
| 0 | T1_binary_on_Y1_continuous | -3.593735 |
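To quantify the gap, we can compute the absolute and relative error of the estimated ATE against the ground truth (values copied from the two tables above):

```python
# Estimated ATE (mean of cate_predictions) vs. true ATE (data.ates)
est_ate, true_ate = -3.488162, -3.593735

abs_err = abs(est_ate - true_ate)     # absolute error
rel_err = abs_err / abs(true_ate)     # relative error
print(abs_err, rel_err)               # ~0.106 absolute, ~2.9% relative
```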
Conditional Average Treatment Effect (CATE)
Now we want to see how the estimator performed in modeling the true CATEs.
First, we can compute the Precision in Estimating Heterogeneous Effects (PEHE), which here is simply the Mean Squared Error (MSE) between the true and estimated CATEs:
from sklearn.metrics import mean_squared_error
true_cates = data.cates.iloc[:, 0]
mean_squared_error(true_cates, cate_predictions)
0.827839455681178
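PEHE needs nothing beyond the two CATE vectors; computed by hand on small placeholder arrays (illustrative values, not from this dataset), and noting that some papers report its square root instead:

```python
# PEHE = mean squared difference between true and estimated CATEs
true_cates_demo = [1.0, -2.0, 0.5]   # placeholder vectors for illustration
est_cates_demo  = [0.8, -2.1, 0.9]

pehe = sum((t - e) ** 2
           for t, e in zip(true_cates_demo, est_cates_demo)) / len(true_cates_demo)
print(pehe)  # 0.07
```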
Not bad! Now let’s use some visualization techniques:
from caml.extensions.plots import cate_true_vs_estimated_plot
cate_true_vs_estimated_plot(true_cates=true_cates, estimated_cates=cate_predictions)
from caml.extensions.plots import cate_histogram_plot
cate_histogram_plot(true_cates=true_cates, estimated_cates=cate_predictions)
from caml.extensions.plots import cate_line_plot
cate_line_plot(true_cates=true_cates, estimated_cates=cate_predictions, window=20)
Overall, we can see the model performed remarkably well!
Obtaining Model Objects & Artifacts for Production Systems
In many production settings, we will want to store our model, information on the features used, and so on. We provide attributes to pull key information (more to be added as the class evolves).
Grabbing final model object:
caml_obj.final_estimator
<econml.metalearners._metalearners.DomainAdaptationLearner at 0x7efdbcedaa70>
Grabbing input features:
caml_obj.input_names
{'feature_names': ['X1_continuous',
'X2_continuous',
'W1_continuous',
'W2_continuous'],
'output_names': 'Y1_continuous',
'treatment_names': 'T1_binary'}
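For hand-off to a production system, the final estimator and its metadata can be serialized together; EconML estimators are picklable, so standard pickle-based tooling applies. A minimal sketch, using the input_names dictionary above in place of the model object itself:

```python
import pickle

# Bundle the artifacts you would ship alongside the model; in practice the
# dict would also hold caml_obj.final_estimator, which pickles the same way.
artifact = {
    "feature_names": ["X1_continuous", "X2_continuous",
                      "W1_continuous", "W2_continuous"],
    "output_names": "Y1_continuous",
    "treatment_names": "T1_binary",
}

blob = pickle.dumps(artifact)        # serialize for storage/transport
restored = pickle.loads(blob)        # deserialize at inference time
print(restored["treatment_names"])   # T1_binary
```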
Grabbing all fitted CATE estimators:
caml_obj.cate_estimators
[('LinearDML', <econml.dml.dml.LinearDML at 0x7efd6aff0280>),
('CausalForestDML',
<econml.dml.causal_forest.CausalForestDML at 0x7efd6aff6ad0>),
('ForestDRLearner', <econml.dr._drlearner.ForestDRLearner at 0x7efd6aff6e60>),
('LinearDRLearner', <econml.dr._drlearner.LinearDRLearner at 0x7efd6a9788b0>),
('DomainAdaptationLearner',
<econml.metalearners._metalearners.DomainAdaptationLearner at 0x7efd6ad9e740>),
('SLearner', <econml.metalearners._metalearners.SLearner at 0x7efd6ad9e3e0>),
('TLearner', <econml.metalearners._metalearners.TLearner at 0x7efd6ad9f790>),
('XLearner', <econml.metalearners._metalearners.XLearner at 0x7efd6ad9c6d0>)]