FastOLS

In this notebook, we’ll walk through an example of generating synthetic data, estimating treatment effects (ATEs, GATEs, and CATEs) using FastOLS, and comparing to our ground truth.

FastOLS is particularly useful when efficiently estimating ATEs and GATEs is of primary interest and the treatment is exogenous or confounding takes on a particularly simple functional form.

FastOLS assumes linear treatment effects & heterogeneity. This is generally sufficient for estimation of ATEs and GATEs, but can perform poorly in CATE estimation & prediction when heterogeneity is complex & nonlinear. For high quality CATE estimation, we recommend leveraging CamlCATE.

Generate Synthetic Data

Here we’ll leverage the SyntheticDataGenerator class to generate a linear synthetic data generating process, with an exogenous binary treatment, a continuous & a binary outcome, and binary & continuous mediating covariates.

from caml.logging import configure_logging
import logging

configure_logging(level=logging.DEBUG)

[06/10/25 15:22:53] DEBUG    Logging configured with level: DEBUG                                     logging.py:70

from caml.extensions.synthetic_data import SyntheticDataGenerator

data_generator = SyntheticDataGenerator(
    n_obs=10_000,
    n_cont_outcomes=1,
    n_binary_outcomes=1,
    n_binary_treatments=1,
    n_cont_confounders=2,
    n_cont_modifiers=3,
    n_binary_modifiers=2,
    stddev_outcome_noise=1,
    stddev_treatment_noise=1,
    causal_model_functional_form="linear",
    seed=10,
)

[06/10/25 15:22:54] WARNING  SyntheticDataGenerator is experimental and may change in future         generics.py:44
                             versions.

We can print our simulated data via:

data_generator.df

	W1_continuous	W2_continuous	X1_continuous	X2_continuous	X3_continuous	X4_binary	X5_binary	T1_binary	Y1_continuous	Y2_binary
0	0.354380	-3.252276	2.715662	-3.578800	-3.148647	0	0	1	-11.319410	0
1	0.568499	2.484069	-6.402235	-2.611815	-0.311498	0	0	1	9.167825	1
2	0.162715	8.842902	1.288770	-3.788545	-1.780285	0	0	0	-21.010316	0
3	0.362944	-0.959538	1.080988	-3.542550	-5.419263	1	0	1	-2.194239	0
4	0.612101	1.417536	4.143630	-4.112453	-4.551051	0	0	1	-20.600668	1
...	...	...	...	...	...	...	...	...	...	...
9995	0.340436	0.241095	-6.524222	-3.188783	-2.313515	0	0	1	16.349832	1
9996	0.019523	1.338152	-2.555492	-3.643733	-5.517065	0	1	1	5.386195	1
9997	0.325401	1.258659	-3.340546	-4.255203	-2.272974	0	0	1	-1.037638	0
9998	0.586715	1.263264	-2.826709	-4.149383	-2.712932	0	0	1	-0.235958	1
9999	0.003002	6.723381	1.260782	-3.660600	-0.962271	0	0	0	-20.007912	1

10000 rows × 10 columns

To inspect our true data generating process, we can call data_generator.dgp. Furthermore, we will have our true CATEs and ATEs at our disposal via data_generator.cates & data_generator.ates, respectively. We’ll use this as our source of truth for performance evaluation of our CATE estimator.

for t, df in data_generator.dgp.items():
    print(f"\nDGP for {t}:")
    print(df)


DGP for T1_binary:
{'formula': '1 + W1_continuous + W2_continuous', 'params': array([0.00323235, 0.14809798, 0.00393589]), 'noise': array([ 0.41708078,  2.72097405, -0.05882655, ...,  0.53984804,
        1.50464572, -0.08231107]), 'raw_scores': array([0.6130131 , 0.9436502 , 0.50082704, ..., 0.6447923 , 0.83198225,
       0.48696005]), 'function': <function SyntheticDataGenerator._create_dgp_function.<locals>.f_binary at 0x7f88407cf2e0>}

DGP for Y1_continuous:
{'formula': '1 + W1_continuous + W2_continuous + X1_continuous + X2_continuous + X3_continuous + X4_binary + X5_binary + T1_binary + T1_binary*X1_continuous + T1_binary*X2_continuous + T1_binary*X3_continuous + T1_binary*X4_binary + T1_binary*X5_binary', 'params': array([-0.35986539, -3.30656481, -1.08522702, -2.03156347,  4.15748518,
       -3.7649075 ,  2.0412171 , -4.06259065,  1.19724956, -1.49108621,
       -0.33216768,  1.0519125 , -0.68252501,  0.57505328]), 'noise': array([ 0.19963494, -0.500903  ,  1.15056187, ..., -1.08977141,
        0.79262583,  1.81566293]), 'raw_scores': array([-11.31940995,   9.16782505, -21.01031565, ...,  -1.03763754,
        -0.23595847, -20.00791171]), 'function': <function SyntheticDataGenerator._create_dgp_function.<locals>.f_cont at 0x7f88407cf7f0>}

DGP for Y2_binary:
{'formula': '1 + W1_continuous + W2_continuous + X1_continuous + X2_continuous + X3_continuous + X4_binary + X5_binary + T1_binary + T1_binary*X1_continuous + T1_binary*X2_continuous + T1_binary*X3_continuous + T1_binary*X4_binary + T1_binary*X5_binary', 'params': array([-0.69301323,  0.54152509,  0.39837891,  0.15335688,  0.39980853,
       -0.31388112, -0.18490127,  0.25395436,  0.19802621, -0.84703988,
       -0.12066193,  0.09225998,  0.37205785, -0.20954318]), 'noise': array([ 1.07352996,  0.43383419, -1.01413765, ..., -1.15478882,
        0.58572912,  0.22523345]), 'raw_scores': array([0.06237408, 0.99      , 0.75869455, ..., 0.65949716, 0.91015409,
       0.77628529]), 'function': <function SyntheticDataGenerator._create_dgp_function.<locals>.f_binary at 0x7f88407e5d80>}

data_generator.cates

	CATE_of_T1_binary_on_Y1_continuous	CATE_of_T1_binary_on_Y2_binary
0	-4.975376	-0.116864
1	11.283425	0.779704
2	-1.338690	-0.070368
3	-5.620992	-0.088495
4	-8.402543	-0.582547
...	...	...
9995	9.551022	0.857542
9996	0.989622	0.391185
9997	5.200762	0.679941
9998	3.936641	0.602064
9999	-0.478977	-0.111850

10000 rows × 2 columns

data_generator.ates

	Treatment	ATE
0	T1_binary_on_Y1_continuous	1.195965
1	T1_binary_on_Y2_binary	0.152762

Running FastOLS

Class Instantiation

We can instantiate and observe our FastOLS object via:

from caml import FastOLS

fo_obj = FastOLS(
    Y=[c for c in data_generator.df.columns if "Y" in c],
    T="T1_binary",
    G=[
        c
        for c in data_generator.df.columns
        if "X" in c and ("bin" in c or "dis" in c)
    ],
    X=[c for c in data_generator.df.columns if "X" in c and "cont" in c],
    W=[c for c in data_generator.df.columns if "W" in c],
    xformula="+ W1_continuous**2",
    engine="cpu",
    discrete_treatment=True,
)

                    WARNING  FastOLS is experimental and may change in future versions.              generics.py:44

                    DEBUG    Initializing FastOLS with parameters: Y=['Y1_continuous', 'Y2_binary'],     ols.py:146
                             T=T1_binary, G=['X4_binary', 'X5_binary'], X=['X1_continuous',                        
                             'X2_continuous', 'X3_continuous'], W=['W1_continuous', 'W2_continuous'],              
                             discrete_treatment=True, engine=cpu

                    DEBUG    Created formula: Q('Y1_continuous') + Q('Y2_binary') ~ C(Q('T1_binary')) +  ols.py:177
                             C(Q('X4_binary'))*C(Q('T1_binary')) + C(Q('X5_binary'))*C(Q('T1_binary')) +           
                             Q('X1_continuous')*C(Q('T1_binary')) + Q('X2_continuous')*C(Q('T1_binary'))           
                             + Q('X3_continuous')*C(Q('T1_binary')) + Q('W1_continuous') +                         
                             Q('W2_continuous') + W1_continuous**2

print(fo_obj)

================== FastOLS Object ==================
Engine: cpu
Outcome Variable: ['Y1_continuous', 'Y2_binary']
Treatment Variable: T1_binary
Discrete Treatment: True
Group Variables: ['X4_binary', 'X5_binary']
Features/Confounders for Heterogeneity (X): ['X1_continuous', 'X2_continuous', 'X3_continuous']
Features/Confounders as Controls (W): ['W1_continuous', 'W2_continuous']
Formula: Q('Y1_continuous') + Q('Y2_binary') ~ C(Q('T1_binary')) + C(Q('X4_binary'))*C(Q('T1_binary')) + C(Q('X5_binary'))*C(Q('T1_binary')) + Q('X1_continuous')*C(Q('T1_binary')) + Q('X2_continuous')*C(Q('T1_binary')) + Q('X3_continuous')*C(Q('T1_binary')) + Q('W1_continuous') + Q('W2_continuous') + W1_continuous**2

Fitting OLS model

We can now leverage the fit method to estimate the model outlined by fo_obj.formula. To capitalize on efficiency gains and parallelization in the estimation of GATEs, we will pass estimate_effects=True. The n_jobs argument will control the number of parallel jobs (GATE estimations) executed at a time. We will set n_jobs=-1 to use all available cores for parallelization.

Warning

When dealing with large datasets, setting n_jobs to a more conservative value can help prevent OOM errors.

For heteroskedasticity-robust variance estimation, we will also pass robust_vcv=True.

fo_obj.fit(
    data_generator.df, n_jobs=-1, estimate_effects=True, cov_type="HC1"
)

                    DEBUG    Creating model matrix...                                                    ols.py:546

                    DEBUG    Design Matrix Creation completed in 0.06 seconds                        generics.py:73

                    INFO     Fitting regression model...                                                 ols.py:508

                    DEBUG    Model Fitting completed in 0.02 seconds                                 generics.py:73

                    DEBUG    Creating treatment difference matrix...                                     ols.py:560

                    DEBUG    Design Matrix Creation completed in 0.12 seconds                        generics.py:73

                    INFO     Estimating Average Treatment Effects (ATEs)...                              ols.py:296

                    DEBUG    ATE Estimation completed in 0.00 seconds                                generics.py:73

                    INFO     Estimating Group Average Treatment Effects (GATEs)...                       ols.py:652

                    DEBUG    Starting parallel processing with -1 jobs                                   ols.py:680

                    DEBUG    Prespecified GATE Estimation completed in 0.02 seconds                  generics.py:73

We can now inspect the model fitted results and estimated treatment effects:

fo_obj.params
# fo_obj.vcv
# fo_obj.std_err
# fo_obj.fitted_values
# fo_obj.residuals

array([[-0.4519221 ,  0.43114593],
       [ 1.38073585, -0.0744549 ],
       [ 2.00450453, -0.01259566],
       [-4.04210602,  0.02437893],
       [-0.65283268,  0.04810221],
       [ 0.49183752, -0.01360602],
       [-2.02552124,  0.02571064],
       [-1.49027655, -0.10135024],
       [ 4.14289136,  0.06712511],
       [-0.28916188, -0.05434839],
       [-3.76790351, -0.05556488],
       [ 1.05912299,  0.03298887],
       [-1.63464173,  0.03581225],
       [-1.07407314,  0.04467536],
       [-1.63464173,  0.03581225]])

fo_obj.treatment_effects.keys()

dict_keys(['overall', 'X4_binary-0', 'X4_binary-1', 'X5_binary-0', 'X5_binary-1'])

fo_obj.treatment_effects["overall"]

{'outcome': ['Y1_continuous', 'Y2_binary'],
 'ate': array([[1.19490455, 0.13807782]]),
 'std_err': array([[0.01986595, 0.00783239]]),
 't_stat': array([[60.14836561, 17.62907058]]),
 'pval': array([[0., 0.]]),
 'n': 10000,
 'n_treated': 5172,
 'n_control': 4828}

Here we have direct access to the model parameters (fo_obj.params), variance-covariance matrices (fo_obj.vcv]), standard_errors (fo_obj.std_err), and estimated treatment effects (fo_obj.treatment_effects).

To make the treatment effect results more readable, we can leverage the prettify_treatment_effects method:

fo_obj.prettify_treatment_effects()

	group	membership	outcome	ate	std_err	t_stat	pval	n	n_treated	n_control
0	overall	None	Y1_continuous	1.194905	0.019866	60.148366	0.000000e+00	10000	5172	4828
1	overall	None	Y2_binary	0.138078	0.007832	17.629071	0.000000e+00	10000	5172	4828
2	X4_binary	0	Y1_continuous	1.336099	0.021504	62.133499	0.000000e+00	8536	4374	4162
3	X4_binary	0	Y2_binary	0.134263	0.008443	15.902456	0.000000e+00	8536	4374	4162
4	X4_binary	1	Y1_continuous	0.371658	0.051932	7.156565	8.273382e-13	1464	798	666
5	X4_binary	1	Y2_binary	0.160322	0.021007	7.631865	2.309264e-14	1464	798	666
6	X5_binary	0	Y1_continuous	1.149854	0.021556	53.342794	0.000000e+00	8564	4430	4134
7	X5_binary	0	Y2_binary	0.142165	0.008438	16.848194	0.000000e+00	8564	4430	4134
8	X5_binary	1	Y1_continuous	1.463574	0.051157	28.609334	0.000000e+00	1436	742	694
9	X5_binary	1	Y2_binary	0.113703	0.020950	5.427234	5.723416e-08	1436	742	694

Comparing our overall treatment effect (ATE) to the ground truth, we have:

data_generator.ates

	Treatment	ATE
0	T1_binary_on_Y1_continuous	1.195965
1	T1_binary_on_Y2_binary	0.152762

We can also see what our GATEs are using data_generator.cates. Let’s choose X4_binary in 1 group:

data_generator.cates.iloc[
    data_generator.df.query("X4_binary == 1").index
].mean()

CATE_of_T1_binary_on_Y1_continuous    0.347413
CATE_of_T1_binary_on_Y2_binary        0.177773
dtype: float64

Custom Group Average Treatment Effects (GATEs)

Let’s now look at how we can estimate any arbitary GATE using estimate_ate method and prettify the results with prettify_treatment_effects.

custom_gate_df = data_generator.df.query(
    "X4_binary == 1 & X2_continuous < -3"
).copy()

custom_gate = fo_obj.estimate_ate(
    custom_gate_df,
    group="My Custom Group",
    membership="My Custom Membership",
    return_results_dict=True,
)
fo_obj.prettify_treatment_effects(effects=custom_gate)

                    INFO     Estimating Average Treatment Effects (ATEs)...                              ols.py:296

                    DEBUG    Creating treatment difference matrix...                                     ols.py:560

                    DEBUG    Design Matrix Creation completed in 0.03 seconds                        generics.py:73

                    DEBUG    ATE Estimation completed in 0.03 seconds                                generics.py:73

	group	membership	outcome	ate	std_err	t_stat	pval	n	n_treated	n_control
0	My Custom Group	My Custom Membership	Y1_continuous	0.541877	0.052269	10.367002	0.0	1197	661	536
1	My Custom Group	My Custom Membership	Y2_binary	0.184703	0.021119	8.746004	0.0	1197	661	536

Let’s compare this to the ground truth as well:

data_generator.cates.iloc[custom_gate_df.index].mean()

CATE_of_T1_binary_on_Y1_continuous    0.530934
CATE_of_T1_binary_on_Y2_binary        0.198813
dtype: float64

Conditional Average Treatment Effects (CATEs)

Let’s now look at how we can estimate CATEs / approximate individual-level treatment effects via estimate_cate method

Note

The predict method is a simple alias for estimate_cate. Either can be used, but namespacing was created to higlight that estimate_cate / predict can be used for out of sample treatment effect prediction.

cates = fo_obj.estimate_cate(data_generator.df)

cates

                    INFO     Estimating Conditional Average Treatment Effects (CATEs)...                 ols.py:365

                    DEBUG    Creating treatment difference matrix...                                     ols.py:560

                    DEBUG    Design Matrix Creation completed in 0.09 seconds                        generics.py:73

                    DEBUG    CATE Estimation completed in 0.09 seconds                               generics.py:73

array([[-4.96630384, -0.25905625],
       [11.3471586 ,  0.70608504],
       [-1.32992654, -0.05790035],
       ...,
       [ 5.18215613,  0.42039076],
       [ 3.91982932,  0.34804847],
       [-0.45883511, -0.03503195]])

If we wanted additional information on CATEs (such as standard errors), we can call:

fo_obj.estimate_cate(data_generator.df, return_results_dict=True)

                    INFO     Estimating Conditional Average Treatment Effects (CATEs)...                 ols.py:365

                    DEBUG    Creating treatment difference matrix...                                     ols.py:560

                    DEBUG    Design Matrix Creation completed in 0.12 seconds                        generics.py:73

                    DEBUG    CATE Estimation completed in 0.14 seconds                               generics.py:73

{'outcome': ['Y1_continuous', 'Y2_binary'],
 'cate': array([[-4.96630384, -0.25905625],
        [11.3471586 ,  0.70608504],
        [-1.32992654, -0.05790035],
        ...,
        [ 5.18215613,  0.42039076],
        [ 3.91982932,  0.34804847],
        [-0.45883511, -0.03503195]]),
 'std_err': array([[0.02874559, 0.01134792],
        [0.04431468, 0.01669346],
        [0.02526043, 0.00995281],
        ...,
        [0.02767557, 0.01025595],
        [0.02636869, 0.00997712],
        [0.02779769, 0.01090012]]),
 't_stat': array([[-172.76749964,  -22.82852289],
        [ 256.05869601,   42.29711604],
        [ -52.6486181 ,   -5.8174884 ],
        ...,
        [ 187.2465665 ,   40.98993518],
        [ 148.6546921 ,   34.88465475],
        [ -16.50623202,   -3.21390529]]),
 'pval': array([[0.00000000e+00, 0.00000000e+00],
        [0.00000000e+00, 0.00000000e+00],
        [0.00000000e+00, 5.97384053e-09],
        ...,
        [0.00000000e+00, 0.00000000e+00],
        [0.00000000e+00, 0.00000000e+00],
        [0.00000000e+00, 1.30942855e-03]])}

Now, let’s make our cate predictions:

cate_predictions = fo_obj.predict(data_generator.df)

## We can also make predictions of the outcomes, if desired.
# fo_obj.predict(data_generator.df, mode="outcome")

                    INFO     Estimating Conditional Average Treatment Effects (CATEs)...                 ols.py:365

                    DEBUG    Creating treatment difference matrix...                                     ols.py:560

[06/10/25 15:22:55] DEBUG    Design Matrix Creation completed in 0.12 seconds                        generics.py:73

                    DEBUG    CATE Estimation completed in 0.12 seconds                               generics.py:73

Let’s now look at the Precision in Estimating Heterogeneous Effects (PEHE) (e.g., RMSE) and plot some results for the treatment effects on each outcome:

Effect of binary T1 on continuous Y1

from sklearn.metrics import root_mean_squared_error
from caml.extensions.plots import (
    cate_true_vs_estimated_plot,
    cate_histogram_plot,
    cate_line_plot,
)

true_cates1 = data_generator.cates.iloc[:, 0]
predicted_cates1 = cate_predictions[:, 0]
root_mean_squared_error(true_cates1, predicted_cates1)

0.054197542640946

cate_true_vs_estimated_plot(
    true_cates=true_cates1, estimated_cates=predicted_cates1
)

cate_histogram_plot(true_cates=true_cates1, estimated_cates=predicted_cates1)

cate_line_plot(
    true_cates=true_cates1, estimated_cates=predicted_cates1, window=20
)

Effect of binary T1 on binary Y2

true_cates2 = data_generator.cates.iloc[:, 1]
predicted_cates2 = cate_predictions[:, 1]
root_mean_squared_error(true_cates2, predicted_cates2)

0.15683038625127854

cate_true_vs_estimated_plot(
    true_cates=true_cates2, estimated_cates=predicted_cates2
)

cate_histogram_plot(true_cates=true_cates2, estimated_cates=predicted_cates2)

cate_line_plot(
    true_cates=true_cates2, estimated_cates=predicted_cates2, window=20
)

Note

The CATE estimates for binary outcome using simulated data may perform poorly b/c of non-linear transformation (sigmoid) of linear logodds. In general, FastOLS should be prioritized when ATEs and GATEs are of primary interest. For high quality CATE estimation, we recommend leveraging CamlCATE.