FastOLS

FastOLS(
    self,
    Y,
    T,
    G=None,
    X=None,
    W=None,
    *,
    xformula=None,
    discrete_treatment=False,
    engine='cpu',
)

FastOLS is an optimized implementation of the OLS estimator designed specifically with treatment effect estimation in mind.

FastOLS is experimental and may change significantly in future versions.

This class estimates a standard linear regression model for any number of continuous or binary outcomes and a single continuous or binary treatment, and provides estimates of the Average Treatment Effects (ATEs) and Group Average Treatment Effects (GATEs) out of the box. It also provides methods for estimating custom GATEs and Conditional Average Treatment Effects (CATEs) for individual observations, which can be used for out-of-sample prediction. Note that this method assumes that treatment effects and their heterogeneity are linear, which is typically sufficient when the primary interest is in ATEs and GATEs.
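To make the linearity assumption concrete, the following NumPy sketch fits an OLS model with a treatment-by-covariate interaction and recovers the ATE as the average of the per-observation effects. The simulated data, variable names, and coefficient values are illustrative assumptions; this is not the FastOLS implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data (illustrative assumption): one covariate x and a binary treatment t.
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
# The true effect of t on y is 2.0 + 0.5 * x, so the true ATE is 2.0 + 0.5 * mean(x).
y = 1.0 + 0.3 * x + (2.0 + 0.5 * x) * t + rng.normal(scale=0.1, size=n)

# Design matrix: intercept, x, t, and the t:x interaction (linear heterogeneity).
design = np.column_stack([np.ones(n), x, t, t * x])
params, *_ = np.linalg.lstsq(design, y, rcond=None)

# Per-observation effect implied by the model: beta_t + beta_tx * x.
unit_effects = params[2] + params[3] * x
ate = unit_effects.mean()
print(f"Estimated ATE: {ate:.3f}")
```

Because the effect is linear in x, averaging the per-observation effects over any subset of rows gives that subset's GATE.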

This class uses JAX for fast numerical computation and falls back to NumPy if JAX is not available. JAX can be installed using pip install caml[jax]. For GPU acceleration, install JAX with GPU support using pip install caml[jax-gpu].
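A common way to implement this kind of optional dependency is a guarded import. The sketch below is an assumption about the general pattern, not the library's actual code; the helper name get_array_module is hypothetical.

```python
import numpy as np


def get_array_module():
    """Return (module, name): jax.numpy if JAX is importable, otherwise NumPy."""
    try:
        import jax.numpy as jnp  # optional dependency

        return jnp, "jax"
    except ImportError:
        return np, "numpy"


xp, backend = get_array_module()

# The same array code runs on either backend.
a = xp.asarray([[2.0, 0.0], [0.0, 4.0]])
b = xp.asarray([2.0, 4.0])
solution = xp.linalg.solve(a, b)
print(backend, solution)
```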

For outcome/treatment support, see Support Matrix.

For model specification details, see Model Specifications.

For a more detailed working example, see FastOLS Example.

Parameters

Name	Type	Description	Default
Y	Collection[str]	A list of outcome variable names.	required
T	str	The treatment variable name.	required
G	Collection[str] \| None	A list of group variable names. These will be the groups for which GATEs will be estimated.	`None`
X	Collection[str] \| None	A list of covariate variable names. These will be the covariates for which heterogeneity/CATEs can be estimated.	`None`
W	Collection[str] \| None	A list of additional covariate variable names to be used as controls. These will be the additional covariates not used for modeling heterogeneity/CATEs.	`None`
xformula	str \| None	Additional formula string to append to the main formula, starting with “+”. For example, “+age+gender” will add age and gender as additional predictors.	`None`
discrete_treatment	bool	Whether the treatment is discrete.	`False`
engine	str	The engine to use for computation. Can be “cpu” or “gpu”. Note that “gpu” requires JAX with GPU support, which can be installed via `pip install caml[jax-gpu]`.	`'cpu'`

Attributes

Name	Type	Description
Y	Collection[str]	A list of outcome variable names.
T	str	The treatment variable name.
G	Collection[str] \| None	The list of group variable names. These will be the groups for which GATEs will be estimated.
X	Collection[str] \| None	The list of variable names representing the confounder/control features, in addition to G, used for estimating heterogeneity/CATEs.
W	Collection[str] \| None	The list of variable names representing the confounder/control features not used for estimating heterogeneity/CATEs.
formula	str	The formula leveraged for design matrix creation via Patsy.
params	np.ndarray	The estimated parameters of the model.
vcv	np.ndarray	The estimated variance-covariance matrix of the model parameters.
std_err	np.ndarray	The standard errors of the estimated parameters.
fitted_values	np.ndarray	The predicted values from the model.
residuals	np.ndarray	The residuals of the model.
treatment_effects	dict	The estimated treatment effects dictionary.

Examples

from caml import FastOLS
from caml.extensions.synthetic_data import SyntheticDataGenerator

data_generator = SyntheticDataGenerator(
    n_cont_outcomes=1,
    n_binary_outcomes=1,
    n_cont_modifiers=1,
    n_binary_modifiers=2,
    seed=10,
)
df = data_generator.df

fo_obj = FastOLS(
    Y=[c for c in df.columns if "Y" in c],
    T="T1_binary",
    G=[c for c in df.columns if "X" in c and ("bin" in c or "dis" in c)],
    X=[c for c in df.columns if "X" in c and "cont" in c],
    W=[c for c in df.columns if "W" in c],
    xformula=None,
    engine="cpu",
    discrete_treatment=True,
)

print(fo_obj)

================== FastOLS Object ==================
Engine: cpu
Outcome Variable: ['Y1_continuous', 'Y2_binary']
Treatment Variable: T1_binary
Discrete Treatment: True
Group Variables: ['X2_binary', 'X3_binary']
Features/Confounders for Heterogeneity (X): ['X1_continuous']
Features/Confounders as Controls (W): []
Formula: Q('Y1_continuous') + Q('Y2_binary') ~ C(Q('T1_binary')) + C(Q('X2_binary'))*C(Q('T1_binary')) + C(Q('X3_binary'))*C(Q('T1_binary')) + Q('X1_continuous')*C(Q('T1_binary'))
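The structure of the printed formula can be illustrated with a small string-building helper. This is a sketch under stated assumptions: build_formula is a hypothetical function, not part of the library, and the additive treatment of W and the non-categorical encoding of a continuous treatment are guesses, since the output above only shows the discrete-treatment case with an empty W.

```python
def build_formula(Y, T, G=None, X=None, W=None, xformula=None, discrete_treatment=False):
    """Assemble a Patsy-style formula string (hypothetical helper)."""

    def q(name):
        # Q('...') quotes column names that are not valid Python identifiers.
        return f"Q('{name}')"

    t_term = f"C({q(T)})" if discrete_treatment else q(T)
    lhs = " + ".join(q(y) for y in Y)

    terms = [t_term]
    # Group variables are categorical and interacted with the treatment.
    terms += [f"C({q(g)})*{t_term}" for g in (G or [])]
    # Heterogeneity covariates are interacted with the treatment.
    terms += [f"{q(x)}*{t_term}" for x in (X or [])]
    # Assumption: controls in W enter additively, without interactions.
    terms += [q(w) for w in (W or [])]

    formula = f"{lhs} ~ " + " + ".join(terms)
    # xformula is appended verbatim and is expected to start with "+".
    if xformula:
        formula += f" {xformula}"
    return formula


formula = build_formula(
    Y=["Y1_continuous", "Y2_binary"],
    T="T1_binary",
    G=["X2_binary", "X3_binary"],
    X=["X1_continuous"],
    discrete_treatment=True,
)
print(formula)
```

With these inputs the helper produces the same string as the Formula line printed above.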

Methods

Name	Description
fit	Fits the regression model on the provided data and, optionally, estimates Average Treatment Effect(s) (ATE) and Group Average Treatment Effect(s) (GATE).
estimate_ate	Estimate Average Treatment Effects (ATEs) of `T` on each `Y` from fitted model.
estimate_cate	Estimate Conditional Average Treatment Effects (CATEs) of `T` on each `Y` from fitted model for all given observations in the dataset.
predict	Generate predicted conditional average treatment effects (CATEs) or outcomes.
prettify_treatment_effects	Convert treatment effects dictionary to a pandas DataFrame.

fit

FastOLS.fit(df, *, n_jobs=-1, estimate_effects=True, cov_type='nonrobust')

Fits the regression model on the provided data and, optionally, estimates Average Treatment Effect(s) (ATE) and Group Average Treatment Effect(s) (GATE).

If estimate_effects is True, the method also estimates Average Treatment Effects (ATEs) and Group Average Treatment Effects (GATEs) for the groups specified in G. It uses the estimate_ate method under the hood, reusing the data across calls and parallelizing the computation of GATEs.
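The idea of fitting once and reusing the result for every group can be sketched with NumPy. The simulated data and the group variable name g are illustrative assumptions; the key format mirrors the "group-level" keys of the treatment_effects dictionary, but this is not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4_000

# Simulated data (illustrative assumption): binary group variable g, binary treatment t.
g = rng.integers(0, 2, size=n)
t = rng.integers(0, 2, size=n)
# The true effect is 1.0 in group 0 and 3.0 in group 1.
y = 0.5 + 0.2 * g + (1.0 + 2.0 * g) * t + rng.normal(scale=0.1, size=n)

# Fit once: intercept, g, t, and the t:g interaction.
design = np.column_stack([np.ones(n), g, t, t * g])
params, *_ = np.linalg.lstsq(design, y, rcond=None)

# Per-observation effects are computed once and reused for every group.
unit_effects = params[2] + params[3] * g

treatment_effects = {"overall": float(unit_effects.mean())}
for level in np.unique(g):
    mask = g == level
    treatment_effects[f"g-{level}"] = float(unit_effects[mask].mean())

print(treatment_effects)
```

Each group's average depends only on its own rows, which is why the per-group step lends itself to parallel execution.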

Parameters

Name	Type	Description	Default
df	PandasConvertibleDataFrame	Input dataframe to fit the model on. Supported formats: pandas DataFrame, PySpark DataFrame, Polars DataFrame, or any object with a `toPandas()` or `to_pandas()` method.	required
n_jobs	int	The number of jobs to use for parallel processing in the estimation of GATEs. Defaults to -1, which uses all available processors. If you encounter out-of-memory (OOM) errors, try setting n_jobs to a lower value.	`-1`
estimate_effects	bool	Whether to estimate Average Treatment Effects (ATEs) and Group Average Treatment Effects (GATEs).	`True`
cov_type	str	The covariance estimator to use for variance-covariance matrix and standard errors. Can be “nonrobust”, “HC0”, or “HC1”.	`'nonrobust'`
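For reference, the three covariance estimators have standard textbook definitions, sketched below with NumPy on simulated heteroskedastic data. The data and variable names are illustrative assumptions, and the code follows the textbook formulas rather than the library's internals.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 500, 3

# Simulated heteroskedastic data (illustrative assumption).
design = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
noise = rng.normal(size=n) * (1.0 + np.abs(design[:, 1]))
y = design @ np.array([1.0, 2.0, -1.0]) + noise

params, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ params
xtx_inv = np.linalg.inv(design.T @ design)

# "nonrobust": homoskedastic errors, sigma^2 * (X'X)^-1.
sigma2 = resid @ resid / (n - k)
vcv_nonrobust = sigma2 * xtx_inv

# "HC0": White's sandwich estimator, (X'X)^-1 X' diag(e^2) X (X'X)^-1.
meat = design.T @ (design * resid[:, None] ** 2)
vcv_hc0 = xtx_inv @ meat @ xtx_inv

# "HC1": HC0 with a degrees-of-freedom correction of n / (n - k).
vcv_hc1 = vcv_hc0 * n / (n - k)

for name, vcv in [("nonrobust", vcv_nonrobust), ("HC0", vcv_hc0), ("HC1", vcv_hc1)]:
    print(name, np.sqrt(np.diag(vcv)))
```

The robust options are worth considering when the residual variance changes with the covariates, as it does in this simulation.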

Examples

fo_obj.fit(df, n_jobs=4, estimate_effects=True, cov_type='nonrobust')

fo_obj.treatment_effects.keys()

dict_keys(['overall', 'X2_binary-0', 'X2_binary-1', 'X3_binary-1', 'X3_binary-0'])

estimate_ate

FastOLS.estimate_ate(
    df,
    *,
    return_results_dict=False,
    group='Custom Group',
    membership=None,
    _diff_matrix=None,
)

Estimate Average Treatment Effects (ATEs) of T on each Y from fitted model.

If the entire dataframe is provided, the function will estimate the ATE of the entire population, where the ATE, in the case of binary treatments, is formally defined as: \[ \tau = \mathbb{E}_n[\mathbf{Y}_1 - \mathbf{Y}_0] \]

If a subset of the dataframe is provided, the function will estimate the ATE of the subset (e.g., GATEs), where the GATE, in the case of binary treatments, is formally defined as: \[ \tau = \mathbb{E}_n[\mathbf{Y}_1 - \mathbf{Y}_0|\mathbf{G}=G] \]

For more details on treatment effect estimation, see Model Specifications.
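Under a linear model, the expectations above reduce to a linear contrast of the fitted parameters, which also yields a standard error. The sketch below assumes a single outcome, a nonrobust covariance matrix, and a normal approximation for the p-value; make_design and the simulated data are hypothetical, and the library may compute these quantities differently.

```python
import math

import numpy as np

rng = np.random.default_rng(3)
n = 2_000

# Simulated data (illustrative assumption): true effect is 2.0 + 0.5 * x.
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
y = 1.0 + 0.3 * x + (2.0 + 0.5 * x) * t + rng.normal(size=n)


def make_design(x, t):
    """Columns: intercept, x, t, and the t:x interaction."""
    return np.column_stack([np.ones_like(x), x, t, t * x])


design = make_design(x, t)
params, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ params
vcv = (resid @ resid / (n - design.shape[1])) * np.linalg.inv(design.T @ design)

# Difference matrix: each row is the design under T=1 minus the design under T=0.
diff = make_design(x, np.ones(n)) - make_design(x, np.zeros(n))

# Averaging the rows gives one contrast vector d, so that ATE = d @ params.
d = diff.mean(axis=0)
ate = d @ params
std_err = math.sqrt(d @ vcv @ d)
t_stat = ate / std_err
# Two-sided p-value from the normal approximation (an assumption of this sketch).
pval = math.erfc(abs(t_stat) / math.sqrt(2))

# A GATE uses the same contrast, averaged over a subset of rows only.
mask = x > 0
gate = diff[mask].mean(axis=0) @ params

print(f"ATE={ate:.3f} (se={std_err:.3f}), GATE[x>0]={gate:.3f}")
```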

Parameters

Name	Type	Description	Default
df	PandasConvertibleDataFrame	Dataframe containing the data to estimate the ATEs. Supported formats: pandas DataFrame, PySpark DataFrame, Polars DataFrame, or any object with a `toPandas()` or `to_pandas()` method.	required
return_results_dict	bool	If True, the function returns a dictionary containing ATEs/GATEs, standard errors, t-statistics, and p-values. If False, the function returns a numpy array containing ATEs/GATEs alone.	`False`
group	str	Name of the group to estimate the ATEs for.	`'Custom Group'`
membership	str \| None	Name of the membership within the group (for example, the level of a group variable) to estimate the ATEs for.	`None`
_diff_matrix	jnp.ndarray \| None	Private argument used in `fit` method.	`None`

Returns

Name	Type	Description
	jnp.ndarray \| dict	Estimated ATEs/GATEs or dictionary containing the estimated ATEs/GATEs and their standard errors, t-statistics, and p-values.

Examples

ate = fo_obj.estimate_ate(df, return_results_dict=True, group="Overall")

ate

{'Overall': {'outcome': ['Y1_continuous', 'Y2_binary'],
  'ate': array([[3.75944943, 0.19349086]]),
  'std_err': array([[0.02012718, 0.00971994]]),
  't_stat': array([[186.78472875,  19.90658227]]),
  'pval': array([[0., 0.]]),
  'n': 10000,
  'n_treated': 5060,
  'n_control': 4940}}

df_filtered = df.query(
    "X3_binary == 0 & X1_continuous < 5"
).copy()

custom_gate = fo_obj.estimate_ate(df_filtered)

custom_gate

array([[1.39607152, 0.08336419]])

estimate_cate

FastOLS.estimate_cate(df, *, return_results_dict=False)

Estimate Conditional Average Treatment Effects (CATEs) of T on each Y from fitted model for all given observations in the dataset.

The CATE, in the case of binary treatments, is formally defined as: \[ \tau = \mathbb{E}_n[\mathbf{Y}_1 - \mathbf{Y}_0|\mathbf{Q}=Q] \]

For more details on treatment effect estimation, see Model Specifications.
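For a linear model with treatment interactions, per-observation CATEs are the row-wise difference between the design under treatment and under control, multiplied by the parameter matrix. The NumPy sketch below uses two simulated outcomes to show the resulting (n, n_outcomes) shape; the data and names are illustrative assumptions, not the library's code.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000

x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
# Two outcomes stacked column-wise, mirroring a multi-outcome fit.
y1 = 1.0 + (2.0 + 0.5 * x) * t + rng.normal(scale=0.1, size=n)
y2 = 0.5 + (1.0 - 0.2 * x) * t + rng.normal(scale=0.1, size=n)
Y = np.column_stack([y1, y2])

# Columns: intercept, x, t, and the t:x interaction.
design = np.column_stack([np.ones(n), x, t, t * x])
params, *_ = np.linalg.lstsq(design, Y, rcond=None)  # shape (4, 2)

# Design under T=1 minus design under T=0: only the t and t:x columns survive.
diff = np.column_stack([np.zeros(n), np.zeros(n), np.ones(n), x])

# One CATE per observation and outcome: shape (n, 2).
cates = diff @ params
print(cates[:5])
```

The same product applies to rows that were not used for fitting, which is what makes out-of-sample CATE prediction possible.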

Parameters

Name	Type	Description	Default
df	PandasConvertibleDataFrame	Dataframe containing the data to estimate CATEs for. Supported formats: pandas DataFrame, PySpark DataFrame, Polars DataFrame, or any object with a `toPandas()` or `to_pandas()` method.	required
return_results_dict	bool	If True, the function returns a dictionary containing CATEs, standard errors, t-statistics, and p-values. If False, the function returns a numpy array containing CATEs alone.	`False`

Returns

Name	Type	Description
	jnp.ndarray \| dict	CATEs or dictionary containing CATEs, standard errors, t-statistics, and p-values.

Examples

cates = fo_obj.estimate_cate(df)
cates[:5]

array([[3.89639511, 0.17159235],
       [3.74635426, 0.23789582],
       [4.5283798 , 0.21765926],
       [3.87988946, 0.17201947],
       [3.66232416, 0.24007028]])

res = fo_obj.estimate_cate(df, return_results_dict=True)
res.keys()

dict_keys(['outcome', 'cate', 'std_err', 't_stat', 'pval'])

predict

FastOLS.predict(df, *, return_results_dict=False, mode='cate')

Generate predicted conditional average treatment effects (CATEs) or outcomes.

When mode is “outcome”, the function returns predicted outcomes.

When mode is “cate”, the function returns predicted CATEs, behaving as an alias for estimate_cate.
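The difference between the two modes can be sketched as a small dispatcher over the same parameter matrix. The function below is hypothetical and uses hand-picked parameter values rather than estimates; it only illustrates that outcomes use the full design row while CATEs use the treated-minus-control difference.

```python
import numpy as np


def predict(params, design, diff, mode="cate"):
    """Hypothetical dispatcher: fitted outcomes or CATEs from one parameter matrix."""
    if mode == "outcome":
        return design @ params
    if mode == "cate":
        return diff @ params
    raise ValueError(f"Unsupported mode: {mode!r}")


# Toy parameters for [intercept, x, t, t:x] (illustrative values, not estimates).
params = np.array([[1.0], [0.3], [2.0], [0.5]])
x = np.array([0.0, 1.0, 2.0])
t = np.array([1.0, 0.0, 1.0])

# Full design rows, and the difference between the T=1 and T=0 designs.
design = np.column_stack([np.ones(3), x, t, t * x])
diff = np.column_stack([np.zeros(3), np.zeros(3), np.ones(3), x])

outcomes = predict(params, design, diff, mode="outcome")
cates = predict(params, design, diff, mode="cate")
print(outcomes.ravel(), cates.ravel())
```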

Parameters

Name	Type	Description	Default
df	PandasConvertibleDataFrame	Dataframe containing the data to generate predictions for. Supported formats: pandas DataFrame, PySpark DataFrame, Polars DataFrame, or any object with a `toPandas()` or `to_pandas()` method.	required
return_results_dict	bool	If True, the function returns a dictionary containing CATEs, standard errors, t-statistics, and p-values. If False, the function returns a numpy array containing CATEs alone. Does not have any effect when mode is “outcome”.	`False`
mode	str	The mode of prediction. Supported modes are “cate” and “outcome”. If “cate”, the function returns CATEs. If “outcome”, the function returns predicted outcomes.	`'cate'`

Returns

Name	Type	Description
	jnp.ndarray \| dict	Predicted CATEs or outcomes, or a dictionary containing CATEs, standard errors, t-statistics, and p-values.

Examples

cates = fo_obj.predict(df)
cates[:5]

array([[3.89639511, 0.17159235],
       [3.74635426, 0.23789582],
       [4.5283798 , 0.21765926],
       [3.87988946, 0.17201947],
       [3.66232416, 0.24007028]])

res = fo_obj.predict(df, return_results_dict=True)
res.keys()

dict_keys(['outcome', 'cate', 'std_err', 't_stat', 'pval'])

prettify_treatment_effects

FastOLS.prettify_treatment_effects(effects=None)

Convert treatment effects dictionary to a pandas DataFrame.

If no argument is provided, the DataFrame is built from the internal results dictionary populated by fit, which is the typical use. To format custom treatment effects, pass the dictionary returned by the estimate_ate method with return_results_dict=True.

Parameters

Name	Type	Description	Default
effects	dict \| None	Dictionary of treatment effects. If None, the results are constructed from the internal results dictionary.	`None`

Returns

Name	Type	Description
	pd.DataFrame	DataFrame of treatment effects.

Examples

fo_obj.prettify_treatment_effects()

	group	membership	outcome	ate	std_err	t_stat	pval	n	n_treated	n_control
0	overall	None	Y1_continuous	3.759449	0.020127	186.784729	0.000000	10000	5060	4940
1	overall	None	Y2_binary	0.193491	0.009720	19.906582	0.000000	10000	5060	4940
2	X2_binary	0	Y1_continuous	3.584580	0.034935	102.606668	0.000000	3320	1691	1629
3	X2_binary	0	Y2_binary	0.156381	0.016871	9.269187	0.000000	3320	1691	1629
4	X2_binary	1	Y1_continuous	3.846360	0.024626	156.193627	0.000000	6680	3369	3311
5	X2_binary	1	Y2_binary	0.211934	0.011892	17.821080	0.000000	6680	3369	3311
6	X3_binary	1	Y1_continuous	4.081423	0.021454	190.238254	0.000000	8801	4451	4350
7	X3_binary	1	Y2_binary	0.208494	0.010361	20.123279	0.000000	8801	4451	4350
8	X3_binary	0	Y1_continuous	1.396072	0.058130	24.016441	0.000000	1199	609	590
9	X3_binary	0	Y2_binary	0.083364	0.028072	2.969612	0.002982	1199	609	590

# Using a custom GATE
custom_gate = fo_obj.estimate_ate(df_filtered, return_results_dict=True, group="My Custom Group")
fo_obj.prettify_treatment_effects(custom_gate)

	group	membership	outcome	ate	std_err	t_stat	pval	n	n_treated	n_control
0	My Custom Group	None	Y1_continuous	1.396072	0.058130	24.016441	0.000000	1199	609	590
1	My Custom Group	None	Y2_binary	0.083364	0.028072	2.969612	0.002982	1199	609	590
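The reshaping from a nested results dictionary to one row per group and outcome can be sketched in plain Python. The helper flatten_effects is hypothetical, and splitting keys on the last hyphen to recover the membership is an assumption based on the key names shown for treatment_effects; the input values are copied from the estimate_ate example.

```python
def flatten_effects(effects):
    """Turn a nested effects dictionary into one row per (group, outcome)."""
    rows = []
    for key, res in effects.items():
        # Assumption: keys such as "X2_binary-0" encode "<group>-<membership>".
        group, sep, membership = key.rpartition("-")
        if not sep:
            group, membership = key, None
        for j, outcome in enumerate(res["outcome"]):
            row = {"group": group, "membership": membership, "outcome": outcome}
            # Array-valued statistics have shape (1, n_outcomes).
            for stat in ("ate", "std_err", "t_stat", "pval"):
                row[stat] = res[stat][0][j]
            # Sample sizes are shared across outcomes.
            for count in ("n", "n_treated", "n_control"):
                row[count] = res[count]
            rows.append(row)
    return rows


effects = {
    "Overall": {
        "outcome": ["Y1_continuous", "Y2_binary"],
        "ate": [[3.75944943, 0.19349086]],
        "std_err": [[0.02012718, 0.00971994]],
        "t_stat": [[186.78472875, 19.90658227]],
        "pval": [[0.0, 0.0]],
        "n": 10000,
        "n_treated": 5060,
        "n_control": 4940,
    }
}

rows = flatten_effects(effects)
for row in rows:
    print(row)
```

A list of row dictionaries like this can be passed directly to pandas.DataFrame to obtain a table.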