Nonparametric Instrumental Variables

This module implements Debiased Machine Learning for Nonparametric Instrumental Variables (DML-npiv). It provides tools for estimating causal effects using a combination of machine learning models and instrumental variables techniques. The module supports cross-validation, kernel density estimation for localization, and confidence interval computation with pointwise or uniform guarantees.

Classes:

DML_npiv: Main class for performing DML-npiv with various configuration options.

DML_npiv Methods:

__init__: Initialize the DML_npiv instance with data and model configurations.

_calculate_confidence_interval: Calculate confidence intervals for the estimates.

_localization: Perform localization using kernel density estimation.

_npivfit_outcome: Fit the outcome model using nonparametric instrumental variables.

_propensity_score: Estimate the propensity score.

_npivfit_action: Fit the action model using nonparametric instrumental variables.

_process_fold: Process a single fold for cross-validation.

_split_and_estimate: Split the data and estimate the model using cross-validation.

dml: Perform Debiased Machine Learning for Nonparametric Instrumental Variables.

class dml_npiv.DML_npiv(Y, D, Z, W, X1=None, V=None, v_values=None, include_V=True, ci_type='pointwise', loc_kernel='gau', bw_loc='silverman', estimator='MR', model1=<nnpiv.rkhs.rkhsiv.ApproxRKHSIVCV object>, nn_1=False, modelq1=<nnpiv.rkhs.rkhsiv.ApproxRKHSIVCV object>, nn_q1=False, alpha=0.05, n_folds=5, n_rep=1, random_seed=123, prop_score=sklearn.linear_model.LogisticRegression, CHIM=False, verbose=True, fitargs1=None, fitargsq1=None, opts=None)

Bases: object

Debiased Machine Learning for Nonparametric Instrumental Variables (DML-npiv) class.

Parameters

Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
Z (array-like) – Instrumental variable.
W (array-like) – Negative control outcome.
X1 (array-like, optional) – Additional covariates.
V (array-like, optional) – Localization covariates.
v_values (array-like, optional) – Values for localization.
include_V (bool, optional) – Include localization covariates in the model.
ci_type (str, optional) – Type of confidence interval (‘pointwise’, ‘uniform’).
loc_kernel (str, optional) – Kernel for localization. Options include ‘gau’, ‘epa’, ‘uni’, ‘tri’, etc.
bw_loc (str, optional) – Bandwidth, or bandwidth selection rule such as the default ‘silverman’, for localization.
estimator (str, optional) – Estimator type (‘MR’, ‘OR’, ‘IPW’).
model1 (estimator, optional) – Model for the first stage.
nn_1 (bool, optional) – Whether to use a neural network for the first stage.
modelq1 (estimator, optional) – Model for the second stage.
nn_q1 (bool, optional) – Whether to use a neural network for the second stage.
alpha (float, optional) – Significance level for confidence intervals.
n_folds (int, optional) – Number of folds for estimation.
n_rep (int, optional) – Number of repetitions for estimation.
random_seed (int, optional) – Seed for random number generator.
prop_score (estimator, optional) – Model for propensity score.
CHIM (bool, optional) – Whether to drop observations with extreme propensity scores, following the trimming rule of Crump, Hotz, Imbens, and Mitnik (CHIM, 2009).
verbose (bool, optional) – Print progress information.
fitargs1 (dict, optional) – Arguments for fitting the first stage model.
fitargsq1 (dict, optional) – Arguments for fitting the second stage model.
opts (dict, optional) – Additional options.

_calculate_confidence_interval(theta, theta_var, theta_cov)

Calculate the confidence interval for the given estimates.

Parameters

theta (array-like) – Estimated values.
theta_var (array-like) – Variance of the estimates.
theta_cov (array-like) – Covariance matrix of the estimates.

Returns

Lower and upper bounds of the confidence intervals.

Return type

array-like
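A minimal sketch of how pointwise and uniform intervals can be built from `theta`, `theta_var`, and `theta_cov`. It assumes that `theta_var` already holds the variance of each estimate and that uniform bands use a Gaussian sup-t critical value simulated from the correlation matrix implied by `theta_cov`. The function name `confidence_interval` and its `n_draws`/`seed` arguments are hypothetical; this is not the package's implementation.

```python
import numpy as np
from statistics import NormalDist

def confidence_interval(theta, theta_var, theta_cov, alpha=0.05,
                        ci_type="pointwise", n_draws=10_000, seed=0):
    """Hypothetical sketch: return a (K, 2) array of lower/upper bounds."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    se = np.sqrt(np.atleast_1d(np.asarray(theta_var, dtype=float)))
    if ci_type == "pointwise":
        # Two-sided normal critical value (about 1.96 for alpha = 0.05)
        crit = NormalDist().inv_cdf(1 - alpha / 2)
    elif ci_type == "uniform":
        # Sup-t critical value: (1 - alpha) quantile of max_k |N_k|,
        # with N drawn from N(0, correlation matrix of the estimates)
        cov = np.atleast_2d(np.asarray(theta_cov, dtype=float))
        sd = np.sqrt(np.diag(cov))
        corr = cov / np.outer(sd, sd)
        rng = np.random.default_rng(seed)
        draws = rng.multivariate_normal(np.zeros(len(theta)), corr, size=n_draws)
        crit = np.quantile(np.abs(draws).max(axis=1), 1 - alpha)
    else:
        raise ValueError("ci_type must be 'pointwise' or 'uniform'")
    return np.column_stack([theta - crit * se, theta + crit * se])

# Two independent estimates (illustrative numbers only)
theta = np.array([1.0, 2.0])
theta_var = np.array([0.01, 0.04])
theta_cov = np.diag(theta_var)
ci_pw = confidence_interval(theta, theta_var, theta_cov)
ci_unif = confidence_interval(theta, theta_var, theta_cov, ci_type="uniform")
```

The uniform band is wider than the pointwise one because it must cover all components simultaneously.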

_localization(V, v_val, bw)

Perform localization using kernel density estimation.

Parameters

V (array-like) – Localization covariates.
v_val (array-like) – Values for localization.
bw (float) – Bandwidth for localization.

Returns

Weights for localization.

Return type

array-like
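A localized parameter at a point v can be viewed as a kernel-weighted average of per-observation moment values. The following minimal sketch assumes a Gaussian kernel (the ‘gau’ option) and Silverman's rule of thumb for the ‘silverman’ bandwidth. The helper names, the mean-one normalization of the weights, and the placeholder moment values `psi` are illustrative assumptions, not the package's code.

```python
import numpy as np

def silverman_bandwidth(V):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.349) * n^(-1/5)."""
    V = np.asarray(V, dtype=float)
    q75, q25 = np.percentile(V, [75, 25])
    spread = min(V.std(ddof=1), (q75 - q25) / 1.349)
    return 0.9 * spread * len(V) ** (-0.2)

def localization_weights(V, v_val, bw):
    """Gaussian kernel weights K_h(V - v_val), rescaled to have mean one."""
    u = (np.asarray(V, dtype=float) - v_val) / bw
    k = np.exp(-0.5 * u ** 2) / (np.sqrt(2 * np.pi) * bw)
    return k / k.mean()

rng = np.random.default_rng(0)
V = rng.normal(size=20_000)
psi = 3.0 + V                       # placeholder per-observation moment values
bw = silverman_bandwidth(V)
w = localization_weights(V, 0.0, bw)
theta_at_0 = np.mean(w * psi)       # localized estimate at v = 0
```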

_npivfit_action(ps_hat_1, W, X, Z, alfa=0.0)

Fit the action model using nonparametric instrumental variables.

Parameters

ps_hat_1 (array-like) – Estimated propensity scores.
W (array-like) – Negative control outcome.
X (array-like) – Covariates.
Z (array-like) – Instrumental variable.
alfa (float, optional) – Threshold alpha for propensity scores.

Returns

Fitted models for treated and control groups.

Return type

tuple

_npivfit_outcome(Y, D, X, Z)

Fit the outcome model using nonparametric instrumental variables.

Parameters

Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
X (array-like) – Covariates.
Z (array-like) – Instrumental variable.

Returns

Fitted models for treatment and control groups.

Return type

tuple
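The default first- and second-stage models are RKHS-based estimators (ApproxRKHSIVCV). To make the underlying NPIV problem E[Y - h(X) | Z] = 0 concrete, this minimal sketch swaps in a polynomial sieve two-stage least squares estimator on simulated data. The function `sieve_npiv`, the power-series bases, and the data-generating process are illustrative assumptions, not the package's API.

```python
import numpy as np

def sieve_npiv(y, x, z, deg_x=1, deg_z=2):
    """Sieve 2SLS estimate of h in E[y - h(x) | z] = 0 (power-series bases)."""
    Phi = np.vander(x, deg_x + 1, increasing=True)  # basis for h: 1, x, ..., x^deg_x
    Psi = np.vander(z, deg_z + 1, increasing=True)  # instrument basis: 1, z, ..., z^deg_z
    # First stage: project each column of Phi onto the span of Psi
    Phi_hat = Psi @ np.linalg.lstsq(Psi, Phi, rcond=None)[0]
    # Second stage: regress y on the projected basis (this is 2SLS)
    return np.linalg.lstsq(Phi_hat, y, rcond=None)[0]  # h(x) = Phi(x) @ coef

rng = np.random.default_rng(1)
n = 5_000
z = rng.normal(size=n)
u = rng.normal(size=n)           # unobserved confounder
x = z + u                        # endogenous regressor
y = 1.0 + 2.0 * x + u            # structural function h(x) = 1 + 2x
coef_iv = sieve_npiv(y, x, z)
# Naive least squares ignores endogeneity; its slope is biased toward 2.5
coef_ols = np.linalg.lstsq(np.vander(x, 2, increasing=True), y, rcond=None)[0]
```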

_process_fold(fold_idx, train_data, test_data)

Process a single fold for cross-validation.

Parameters

fold_idx (int) – Fold index.
train_data (tuple) – Training data for the fold.
test_data (tuple) – Testing data for the fold.

Returns

Estimated moment functions for the test data.

Return type

array-like

_propensity_score(X, W, D)

Estimate the propensity score.

Parameters

X (array-like) – Covariates.
W (array-like) – Negative control outcome.
D (array-like) – Treatment variable.

Returns

Estimated propensity scores and threshold alpha.

Return type

tuple

_split_and_estimate()

Split the data and estimate the model for each fold.

Returns

Estimated values, variances, and confidence intervals.

Return type

tuple
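The cross-fitting pattern can be sketched as follows: nuisance functions are fitted on K-1 folds, evaluated on the held-out fold, and the orthogonal moment is averaged over all out-of-fold values. For illustration only, this minimal sketch swaps in the partially linear regression model with linear nuisance fits instead of the NPIV moments used by DML_npiv; the helper names `cross_fit_predictions` and `dml_plr` are hypothetical.

```python
import numpy as np

def cross_fit_predictions(X, target, n_folds=5, seed=123):
    """Out-of-fold linear-regression predictions of `target` given X."""
    n = len(target)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds          # balanced random fold labels
    A = np.column_stack([np.ones(n), X])
    pred = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        beta = np.linalg.lstsq(A[~test], target[~test], rcond=None)[0]
        pred[test] = A[test] @ beta               # predict only on held-out fold
    return pred

def dml_plr(Y, D, X, n_folds=5, seed=123):
    """Cross-fitted partialling-out estimate of theta in Y = theta*D + g(X) + e."""
    y_res = Y - cross_fit_predictions(X, Y, n_folds, seed)
    d_res = D - cross_fit_predictions(X, D, n_folds, seed)
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    psi = (y_res - theta * d_res) * d_res         # Neyman-orthogonal score
    var = np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / len(Y)
    return theta, var

rng = np.random.default_rng(2)
n = 5_000
X = rng.normal(size=(n, 3))
D = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
Y = 1.5 * D + X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)
theta, var = dml_plr(Y, D, X)
```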

dml()

Perform Debiased Machine Learning for Nonparametric Instrumental Variables.

Returns

Estimated values, variances, and confidence intervals.

Return type

tuple

dml_npiv._fun_threshold_alpha(alpha, g)

Auxiliary function for computing the optimal trimming threshold alpha that improves overlap, following CHIM (Dealing with limited overlap in estimation of average treatment effects).

Richard K. Crump, V. Joseph Hotz, Guido W. Imbens, Oscar A. Mitnik. Biometrika, Volume 96, Issue 1, March 2009.

Parameters

alpha (float) – Alpha value.
g (array-like) – Input array.

Returns

Result of the threshold function.

Return type

float
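Under the CHIM rule, with g = 1/(e(X)(1 - e(X))) for propensity score e(X), the optimal threshold alpha solves 1/(alpha(1 - alpha)) = 2 E[g | g <= 1/(alpha(1 - alpha))], and no trimming is applied when max g <= 2 E[g]; observations with e(X) outside [alpha, 1 - alpha] are then dropped. The minimal sketch below assumes `g` holds these values and uses a squared criterion with a grid search. The helper names and grid search are illustrative assumptions, not necessarily what `_fun_threshold_alpha` computes.

```python
import numpy as np

def threshold_criterion(alpha, g):
    """Squared CHIM fixed-point criterion for a candidate threshold alpha."""
    gamma = 1.0 / (alpha * (1.0 - alpha))
    kept = g <= gamma
    if not kept.any():
        return np.inf
    return (2.0 * g[kept].mean() - gamma) ** 2

def chim_alpha(ps, grid_size=2_000):
    """Optimal trimming threshold; 0.0 means no trimming is needed."""
    ps = np.clip(np.asarray(ps, dtype=float), 1e-12, 1 - 1e-12)
    g = 1.0 / (ps * (1.0 - ps))
    if g.max() <= 2.0 * g.mean():       # overlap is already adequate
        return 0.0
    alphas = np.linspace(1e-4, 0.5 - 1e-4, grid_size)
    losses = [threshold_criterion(a, g) for a in alphas]
    return float(alphas[int(np.argmin(losses))])

# 95 units with good overlap and 5 with extreme propensity scores
ps = np.r_[np.full(95, 0.5), np.full(5, 0.001)]
alpha_hat = chim_alpha(ps)
keep = (ps >= alpha_hat) & (ps <= 1 - alpha_hat)
```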

dml_npiv._get(opts, key, default)

Retrieve the value associated with ‘key’ in ‘opts’, or return ‘default’ if not present.

Parameters

opts (dict) – Dictionary of options.
key (str) – Key to look up in ‘opts’.
default (any) – Default value to return if ‘key’ is not found.

Returns

Value associated with ‘key’ or ‘default’.

Return type

any

dml_npiv._transform_poly(X, opts)

Transform the input data X using polynomial features.

Parameters

X (array-like) – Input data.
opts (dict) – Options dictionary containing the polynomial degree (‘lin_degree’).

Returns

Transformed data.

Return type

array-like
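A minimal sketch of the two helpers, assuming `opts` is a plain dict (or None) and that the transform produces all monomials up to degree `lin_degree` without a constant column, in the same column order as scikit-learn's `PolynomialFeatures(include_bias=False)`. The names `get_opt` and `transform_poly`, the None handling, and the default degree of 1 are assumptions.

```python
import numpy as np
from itertools import combinations_with_replacement

def get_opt(opts, key, default):
    """Return opts[key] if present, otherwise default (None-safe)."""
    return default if opts is None else opts.get(key, default)

def transform_poly(X, opts):
    """All monomials of the columns of X up to degree opts['lin_degree']."""
    degree = get_opt(opts, "lin_degree", 1)   # default of 1 is an assumption
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    cols = [
        np.prod(X[:, list(combo)], axis=1)    # product of the chosen columns
        for d in range(1, degree + 1)
        for combo in combinations_with_replacement(range(X.shape[1]), d)
    ]
    return np.column_stack(cols)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_poly = transform_poly(X, {"lin_degree": 2})  # columns: x0, x1, x0^2, x0*x1, x1^2
```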

Mediation Analysis

Joint/Sequential mediation

This module performs Debiased Machine Learning for mediation analysis, using joint or sequential estimation for longitudinal nonparametric parameters (in the Nested NPIV framework). It provides tools for estimating causal effects with mediation using a combination of machine learning models and instrumental variables techniques. The module supports different types of mediated estimands, cross-validation, kernel density estimation for localization, and confidence interval computation with pointwise or uniform guarantees.

Classes:

DML_mediated: Main class for performing DML for mediation analysis with joint/sequential model fitting.

DML_mediated Methods:

__init__: Initialize the DML_mediated instance with data and model configurations.

_calculate_confidence_interval: Calculate confidence intervals for the estimates.

_localization: Perform localization using kernel density estimation.

_npivfit_outcome: Fit the outcome model using nonparametric instrumental variables.

_nnpivfit_outcome_m: Fit the mediated outcome model sequentially using nonparametric instrumental variables.

_propensity_score: Estimate the propensity score.

_npivfit_action: Fit the action model using nonparametric instrumental variables.

_nnpivfit_action_m: Fit the mediated action model sequentially using nonparametric instrumental variables.

_scores_mediated: Calculate the scores for the mediated effects.

_scores_Y1: Calculate the scores for the Y1 estimand.

_process_fold: Process a single fold for cross-validation.

_split_and_estimate: Split the data and estimate the model for each fold.

dml: Perform Debiased Machine Learning for Nonparametric Instrumental Variables.

class dml_mediated.DML_mediated(Y, D, M, W, Z, X1=None, V=None, v_values=None, include_V=True, ci_type='pointwise', loc_kernel='gau', bw_loc='silverman', estimator='MR', estimand='ATE', model1=<nnpiv.rkhs.rkhs2iv.RKHS2IVL2 object>, nn_1=False, modelq1=<nnpiv.rkhs.rkhs2iv.RKHS2IVL2 object>, nn_q1=False, model_y=<nnpiv.rkhs.rkhsiv.ApproxRKHSIVCV object>, nn_y=False, model_a=<nnpiv.rkhs.rkhsiv.ApproxRKHSIVCV object>, nn_a=False, alpha=0.05, n_folds=5, n_rep=1, random_seed=123, prop_score=sklearn.linear_model.LogisticRegression, CHIM=False, verbose=True, fitargs1=None, fitargsq1=None, fitargsy=None, fitargsa=None, opts=None)

Bases: object

Debiased Machine Learning for mediation analysis (DML-mediation) class with joint/sequential model fitting.

Parameters

Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
M (array-like) – Mediator variable.
W (array-like) – Negative control outcome.
Z (array-like) – Instrumental variable.
X1 (array-like, optional) – Additional covariates.
V (array-like, optional) – Localization covariates.
v_values (array-like, optional) – Values for localization.
include_V (bool, optional) – Include localization covariates in the model.
ci_type (str, optional) – Type of confidence interval (‘pointwise’, ‘uniform’).
loc_kernel (str, optional) – Kernel for localization. Options include ‘gau’, ‘epa’, ‘uni’, ‘tri’, etc.
bw_loc (str, optional) – Bandwidth, or bandwidth selection rule such as the default ‘silverman’, for localization.
estimator (str, optional) – Estimator type (‘MR’, ‘OR’, ‘hybrid’, ‘IPW’).
estimand (str, optional) – Type of estimand (‘ATE’, ‘Indirect’, ‘Direct’, ‘E[Y1]’, ‘E[Y0]’, ‘E[Y(1,M(0))]’).
model1 (estimator or list, optional) – Model for the outcome stage. Can be a joint or a sequential estimator; for a sequential estimator, pass a list.
nn_1 (bool or list, optional) – Whether to use a neural network for the outcome stage.
modelq1 (estimator or list, optional) – Model for the q1 stage. Can be a joint or a sequential estimator; for a sequential estimator, pass a list.
nn_q1 (bool or list, optional) – Whether to use a neural network for the q1 stage.
model_y (estimator, optional) – Single-stage outcome model, used with the ‘E[Y1]’, ‘E[Y0]’, ‘Direct’, ‘Indirect’, and ‘ATE’ estimands.
nn_y (bool, optional) – Whether to use a neural network for the outcome model.
model_a (estimator, optional) – Single-stage action model, used with the ‘E[Y1]’, ‘E[Y0]’, ‘Direct’, ‘Indirect’, and ‘ATE’ estimands.
nn_a (bool, optional) – Whether to use a neural network for the action model.
alpha (float, optional) – Significance level for confidence intervals.
n_folds (int, optional) – Number of folds for estimation.
n_rep (int, optional) – Number of repetitions for estimation.
random_seed (int, optional) – Seed for random number generator.
prop_score (estimator, optional) – Model for propensity score.
CHIM (bool, optional) – Whether to drop observations with extreme propensity scores, following the trimming rule of Crump, Hotz, Imbens, and Mitnik (CHIM, 2009).
verbose (bool, optional) – Print progress information.
fitargs1 (dict, optional) – Arguments for fitting the outcome stage model.
fitargsq1 (dict, optional) – Arguments for fitting the q1 stage model.
fitargsy (dict, optional) – Arguments for fitting the single-stage outcome model.
fitargsa (dict, optional) – Arguments for fitting the single-stage action model.
opts (dict, optional) – Additional options.
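The estimands are linked. Assuming the standard natural-effect definitions (not spelled out above), Direct = E[Y(1,M(0))] - E[Y0] and Indirect = E[Y1] - E[Y(1,M(0))], so ATE = Direct + Indirect. Because DML estimates are averages of per-observation scores, combined estimands and their standard errors can be formed by differencing scores. The sketch below uses simulated placeholder scores and a hypothetical helper `combine_mediation_scores`; it does not call the package.

```python
import numpy as np

def combine_mediation_scores(psi_y1, psi_y0, psi_y1m0):
    """Point estimates and standard errors from per-observation scores."""
    scores = {
        "E[Y1]": psi_y1,
        "E[Y0]": psi_y0,
        "E[Y(1,M(0))]": psi_y1m0,
        "ATE": psi_y1 - psi_y0,
        "Direct": psi_y1m0 - psi_y0,      # natural direct effect
        "Indirect": psi_y1 - psi_y1m0,    # natural indirect effect
    }
    results = {}
    for name, psi in scores.items():
        psi = np.asarray(psi, dtype=float)
        results[name] = (psi.mean(), psi.std(ddof=1) / np.sqrt(len(psi)))
    return results

rng = np.random.default_rng(3)
n = 1_000
psi_y1 = 2.0 + rng.normal(size=n)     # placeholder scores, not real output
psi_y0 = 0.5 + rng.normal(size=n)
psi_y1m0 = 1.2 + rng.normal(size=n)
res = combine_mediation_scores(psi_y1, psi_y0, psi_y1m0)
```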

_calculate_confidence_interval(theta, theta_var, theta_cov)

Calculate the confidence interval for the given estimates.

Parameters

theta (array-like) – Estimated values.
theta_var (array-like) – Variance of the estimates.
theta_cov (array-like) – Covariance matrix of the estimates.

Returns

Lower and upper bounds of the confidence intervals.

Return type

array-like

_localization(V, v_val, bw)

Perform localization using kernel density estimation.

Parameters

V (array-like) – Localization covariates.
v_val (array-like) – Values for localization.
bw (float) – Bandwidth for localization.

Returns

Weights for localization.

Return type

array-like

_nnpivfit_action_m(ps_hat_0, ps_hat_00, D, M, W, X, Z, alfa=0.0)

Fit the mediated action model using nonparametric instrumental variables.

Parameters

ps_hat_0 (array-like) – Estimated propensity scores for control group.
ps_hat_00 (array-like) – Estimated propensity scores for mediated control group.
D (array-like) – Treatment variable.
M (array-like) – Mediator variable.
W (array-like) – Negative control outcome.
X (array-like) – Covariates.
Z (array-like) – Instrumental variable.
alfa (float, optional) – Threshold alpha for propensity scores.

Returns

Fitted models for mediated action.

Return type

tuple

_nnpivfit_outcome_m(Y, D, M, W, X, Z)

Fit the mediated outcome model using nonparametric instrumental variables.

Parameters

Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
M (array-like) – Mediator variable.
W (array-like) – Negative control outcome.
X (array-like) – Covariates.
Z (array-like) – Instrumental variable.

Returns

Fitted models for treatment and control groups.

Return type

tuple

_npivfit_action(ps_hat_1, W, X, Z, alfa=0.0)

Fit the action model using nonparametric instrumental variables.

Parameters

ps_hat_1 (array-like) – Estimated propensity scores.
W (array-like) – Negative control outcome.
X (array-like) – Covariates.
Z (array-like) – Instrumental variable.
alfa (float, optional) – Threshold alpha for propensity scores.

Returns

Fitted model for the action.

Return type

object

_npivfit_outcome(Y, D, X, Z)

Fit the outcome model using nonparametric instrumental variables.

Parameters

Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
X (array-like) – Covariates.
Z (array-like) – Instrumental variable.

Returns

Fitted model.

Return type

object

_process_fold(fold_idx, train_data, test_data)

Process a single fold for cross-validation.

Parameters

fold_idx (int) – Fold index.
train_data (tuple) – Training data for the fold.
test_data (tuple) – Testing data for the fold.

Returns

Estimated moment functions for the test data.

Return type

array-like

_propensity_score(M, X, W, D)

Estimate the propensity score.

Parameters

M (array-like) – Mediator variable.
X (array-like) – Covariates.
W (array-like) – Negative control outcome.
D (array-like) – Treatment variable.

Returns

Estimated propensity scores and threshold alpha.

Return type

tuple

_scores_Y1(train_Y, train_D, train_M, train_W, train_X, train_Z, test_Y, test_D, test_X, test_Z)

Calculate the scores for the Y1 estimand.

Parameters

train_Y (array-like) – Training outcome variable.
train_D (array-like) – Training treatment variable.
train_M (array-like) – Training mediator variable.
train_W (array-like) – Training negative control outcome.
train_X (array-like) – Training covariates.
train_Z (array-like) – Training instrumental variable.
test_Y (array-like) – Testing outcome variable.
test_D (array-like) – Testing treatment variable.
test_X (array-like) – Testing covariates.
test_Z (array-like) – Testing instrumental variable.

Returns

Estimated moment functions for the test data.

Return type

array-like

_scores_mediated(train_Y, train_D, train_M, train_W, train_X, train_Z, test_Y, test_D, test_M, test_W, test_X, test_Z)

Calculate the scores for the mediated effects.

Parameters

train_Y (array-like) – Training outcome variable.
train_D (array-like) – Training treatment variable.
train_M (array-like) – Training mediator variable.
train_W (array-like) – Training negative control outcome.
train_X (array-like) – Training covariates.
train_Z (array-like) – Training instrumental variable.
test_Y (array-like) – Testing outcome variable.
test_D (array-like) – Testing treatment variable.
test_M (array-like) – Testing mediator variable.
test_W (array-like) – Testing negative control outcome.
test_X (array-like) – Testing covariates.
test_Z (array-like) – Testing instrumental variable.

Returns

Estimated moment functions for the test data.

Return type

array-like

_split_and_estimate()

Split the data and estimate the model for each fold.

Returns

Estimated values, variances, and confidence intervals.

Return type

tuple

dml()

Perform Debiased Machine Learning for Nonparametric Instrumental Variables.

Returns

Estimated values, variances, and confidence intervals.

Return type

tuple

dml_mediated._fun_threshold_alpha(alpha, g)

Auxiliary function for computing the optimal trimming threshold alpha that improves overlap, following CHIM (Dealing with limited overlap in estimation of average treatment effects, Crump et al., Biometrika, 2009).

Parameters

alpha (float) – Alpha value.
g (array-like) – Input array.

Returns

Result of the threshold function.

Return type

float

dml_mediated._get(opts, key, default)

Retrieve the value associated with ‘key’ in ‘opts’, or return ‘default’ if not present.

Parameters

opts (dict) – Dictionary of options.
key (str) – Key to look up in ‘opts’.
default (any) – Default value to return if ‘key’ is not found.

Returns

Value associated with ‘key’ or ‘default’.

Return type

any

dml_mediated._transform_poly(X, opts)

Transform the input data X using polynomial features.

Parameters

X (array-like) – Input data.
opts (dict) – Options dictionary containing the polynomial degree (‘lin_degree’).

Returns

Transformed data.

Return type

array-like

dml_mediated.toT(a)

Longterm Analysis

Joint/Sequential longterm

DML_longterm implements Debiased Machine Learning for long-term causal analysis with a joint or sequential estimator (DML-longterm). The estimand can be defined either under a surrogacy assumption (Athey et al., 2020b. [Estimating treatment effects using multiple surrogates: the role of the surrogate score and the surrogate index](https://arxiv.org/abs/1603.09326)) or under a latent unconfounded model (Athey et al., 2020a. [Combining experimental and observational data to estimate treatment effects on long-term outcomes](https://arxiv.org/abs/2006.09676)). The semiparametric efficiency results for these models are derived in Chen and Ritzwoller (2023. [Semiparametric estimation of long-term treatment effects](https://doi.org/10.1016/j.jeconom.2023.105545)). The module supports both long-term models, cross-validation, kernel density estimation for localization, and confidence interval computation with pointwise or uniform guarantees.
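Under the surrogacy assumption, the long-term outcome regression E[Y | S, X] fitted on the observational sample can be carried over to the experimental sample; averaging these predictions by treatment arm gives the surrogate index estimate of Athey et al. (2020b). The sketch below shows only this plug-in (outcome-regression) idea, using linear least squares on simulated data. It is not the debiased estimator implemented by DML_longterm, and the function name, variable names, and data-generating process are assumptions.

```python
import numpy as np

def surrogate_index_effect(Y_obs, S_obs, X_obs, D_exp, S_exp, X_exp):
    """Outcome-regression (surrogate index) estimate of the long-term effect."""
    # Observational sample: fit E[Y | S, X] by least squares
    A_obs = np.column_stack([np.ones(len(Y_obs)), S_obs, X_obs])
    beta = np.linalg.lstsq(A_obs, Y_obs, rcond=None)[0]
    # Experimental sample: predict the long-term outcome (the surrogate index)
    A_exp = np.column_stack([np.ones(len(D_exp)), S_exp, X_exp])
    index = A_exp @ beta
    # Difference in mean surrogate index between treated and control units
    return index[D_exp == 1].mean() - index[D_exp == 0].mean()

rng = np.random.default_rng(4)
n = 20_000
# Experimental sample: randomized D and surrogate S; long-term Y unobserved
X_exp = rng.normal(size=n)
D_exp = rng.integers(0, 2, size=n)
S_exp = D_exp + X_exp + rng.normal(size=n)
# Observational sample: S, X, and long-term Y observed
X_obs = rng.normal(size=n)
S_obs = X_obs + rng.normal(size=n)
Y_obs = 2.0 * S_obs + X_obs + rng.normal(size=n)
effect = surrogate_index_effect(Y_obs, S_obs, X_obs, D_exp, S_exp, X_exp)
```

In this simulation D shifts S by 1 and Y depends on S with slope 2, so the long-term effect is about 2.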

Classes:

DML_longterm: Main class for performing DML for long-term causal analysis with joint/sequential model fitting.

DML_longterm Methods:

__init__: Initialize the DML_longterm instance with data and model configurations.

_calculate_confidence_interval: Calculate confidence intervals for the estimates.

_localization: Perform localization using kernel density estimation.

_nnpivfit_outcome_latent: Fit the outcome model using nonparametric instrumental variables for the latent unconfounded model.

_nnpivfit_outcome_latent_s: Fit the outcome model using nonparametric instrumental variables for the latent unconfounded model sequentially.

_nnpivfit_outcome_surrogacy: Fit the outcome model using nonparametric instrumental variables for the surrogacy model.

_nnpivfit_outcome_surrogacy_s: Fit the outcome model using nonparametric instrumental variables for the surrogacy model sequentially.

_propensity_score_latent: Estimate the propensity score for the latent unconfounded model.

_propensity_score_surrogacy: Estimate the propensity score for the surrogacy model.

_process_fold: Process a single fold for cross-validation.

_split_and_estimate: Split the data and estimate the model for each fold.

dml: Perform Debiased Machine Learning for Nonparametric Instrumental Variables.

class dml_longterm.DML_longterm(Y, D, S, G, X1=None, V=None, v_values=None, include_V=True, ci_type='pointwise', loc_kernel='gau', bw_loc='silverman', estimator='MR', longterm_model='surrogacy', model1=<nnpiv.rkhs.rkhs2iv.RKHS2IVL2 object>, nn_1=False, alpha=0.05, n_folds=5, n_rep=1, random_seed=123, prop_score=sklearn.linear_model.LogisticRegression, CHIM=False, verbose=True, fitargs1=None, opts=None)

Bases: object

Debiased Machine Learning for long-term causal analysis (DML-longterm) class with joint/sequential model fitting.

Parameters

Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
S (array-like) – Surrogate variable.
G (array-like) – Group variable.
X1 (array-like, optional) – Additional covariates.
V (array-like, optional) – Localization covariates.
v_values (array-like, optional) – Values for localization.
include_V (bool, optional) – Include localization covariates in the model.
ci_type (str, optional) – Type of confidence interval (‘pointwise’, ‘uniform’).
loc_kernel (str, optional) – Kernel for localization. Options include ‘gau’, ‘epa’, ‘uni’, ‘tri’, etc.
bw_loc (str, optional) – Bandwidth, or bandwidth selection rule such as the default ‘silverman’, for localization.
estimator (str, optional) – Estimator type (‘MR’, ‘OR’, ‘hybrid’, ‘IPW’).
longterm_model (str, optional) – Model type for long-term analysis (‘surrogacy’, ‘latent_unconfounded’).
model1 (estimator or list, optional) – Model for the outcome stage. Can be a joint or a sequential estimator; for a sequential estimator, pass a list.
nn_1 (bool or list, optional) – Whether to use a neural network for the outcome stage.
alpha (float, optional) – Significance level for confidence intervals.
n_folds (int, optional) – Number of folds for estimation.
n_rep (int, optional) – Number of repetitions for estimation.
random_seed (int, optional) – Seed for random number generator.
prop_score (estimator, optional) – Model for propensity score.
CHIM (bool, optional) – Use CHIM method for dealing with limited overlap.
verbose (bool, optional) – Print progress information.
fitargs1 (dict, optional) – Arguments for fitting the outcome stage model.
opts (dict, optional) – Additional options.

_calculate_confidence_interval(theta, theta_var, theta_cov)

Calculate the confidence interval for the given estimates.

Parameters

theta (array-like) – Estimated values.
theta_var (array-like) – Variance of the estimates.
theta_cov (array-like) – Covariance matrix of the estimates.

Returns

Lower and upper bounds of the confidence intervals.

Return type

array-like

_localization(V, v_val, bw)

Perform localization using kernel density estimation.

Parameters

V (array-like) – Localization covariates.
v_val (array-like) – Values for localization.
bw (float) – Bandwidth for localization.

Returns

Weights for localization.

Return type

array-like

_nnpivfit_outcome_latent(train_Y, train_D, train_S, train_X, train_G, test_X, test_S)

Fit the outcome model jointly using nonparametric instrumental variables for the latent unconfounded model.

This method is based on the model proposed in Athey, S., Chetty, R., Imbens, G., 2020a. Combining experimental and observational data to estimate treatment effects on long-term outcomes. arXiv preprint arXiv:2006.09676.

Parameters

train_Y (array-like) – Training outcome variable.
train_D (array-like) – Training treatment variable.
train_S (array-like) – Training surrogate variable.
train_X (array-like) – Training covariates.
train_G (array-like) – Training group variable.
test_X (array-like) – Testing covariates.
test_S (array-like) – Testing surrogate variable.

Returns

Estimated values for delta_d1_hat, delta_d0_hat, nu_1_hat, nu_0_hat.

Return type

tuple

_nnpivfit_outcome_latent_s(Y, D, S, X, G)

Fit the outcome model sequentially using the latent unconfounded framework.

This method is based on the model proposed in Athey, S., Chetty, R., Imbens, G., 2020a. Combining experimental and observational data to estimate treatment effects on long-term outcomes. arXiv preprint arXiv:2006.09676.

Parameters

Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
S (array-like) – Surrogate variable.
X (array-like) – Covariates.
G (array-like) – Group indicator.

Returns

Fitted models for treatment and control groups.

Return type

tuple

_nnpivfit_outcome_surrogacy(train_Y, train_D, train_S, train_X, train_G, test_X, test_S)

Fit the outcome model jointly using nonparametric instrumental variables for the surrogacy model.

This method is based on the model proposed in Athey, S., Chetty, R., Imbens, G., Kang, H., 2020b. Estimating treatment effects using multiple surrogates: the role of the surrogate score and the surrogate index. arXiv preprint arXiv:1603.09326.

Parameters

train_Y (array-like) – Training outcome variable.
train_D (array-like) – Training treatment variable.
train_S (array-like) – Training surrogate variable.
train_X (array-like) – Training covariates.
train_G (array-like) – Training group variable.
test_X (array-like) – Testing covariates.
test_S (array-like) – Testing surrogate variable.

Returns

Estimated values for delta_d1_hat, delta_d0_hat, nu_1_hat, nu_0_hat.

Return type

tuple

_nnpivfit_outcome_surrogacy_s(Y, D, S, X, G)

Fit the outcome model sequentially using the surrogacy framework.

This method is based on the model proposed in Athey, S., Chetty, R., Imbens, G., Kang, H., 2020b. Estimating treatment effects using multiple surrogates: the role of the surrogate score and the surrogate index. arXiv preprint arXiv:1603.09326.

Parameters

Y (array-like) – Outcome variable.
D (array-like) – Treatment variable.
S (array-like) – Surrogate variable.
X (array-like) – Covariates.
G (array-like) – Group indicator.

Returns

Fitted models for the outcome.

Return type

tuple

_process_fold(fold_idx, train_data, test_data)

Process a single fold for cross-validation.

Parameters

fold_idx (int) – Fold index.
train_data (tuple) – Training data for the fold.
test_data (tuple) – Testing data for the fold.

Returns

Estimated moment functions for the test data.

Return type

array-like

_propensity_score_latent(S_train, X_train, D_train, G_train, S_test, X_test)

Estimate the propensity score for the latent unconfounded model.

Parameters

S_train (array-like) – Training surrogate variable.
X_train (array-like) – Training covariates.
D_train (array-like) – Training treatment variable.
G_train (array-like) – Training group variable.
S_test (array-like) – Testing surrogate variable.
X_test (array-like) – Testing covariates.

Returns

Estimated propensity scores and threshold alpha.

Return type

tuple

_propensity_score_surrogacy(S_train, X_train, D_train, G_train, S_test, X_test)

Estimate the propensity score for the surrogacy model.

Parameters

S_train (array-like) – Training surrogate variable.
X_train (array-like) – Training covariates.
D_train (array-like) – Training treatment variable.
G_train (array-like) – Training group variable.
S_test (array-like) – Testing surrogate variable.
X_test (array-like) – Testing covariates.

Returns

Estimated propensity scores and threshold alpha.

Return type

tuple

_split_and_estimate()

Split the data and estimate the model for each fold.

Returns

Estimated values, variances, and confidence intervals.

Return type

tuple

dml()

Perform Debiased Machine Learning for Nonparametric Instrumental Variables.

Returns

Estimated values, variances, and confidence intervals.

Return type

tuple

dml_longterm._fun_threshold_alpha(alpha, g)

Auxiliary function for computing the optimal trimming threshold alpha that improves overlap, following CHIM (Dealing with limited overlap in estimation of average treatment effects, Crump et al., Biometrika, 2009).

Parameters

alpha (float) – Alpha value.
g (array-like) – Input array.

Returns

Result of the threshold function.

Return type

float

dml_longterm._get(opts, key, default)

Retrieve the value associated with ‘key’ in ‘opts’, or return ‘default’ if not present.

Parameters

opts (dict) – Dictionary of options.
key (str) – Key to look up in ‘opts’.
default (any) – Default value to return if ‘key’ is not found.

Returns

Value associated with ‘key’ or ‘default’.

Return type

any

dml_longterm._transform_poly(X, opts)

Transform the input data X using polynomial features.

Parameters

X (array-like) – Input data.
opts (dict) – Options dictionary containing the polynomial degree (‘lin_degree’).

Returns

Transformed data.

Return type

array-like

dml_longterm.toT(a)