Long-term Effect Analysis

Let \(X \in \mathbb{R}^{p}\) be baseline covariates, \(D \in \{0, 1\}\) the treatment assignment, \(M \in \mathbb{R}\) an intermediate (short-term) outcome, and \(Y \in \mathbb{R}\) a long-term outcome. An analyst may wish to measure the effect of \(D\) on \(Y\), yet the experimental sample records only the short-term outcome \(M\), not \(Y\).

If the analyst has access to an additional observational sample that includes the long-term outcome, long-term causal inference is still possible. Specifically, assume the analyst has access to (i) an experimental sample, indicated by \(G=0\), where \((D, M, X)\) are observed; and (ii) an observational sample, indicated by \(G=1\), where \((M, X, Y)\) are observed and \(D\) may or may not be observed. Whether \(D\) is observed in the observational sample determines which assumptions identify the long-term treatment effect. When \(D\) is not observed, the key identifying assumption is that the short-term outcome is a statistical surrogate for the long-term outcome. When \(D\) is observed, the key identifying assumption is that unobserved confounding is mediated through the short-term outcome in the observational sample. Following Athey et al. (2020b), we refer to these as the Surrogacy model and the Latent unconfounded model, respectively (Athey et al., 2020a).
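The observation pattern of the two samples can be sketched with simulated data. The snippet below is a minimal illustration, not part of the package: the data-generating process, the coefficients, and the names `Y_obs`, `D_surrogacy`, and `D_latent` are all hypothetical, and the toy process contains no unobserved confounding, so it shows only which variables are visible in which sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data-generating process, for illustration only.
X = rng.normal(size=(n, 3))                       # baseline covariates
G = rng.binomial(1, 0.5, size=n)                  # 0 = experimental, 1 = observational
D = rng.binomial(1, 0.5, size=n).astype(float)    # treatment assignment
M = 0.5 * D + X[:, 0] + rng.normal(size=n)        # short-term outcome
Y = 1.0 * M + X[:, 1] + rng.normal(size=n)        # long-term outcome

# The long-term outcome is recorded only in the observational sample (G=1).
Y_obs = np.where(G == 1, Y, np.nan)

# Surrogacy setting: D is recorded only in the experimental sample (G=0).
D_surrogacy = np.where(G == 0, D, np.nan)

# Latent unconfounded setting: D is recorded in both samples.
D_latent = D.copy()
```

Missing entries are encoded as `NaN` here purely for convenience; how missing values should be passed to an estimator depends on its interface.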

Long-term effect

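Formally, define the long-term counterfactual \(\mathbb{E}\left[Y^{(d)}\right]\) as the mean outcome for the full population in the thought experiment in which everyone is assigned treatment value \(D=d\). The long-term average treatment effect for the full population is the difference \(\mathbb{E}\left[Y^{(1)}\right]-\mathbb{E}\left[Y^{(0)}\right]\).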

The long-term effect for the experimental or observational subpopulation is defined analogously, by introducing the fixed local weighting \(\ell(G)=\mathbb{1}_{G=0} / \mathbb{P}(G=0)\) or \(\ell(G)=\mathbb{1}_{G=1} / \mathbb{P}(G=1)\), respectively.
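To see what the local weighting does, note that averaging \(\ell(G)\) times a quantity over the full sample reproduces the average of that quantity within the chosen subpopulation. The sketch below checks this numerically. The helper `local_weight` and the simulated unit-level effects `tau` are hypothetical, and \(\mathbb{P}(G=g)\) is replaced by its sample share.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
G = rng.binomial(1, 0.4, size=n)

# Hypothetical unit-level effects that differ between the two samples.
tau = 1.0 + 0.5 * G + rng.normal(scale=0.1, size=n)

def local_weight(G, g):
    """Return 1{G=g} / P(G=g), with P(G=g) estimated by the sample share."""
    indicator = (G == g).astype(float)
    return indicator / indicator.mean()

ell0 = local_weight(G, 0)   # targets the experimental subpopulation
ell1 = local_weight(G, 1)   # targets the observational subpopulation

# Weighted full-sample means equal the subpopulation means.
effect_G0 = np.mean(ell0 * tau)
effect_G1 = np.mean(ell1 * tau)
```

By construction each weight vector averages to one, so the weighting re-targets the estimand without changing its scale.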

Note

In addition to the population ATE, the implementation supports conditional long‑term ATEs within the experimental or observational subpopulations: \(\mathbb{E}[Y^{(1)}-Y^{(0)} \mid G=0]\) and \(\mathbb{E}[Y^{(1)}-Y^{(0)} \mid G=1]\). Set sample_G="G=0" or sample_G="G=1" to target these effects; use sample_G="all" for the population ATE. Identification and influence‑function estimators (with the associated nuisance components) for the Surrogacy and Latent‑Unconfounded models are given in Theorem 3.1 and Theorem B.2 of Chen & Ritzwoller (2023).

Surrogacy Model

Define the regression and the conditional distribution

\[\begin{split}\begin{aligned} \gamma_{0}(m, x, g) & = \mathbb{E}[Y \mid M=m, X=x, G=g] \\ \mathbb{P}(m \mid d, x, g) & = \mathbb{P}(M=m \mid D=d, X=x, G=g) \end{aligned}\end{split}\]

The four nuisance functions associated with the model are

\[\begin{split}\begin{aligned} \nu_{0}(W) & = \int \gamma_{0}(m, X, 1) \mathrm{d} \mathbb{P}(m \mid d, X, 0) \\ \delta_{0}(W) & = \gamma_{0}(M, X, 1) \\ \alpha_{0}(W) & = \frac{\mathbb{1}_{G=1}}{\mathbb{P}(G=1 \mid M, X)} \frac{\mathbb{P}(d \mid M, X, G=0) \mathbb{P}(G=0 \mid M, X)}{\mathbb{P}(d \mid X, G=0) \mathbb{P}(G=0 \mid X)} \\ \eta_{0}(W) & = \frac{\mathbb{1}_{G=0} \mathbb{1}_{D=d}}{\mathbb{P}(d \mid X, G=0) \mathbb{P}(G=0 \mid X)} \end{aligned}\end{split}\]

and the long-term counterfactual is

\[\begin{split}\begin{aligned} \operatorname{LONG}(d) & = \mathbb{E}\left\{\int \gamma_{0}(m, X, 1) \mathrm{d} \mathbb{P}(m \mid d, X, 0)\right\} \\ & =\mathbb{E}\left[\nu_0\left(W\right)+\alpha_0(W)\left\{Y-\delta_0(W)\right\}+\eta_0(W)\left\{\delta_0(W)-\nu_0(W)\right\}\right] \end{aligned}\end{split}\]
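The second line of the display suggests a plug-in recipe: evaluate the four nuisance functions for each unit and average the resulting score. The sketch below follows that formula with NumPy arrays. It is only an illustration under stated assumptions: the function names and arguments are hypothetical rather than the package's API, the nuisance values are supplied by hand instead of being estimated, cross-fitting is omitted, and \(\mathbb{P}(G=0 \mid M, X)\) is computed as \(1-\mathbb{P}(G=1 \mid M, X)\).

```python
import numpy as np

def surrogacy_weights(G, D, d, p_g1_mx, p_d_mx_g0, p_d_x_g0, p_g0_x):
    """alpha_0 and eta_0 of the Surrogacy model at treatment value d.

    p_g1_mx   : P(G=1 | M, X)
    p_d_mx_g0 : P(D=d | M, X, G=0)
    p_d_x_g0  : P(D=d | X, G=0)
    p_g0_x    : P(G=0 | X)
    """
    denom = p_d_x_g0 * p_g0_x
    alpha = (G == 1) / p_g1_mx * p_d_mx_g0 * (1.0 - p_g1_mx) / denom
    # D may be NaN where G=1; NaN == d is False, and 1{G=0} is zero there anyway.
    eta = ((G == 0) & (D == d)) / denom
    return alpha, eta

def long_term_score(Y, nu, delta, alpha, eta):
    """Per-unit score nu + alpha * (Y - delta) + eta * (delta - nu)."""
    # Y is unobserved (NaN) where G=0, and alpha is zero there; mask the
    # residual so that 0 * NaN does not contaminate the score.
    resid = np.where(alpha != 0, Y - delta, 0.0)
    return nu + alpha * resid + eta * (delta - nu)

# Toy inputs: two experimental units, two observational units.
G = np.array([0, 0, 1, 1])
D = np.array([1.0, 0.0, np.nan, np.nan])
Y = np.array([np.nan, np.nan, 2.0, 1.0])
nu = np.array([1.0, 1.0, 1.0, 1.0])
delta = np.array([1.5, 0.5, 1.5, 0.5])

alpha, eta = surrogacy_weights(G, D, d=1, p_g1_mx=0.5, p_d_mx_g0=0.5,
                               p_d_x_g0=0.5, p_g0_x=0.5)
psi = long_term_score(Y, nu, delta, alpha, eta)
long_d1 = psi.mean()   # sample analogue of LONG(1)
```

Repeating the computation at \(d=0\) and differencing the two averages would give the corresponding estimate of the long-term effect.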

Latent Unconfounded Model

When we observe \(D\) in the observational sample, the regression additionally conditions on \(D\), while the conditional distribution of \(M\) is unchanged:

\[\begin{split}\begin{aligned} \gamma_{0}(m, x, g, d) & = \mathbb{E}[Y \mid M=m, X=x, G=g, D=d] \\ \mathbb{P}(m \mid d, x, g) & = \mathbb{P}(M=m \mid D=d, X=x, G=g) \end{aligned}\end{split}\]

and the nuisances under this model are given by

\[\begin{split}\begin{aligned} \nu_{0}(W) & = \int \gamma_{0}(m, X, 1, d) \mathrm{d} \mathbb{P}(m \mid d, X, 0) \\ \delta_{0}(W) & = \gamma_{0}(M, X, 1, d) \\ \alpha_{0}(W) & = \frac{\mathbb{1}_{G=1}\mathbb{1}_{D=d}}{\mathbb{P}(G=1 \mid M, X, D=d)} \frac{\mathbb{P}(G=0 \mid M, X, D=d)}{\mathbb{P}(D=d \mid X, G=0) \mathbb{P}(G=0 \mid X)} \\ \eta_{0}(W) & = \frac{\mathbb{1}_{G=0} \mathbb{1}_{D=d}}{\mathbb{P}(D=d \mid X, G=0) \mathbb{P}(G=0 \mid X)} \end{aligned}\end{split}\]

The long-term counterfactual is

\[\begin{split}\begin{aligned} \operatorname{LONG}(d) & = \mathbb{E}\left\{\int \gamma_{0}(m, X, 1, d) \mathrm{d} \mathbb{P}(m \mid d, X, 0)\right\} \\ & =\mathbb{E}\left[\nu_0\left(W\right)+\alpha_0(W)\left\{Y-\delta_0(W)\right\}+\eta_0(W)\left\{\delta_0(W)-\nu_0(W)\right\}\right] \end{aligned}\end{split}\]
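Compared with the Surrogacy model, the weight \(\alpha_{0}\) here selects only observational units with \(D=d\) and uses a selection probability that conditions on \(D=d\). The sketch below computes the two weights and the per-unit score from the formulas above. As before, this is an illustration under stated assumptions: `latent_unconfounded_weights` is a hypothetical helper, the nuisance values are fixed by hand rather than estimated, and \(\mathbb{P}(G=0 \mid M, X, D=d)\) is computed as \(1-\mathbb{P}(G=1 \mid M, X, D=d)\).

```python
import numpy as np

def latent_unconfounded_weights(G, D, d, p_g1_mxd, p_d_x_g0, p_g0_x):
    """alpha_0 and eta_0 of the Latent unconfounded model at treatment value d.

    p_g1_mxd : P(G=1 | M, X, D=d)
    p_d_x_g0 : P(D=d | X, G=0)
    p_g0_x   : P(G=0 | X)
    """
    denom = p_d_x_g0 * p_g0_x
    treated_as_d = (D == d)
    alpha = ((G == 1) & treated_as_d) / p_g1_mxd * (1.0 - p_g1_mxd) / denom
    eta = ((G == 0) & treated_as_d) / denom
    return alpha, eta

# Toy inputs: D is observed in both samples, Y only where G=1.
G = np.array([0, 0, 1, 1])
D = np.array([1, 0, 1, 0])
Y = np.array([np.nan, np.nan, 2.0, 1.0])
nu = np.array([1.0, 1.0, 1.0, 1.0])      # hand-picked values of nu_0
delta = np.array([1.5, 0.5, 1.5, 0.5])   # hand-picked values of delta_0

alpha, eta = latent_unconfounded_weights(G, D, d=1, p_g1_mxd=0.5,
                                         p_d_x_g0=0.5, p_g0_x=0.5)

# Mask the residual where alpha is zero so missing Y does not propagate.
resid = np.where(alpha != 0, Y - delta, 0.0)
psi = nu + alpha * resid + eta * (delta - nu)
long_d1 = psi.mean()   # sample analogue of LONG(1)
```

In this toy example only the treated observational unit receives a nonzero \(\alpha_{0}\), whereas under the Surrogacy model every observational unit would contribute.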

dml_longterm.DML_longterm(Y, D, S, G[, X1, ...])

Debiased Machine Learning for long-term causal analysis (DML-longterm) class with joint/sequential model fitting.

References