Installation & Replication

nnpiv is a Python package for nested nonparametric instrumental variable (NPIV) estimation with RKHS methods, neural networks (AGMM/AGMM2), linear and ensemble baselines, and DML-based semiparametric procedures. The repository also contains scripts to reproduce all simulation tables and empirical figures.

  • Package source: nnpiv/

  • Simulation drivers: simulations/

  • Notebooks (usage & empirical replications): local_notebooks/

1. Installation

The project is PEP 517/518 compliant (pyproject.toml).

1.1. Create and activate an environment

# From repository root
python3.14 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
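
To confirm that the active interpreter is the one inside .venv, a quick check along the following lines may help. It is a minimal sketch using only the standard library; the helper name in_virtualenv is illustrative and not part of nnpiv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside virtual environment:", in_virtualenv())
```

If the second line prints False after activation, the shell is probably still resolving python to a system interpreter.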

Cluster jobs are the exception: the Slurm runner still loads Python 3.13 and activates a mamba environment:

module load python/3.13
mamba activate nnpiv_venv

1.2. Install dependencies

# Base requirements (CPU-only friendly, Python 3.14)
pip install -r requirements.txt

# (Optional) If you are on a cluster, you can also use the cluster pin file
# pip install -r requirements_cluster.txt

If you want GPU acceleration for PyTorch, install the wheel that matches your CUDA runtime from PyTorch’s index.
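
After installing PyTorch, it can be useful to check which build was picked up. The sketch below reports whether PyTorch is installed and whether it sees a CUDA device; describe_torch_backend is an illustrative helper, not part of nnpiv, and it returns a message instead of failing when PyTorch is absent.

```python
import importlib.util


def describe_torch_backend() -> str:
    """Summarise the installed PyTorch build, if any."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch  # imported lazily so the check works without PyTorch

    if torch.cuda.is_available():
        return f"PyTorch {torch.__version__} with CUDA {torch.version.cuda}"
    return f"PyTorch {torch.__version__} (CPU only)"


print(describe_torch_backend())
```

A "CPU only" result on a GPU machine usually means the installed wheel does not match the local CUDA runtime.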

1.3. Install the package

# From the repository root
pip install -e .

This installs nnpiv in editable mode for development and replication.
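
One way to verify the editable install is to check where Python resolves the package from: with an editable install the path should point into your checkout and not into site-packages. A minimal sketch, where locate_package is an illustrative helper name:

```python
import importlib.util


def locate_package(name: str):
    """Return the file a top-level package is imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("nnpiv resolves to:", locate_package("nnpiv"))
```

A result of None means the package is not importable from the current environment.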

The deprecated setuptools command below achieves the same editable install; prefer pip install -e . where possible:

python setup.py develop

2. What’s in the box?

  • Core estimators (nnpiv): RKHS (exact & Nyström-approximate), AGMM/AGMM2, linear & ensemble baselines, and semiparametric DML engines (long-term + mediated variants).

  • Simulations (simulations/): Nonparametric experiments (Table 1) and semiparametric coverage experiments (Table 2), with config files to select the DGP and estimators, plus Slurm and local runners.

  • Notebooks (local_notebooks/): Usage examples and replication of empirical figures (Project STAR; Job Corps).

3. Quick start (library)

Example: long-term effects via DML + RKHS.

import numpy as np
from sklearn.linear_model import LogisticRegression
from nnpiv.rkhs import ApproxRKHSIVCV
from nnpiv.semiparametrics import DML_longterm

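# Toy data: Y, D, S, G each have shape (n, 1).
# D and G are drawn as binary on the assumption that they are the treatment
# and sample indicators; a logistic propensity model needs discrete labels.
rng = np.random.default_rng(0)
n = 1000
Y, S = rng.standard_normal((n, 1)), rng.standard_normal((n, 1))
D = rng.integers(0, 2, size=(n, 1)).astype(float)
G = rng.integers(0, 2, size=(n, 1)).astype(float)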

m1 = ApproxRKHSIVCV(kernel_approx='nystrom', n_components=400,
                    kernel='rbf', gamma=.001, delta_scale='auto',
                    delta_exp=.4, alpha_scales=np.geomspace(1, 10000, 10), cv=10)
m2 = ApproxRKHSIVCV(kernel_approx='nystrom', n_components=400,
                    kernel='rbf', gamma=.001, delta_scale='auto',
                    delta_exp=.4, alpha_scales=np.geomspace(1, 10000, 10), cv=10)

dml = DML_longterm(Y, D, S, G,
                   longterm_model='latent_unconfounded',
                   model1=[m1, m2],
                   n_folds=5, n_rep=1, CHIM=False,
                   prop_score=LogisticRegression(max_iter=2000))
theta, var, ci = dml.dml()
print(theta, var, ci)

4. Reproducing the simulations

4.1. Folder layout

  • simulations/

      - config_*.py: configuration files (DGP, estimators, seeds, output paths)
      - run_simulations_local.sh: canonical local execution script
      - run_simulations.sbatch: canonical Slurm execution script
      - submit_simulations.sh: submission helper with resource profiles
      - sweep_np.py / sweep_sp.py: experiment drivers
      - ./nonparametric_fit/: results
      - ./semiparametric_cov/: results
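
To see which configs are available before launching a run, one could list the files matching the config_*.py pattern. This is a minimal sketch that assumes it is run from the repository root; list_configs is an illustrative helper, not one of the repository's scripts.

```python
from pathlib import Path


def list_configs(sim_dir="simulations"):
    """Return the config names (without .py) found in sim_dir, sorted."""
    return sorted(p.stem for p in Path(sim_dir).glob("config_*.py"))


for name in list_configs():
    print(name)
```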

4.2. Nonparametric simulations (Table 1)

Run locally:

cd simulations
./run_simulations_local.sh --config config_np_benchmark

Smoke test locally:

cd simulations
./run_simulations_local.sh --config config_np_benchmark --smoke-test

Run on Slurm:

cd simulations
./submit_simulations.sh --profile sapphire --config config_np_benchmark

(Config input can be config_x, config_x.py, or a path like simulations/config_x.py.)
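
The three accepted spellings all refer to the same config. As an illustration of the idea only (the runner scripts may implement it differently), a hypothetical normalize_config_name helper could reduce each form to the bare config name:

```python
from pathlib import Path


def normalize_config_name(arg: str) -> str:
    """Reduce 'config_x', 'config_x.py', or 'dir/config_x.py' to 'config_x'."""
    name = Path(arg).name  # drop any leading directories
    return name[:-3] if name.endswith(".py") else name


for arg in ("config_x", "config_x.py", "simulations/config_x.py"):
    print(arg, "->", normalize_config_name(arg))
```

All three inputs map to config_x.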

4.3. Semiparametric coverage simulations (Table 2)

Run locally:

cd simulations
./run_simulations_local.sh --config config_sp_benchmark

Smoke test locally:

cd simulations
./run_simulations_local.sh --config config_sp_benchmark --smoke-test

Run on Slurm:

cd simulations
./submit_simulations.sh --profile sapphire --config config_sp_benchmark

4.4. Unified runner options (Slurm and local)

Run all configs as a Slurm array:

cd simulations
./submit_simulations.sh --profile test --all-configs --smoke-test

Run all configs locally:

cd simulations
./run_simulations_local.sh --all-configs --smoke-test

Track per-config runtime locally (printed summary + CSV log):

cd simulations
./run_simulations_local.sh --all-configs --smoke-test --timing-log ./timings_smoke.csv
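
Once a timing log exists, a few lines of Python can rank the configs by runtime. The sketch below is hedged: the column names config and seconds are assumptions, since the CSV header is not documented here, so adjust them to whatever the runner actually writes. slowest_runs is an illustrative helper.

```python
import csv


def slowest_runs(path, name_col="config", seconds_col="seconds", top=3):
    """Return the `top` slowest (name, seconds) pairs from a timing CSV.

    name_col and seconds_col are assumed column names, not confirmed ones.
    """
    with open(path, newline="") as fh:
        rows = [(r[name_col], float(r[seconds_col])) for r in csv.DictReader(fh)]
    # Sort by runtime, longest first, and keep the requested number of rows.
    return sorted(rows, key=lambda pair: pair[1], reverse=True)[:top]
```

For example, slowest_runs("./timings_smoke.csv") would return the three longest-running configs from the smoke-test log, provided the header matches the assumed names.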

Override replication count and seed:

cd simulations
./submit_simulations.sh --profile shared --config config_np_nn --n-experiments 50 --seed 999

4.5. Notes on parallelism & threads

To avoid oversubscription with joblib/NumPy/BLAS/OpenMP, we cap native threads to 1. The Slurm scripts already export:

export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export VECLIB_MAXIMUM_THREADS=1
export NUMEXPR_NUM_THREADS=1

Internally, the Python drivers also limit native thread pools to a single thread via threadpoolctl when appropriate.
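
When running code outside the provided scripts, for example in a notebook, the same caps can be applied from Python. The following is a minimal sketch, not the drivers' actual code: cap_native_threads is an illustrative helper, and the threadpoolctl part is optional and skipped if that library is not installed.

```python
import os

THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def cap_native_threads(n: int = 1) -> None:
    """Set the thread-count variables; call this before importing NumPy."""
    for var in THREAD_ENV_VARS:
        os.environ[var] = str(n)


cap_native_threads(1)

# Optional runtime cap for native libraries that are already loaded.
try:
    from threadpoolctl import threadpool_limits
except ImportError:
    threadpool_limits = None  # env vars above still apply to later imports

if threadpool_limits is not None:
    with threadpool_limits(limits=1):
        pass  # run BLAS-heavy work inside this block
```

The environment variables only take effect for libraries imported afterwards, which is why the helper is called before any numerical imports.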

5. Empirical replications (notebooks)

Replication notebooks are located in local_notebooks/:

  • STAR long-term outcomes — reproduces paper figures (RKHS + NN).

  • Job Corps mediation — DML(mediated) with neural nets and RKHS.

Note

The notebooks expect CSV files under data/. Some datasets may not be redistributed for license reasons, in which case they must be obtained separately and placed there.

6. Repository structure

NNPIV/
├─ nnpiv/                    # package
├─ simulations/              # simulation configs + runners
│  ├─ run_simulations_local.sh # Local execution runner
│  ├─ run_simulations.sbatch # Slurm execution runner
│  ├─ submit_simulations.sh  # Slurm submission helper
│  ├─ sweep_np.py            # driver (NP)
│  ├─ sweep_sp.py            # driver (SP)
│  └─ config_*.py            # experiment configs
├─ local_notebooks/          # usage + empirical replications
├─ data/                     # (data; not always distributed)
├─ output/                   # results (created on run)
├─ pyproject.toml
├─ requirements.txt
└─ README.rst

7. Citing

If you use this package, please cite the associated paper and code artifact:

Meza, I., & Singh, R. (2025). Nested Nonparametric Instrumental Variable Regression.
https://doi.org/10.48550/arXiv.2112.14249

8. License

MIT License (see LICENSE.txt).