Modern Difference-in-Differences for Staggered Adoption: A Practical Playbook

Date: 2026-03-08
Category: knowledge (causal inference / policy evaluation)

Why this note exists

Classic DiD intuition is simple: compare before/after changes in treated vs control groups.

But in real datasets, treatment often rolls out at different times (staggered adoption), and effects can vary by cohort and event time. In that setup, naive two-way fixed effects (TWFE) can produce hard-to-interpret averages and even misleading dynamics.

This playbook summarizes what changed in the modern DiD literature and how to operationalize it.


Core problem with naive TWFE in staggered designs

When treatment timing varies across units, TWFE does not generally estimate a single, clean ATT. Goodman-Bacon (2021) shows that the TWFE coefficient is a weighted average of all possible 2x2 DiD comparisons, including "forbidden" comparisons that use already-treated units as controls; with heterogeneous, dynamic effects, some of those weights can even be negative.

Practical implication: a single TWFE coefficient may mix together comparisons that are not your target estimand, and can land far from any effect you care about.
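The mixing is easy to see in a toy example. The base-R simulation below (all names hypothetical) has two cohorts whose effects grow by one unit per period after adoption, with no noise at all. TWFE lands near 0.17 even though the true average effect on treated cells is about 2.17, because late adopters serve as "controls" for early adopters while their own effects are still growing.

```r
# Toy noiseless staggered panel, base R only (all names hypothetical):
# two cohorts, treatment effects that grow with event time.
df <- expand.grid(id = 1:40, year = 1:5)
df$first_treat <- ifelse(df$id <= 20, 2, 4)   # cohorts first treated in year 2 or 4
df$D   <- as.numeric(df$year >= df$first_treat)
df$tau <- ifelse(df$D == 1, 1 + (df$year - df$first_treat), 0)  # effects 1, 2, 3, ...
df$y   <- 0.5 * df$id + 0.3 * df$year + df$tau  # unit + year effects, zero noise

true_att <- mean(df$tau[df$D == 1])            # 13/6, about 2.17
twfe <- lm(y ~ D + factor(id) + factor(year), data = df)
unname(coef(twfe)["D"])                        # 1/6, about 0.17: badly attenuated
```

With zero noise the gap is purely the weighting problem, not sampling error.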


Event-study gotcha (the common trap)

Researchers often run TWFE event-study regressions with leads/lags.

Sun & Abraham (2021) show that with treatment-effect heterogeneity, the coefficient at event time ℓ can be contaminated by effects from other event times and cohorts. Apparent pre-trends can show up even when parallel trends holds, purely as artifacts of that heterogeneity.

Practical implication: a pretty lead/lag plot can still be wrong.


Modern alternatives (what to use instead)

1) Group-time ATT framework (Callaway & Sant’Anna)

Estimate ATT(g, t): the average effect for the cohort first treated at period g, evaluated at period t.

Advantages:

  • Each ATT(g, t) uses only clean comparisons (no already-treated units as controls).
  • Flexible, transparent aggregation into overall, event-time, cohort, or calendar-time summaries.
  • Handles covariates via outcome regression, IPW, or doubly robust estimation.

Use when: you want interpretable cohort/time heterogeneity and robust aggregation.

2) Interaction-weighted event-study (Sun & Abraham)

Build event-study effects robust to heterogeneity by using interaction structure rather than naive TWFE lead/lag pooling.

Use when: dynamic treatment timing effects are central and you need clean event-time interpretation.
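A hedged sketch of one common implementation: the fixest package provides sunab() for interaction-weighted event studies. This assumes a panel with columns y, id, year, and first_treat like the did starter later in this note, with never-treated units coded as a cohort value outside the sample window; check the fixest docs before relying on it.

```r
# Sketch (not run here): Sun-Abraham interaction-weighted event study
# via fixest::sunab(). Column names y, id, year, first_treat are assumed;
# never-treated units get a first_treat value outside the observed years.
library(fixest)

es_sa <- feols(
  y ~ sunab(first_treat, year) | id + year,  # cohort x event-time interactions
  data    = df,
  cluster = ~id
)

iplot(es_sa)                 # event-study plot of aggregated event-time effects
summary(es_sa, agg = "att")  # collapse to a single overall ATT
```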

3) Imputation-style estimators (Borusyak, Jaravel, Spiess)

Estimate untreated potential outcomes for treated observations using untreated/not-yet-treated data, then aggregate treatment effects.

Use when: you want robust and efficient estimation under staggered adoption with heterogeneous effects.
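The mechanics are simple enough to sketch in base R on toy data (hypothetical names throughout): fit unit and period fixed effects on untreated observations only, impute each treated cell's untreated outcome, and average the differences. This is the imputation idea, not the full BJS estimator with its inference machinery.

```r
# Base-R sketch of the imputation idea: toy noiseless panel with a
# never-treated group (first_treat = Inf) so every year effect is
# identified from untreated data. All names are hypothetical.
df <- expand.grid(id = 1:30, year = 1:5)
df$first_treat <- ifelse(df$id <= 10, 2, ifelse(df$id <= 20, 4, Inf))
df$D   <- as.numeric(df$year >= df$first_treat)
df$tau <- ifelse(df$D == 1, 1 + (df$year - df$first_treat), 0)
df$y   <- 0.5 * df$id + 0.3 * df$year + df$tau
df$idf <- factor(df$id)   # pre-build factors so predict() keeps levels aligned
df$yrf <- factor(df$year)

fit0    <- lm(y ~ idf + yrf, data = df[df$D == 0, ])  # untreated cells only
y0_hat  <- predict(fit0, newdata = df[df$D == 1, ])   # imputed Y(0) for treated
att_imp <- mean(df$y[df$D == 1] - y0_hat)
att_imp  # equals the true ATT (13/6) exactly in this noiseless toy
```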


Pre-trends: useful, but do not over-trust

Roth (2022) emphasizes two issues:

  1. Conventional pre-trend tests can have low power.
  2. Conditioning on passing a pre-trend test can distort inference and may worsen bias/coverage.

Practical implication: “didn’t reject pre-trend” is not proof of parallel trends.


A practical analysis workflow

  1. Define estimand first

    • Overall ATT?
    • Dynamic effects by event time?
    • Cohort-specific effects?
  2. Map treatment timing structure

    • Never-treated exists?
    • All units eventually treated?
    • Anticipation likely?
  3. Start with decomposition diagnostics

    • If using TWFE for baseline comparison, inspect how much weight comes from each 2x2 contrast.
  4. Estimate with heterogeneity-robust method

    • C&S group-time ATT or BJS imputation (or Sun-Abraham for dynamic/event-study focus).
  5. Aggregate transparently

    • Report exactly how ATT(g, t) estimates were combined.
    • Report overall ATT and event-time ATT separately.
  6. Inference + sensitivity

    • Use simultaneous bands where appropriate.
    • Test robustness to control group definition (never-treated vs not-yet-treated), anticipation windows, covariate specification, and sample windows.
  7. Communicate assumptions clearly

    • Parallel trends (possibly conditional)
    • No interference/spillovers (or discuss violations)
    • Treatment timing exogeneity plausibility

Minimal reporting checklist (copy/paste)

  • Estimand(s): overall ATT / dynamic / cohort-specific
  • Design plot or table: who is treated when
  • Comparison group: never-treated vs not-yet-treated
  • Anticipation window assumed
  • Estimator (C&S, Sun-Abraham, or BJS imputation) and covariate handling
  • Aggregation rule for ATT(g, t)
  • Pre-trend assessment, with the low-power caveat
  • Sensitivity checks run and their results

Common failure modes

  1. Using TWFE event-study by default in staggered adoption and reading every lead/lag literally.
  2. Treating pre-trend p-value as a pass/fail truth test.
  3. Reporting one pooled ATT when heterogeneity is the story.
  4. Hiding aggregation rules for cohort-time effects.
  5. No design plot (you need to show who gets treated when).
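On the design-plot point: even a cohort-by-year table of treatment status makes the adoption pattern explicit. A base-R sketch on a toy panel (all names hypothetical):

```r
# Toy panel: 6 units, two adopting cohorts plus a never-treated group (Inf)
df <- expand.grid(id = 1:6, year = 2015:2020)
df$first_treat <- c(2016, 2016, 2018, 2018, Inf, Inf)[df$id]
df$D <- as.numeric(df$year >= df$first_treat)

# Rows = adoption cohort, columns = year, entries = share of cells treated
design <- with(df, tapply(D, list(cohort = first_treat, year = year), mean))
design
# cohort 2016: 0 1 1 1 1 1
# cohort 2018: 0 0 0 1 1 1
# cohort Inf:  0 0 0 0 0 0
```

A heatmap of the same matrix is the usual published version; the table alone already exposes gaps like "no never-treated group."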

Tiny R starter (C&S via did)

library(did)

att <- att_gt(
  yname = "y",
  tname = "year",
  idname = "id",
  gname = "first_treat",
  xformla = ~ x1 + x2,
  data = df,
  est_method = "dr"   # reg / ipw / dr
)

# Dynamic/event-time aggregation
es <- aggte(att, type = "dynamic")
summary(es)
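Beyond the dynamic aggregation, aggte() also supports overall and cohort-level summaries. A sketch continuing from the att object above (did package loaded, not run here):

```r
# Other aggregations of the ATT(g, t) estimates from att_gt()
overall <- aggte(att, type = "simple")  # single weighted-average ATT
bygroup <- aggte(att, type = "group")   # one ATT per adoption cohort
summary(overall)
summary(bygroup)
```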

Bottom line

In staggered-adoption DiD, the biggest upgrade is conceptual: define the estimand first (which cohorts, which event times, which comparison group), then pick an estimator that targets exactly that, rather than reading a single TWFE coefficient as "the effect."

That turns DiD from a convenience regression into a defensible causal design.


References

Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation. Review of Economic Studies.
Callaway, B., & Sant'Anna, P. H. C. (2021). Difference-in-Differences with Multiple Time Periods. Journal of Econometrics, 225(2), 200-230.
Goodman-Bacon, A. (2021). Difference-in-Differences with Variation in Treatment Timing. Journal of Econometrics, 225(2), 254-277.
Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. American Economic Review: Insights, 4(3), 305-322.
Sun, L., & Abraham, S. (2021). Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects. Journal of Econometrics, 225(2), 175-199.