Modern Difference-in-Differences for Staggered Adoption: A Practical Playbook
Date: 2026-03-08
Category: knowledge (causal inference / policy evaluation)
Why this note exists
Classic DiD intuition is simple: compare before/after changes in treated vs control groups.
But in real datasets, treatment often rolls out at different times (staggered adoption), and effects can vary by cohort and event time. In that setup, naive two-way fixed effects (TWFE) can produce hard-to-interpret averages and even misleading dynamics.
This playbook summarizes what changed in the modern DiD literature and how to operationalize it.
Core problem with naive TWFE in staggered designs
When treatment timing varies across units, TWFE does not generally estimate one clean ATT.
- Goodman-Bacon shows that the TWFE coefficient is a weighted average of many 2x2 DiD comparisons.
- Some of those comparisons use already-treated units as controls for later-treated units.
- With heterogeneous effects, the weights can be unintuitive and even negative, so the pooled coefficient can be hard to interpret or outright misleading.
Practical implication: a single TWFE coefficient may mix together comparisons that are not your target estimand.
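To make this concrete, here is a toy base-R simulation (the DGP and all numbers are illustrative, not from any real dataset): two cohorts with staggered timing, where the early cohort's effect grows with event time. The pooled TWFE coefficient comes out negative even though every true effect is positive.

```r
# Toy simulation (hypothetical DGP): two cohorts, staggered adoption,
# effects that grow with event time for the early cohort.
set.seed(1)
df <- expand.grid(id = 1:40, t = 1:10)
df$g <- ifelse(df$id <= 20, 3, 7)            # first treatment period by cohort
df$post <- as.numeric(df$t >= df$g)
# early cohort: effect 2, 4, 6, ... by event time; late cohort: flat 0.5
df$tau <- df$post * ifelse(df$g == 3, 2 * (df$t - df$g + 1), 0.5)
df$y <- df$tau + rnorm(nrow(df), sd = 0.1)   # unit/time effects omitted for clarity

twfe <- lm(y ~ post + factor(id) + factor(t), data = df)
unname(coef(twfe)["post"])                   # pooled TWFE estimate: about -3.3
mean(df$tau[df$post == 1])                   # true average effect on the treated: about 6.2
```

The sign flip is driven exactly by the mechanism above: periods where already-treated early units serve as "controls" for the late cohort receive negative implicit weight.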
Event-study gotcha (the common trap)
Researchers often run TWFE event-study regressions with leads/lags.
Sun & Abraham show that with treatment-effect heterogeneity, a coefficient at event time ℓ can be contaminated by effects from other event times and cohorts. Spurious pre-trends can appear as artifacts of heterogeneity even when parallel trends actually holds.
Practical implication: a pretty lead/lag plot can still be wrong.
Modern alternatives (what to use instead)
1) Group-time ATT framework (Callaway & Sant’Anna)
Estimate ATT(g, t): the average effect for the cohort first treated at time g, evaluated at time t.
Advantages:
- Transparent building blocks (who, when)
- Flexible aggregation (overall ATT, dynamic/event-time ATT, cohort-specific ATT)
- Supports regression-adjustment, IPW, and doubly robust estimation
- Implemented in the R package did
Use when: you want interpretable cohort/time heterogeneity and robust aggregation.
2) Interaction-weighted event-study (Sun & Abraham)
Build event-study effects robust to heterogeneity by using interaction structure rather than naive TWFE lead/lag pooling.
Use when: dynamic treatment timing effects are central and you need clean event-time interpretation.
3) Imputation-style estimators (Borusyak, Jaravel, Spiess)
Estimate untreated potential outcomes for treated observations using untreated/not-yet-treated data, then aggregate treatment effects.
Use when: you want robust and efficient estimation under staggered adoption with heterogeneous effects.
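The imputation idea is simple enough to sketch in base R. This is a minimal illustration of the logic (a simulated DGP with made-up numbers, not the BJS estimator or its inference): fit two-way fixed effects on untreated observations only, impute Y(0) for treated observations, and average the differences.

```r
# Minimal imputation-style sketch (illustrative DGP; not the BJS package).
set.seed(2)
df <- expand.grid(id = 1:60, t = 1:8)
df$g <- c(4, 6, Inf)[(df$id - 1) %% 3 + 1]   # cohorts: first treated at 4, 6, or never
df$treated <- is.finite(df$g) & df$t >= df$g
unit_fe <- rnorm(60)[df$id]
time_fe <- 0.3 * df$t
df$tau <- ifelse(df$treated, 1 + 0.5 * (df$t - df$g), 0)   # heterogeneous dynamic effect
df$y <- unit_fe + time_fe + df$tau + rnorm(nrow(df), sd = 0.1)

# 1) fit a two-way FE model on untreated (never- and not-yet-treated) observations only
fit0 <- lm(y ~ factor(id) + factor(t), data = df[!df$treated, ])
# 2) impute the untreated potential outcome for each treated observation
y0_hat <- predict(fit0, newdata = df[df$treated, ])
# 3) average the differences to get an overall ATT (other aggregations are analogous)
att_hat <- mean(df$y[df$treated] - y0_hat)
att_hat   # close to the true ATT of 1.8125 in this DGP
```

Because only untreated observations identify the fixed effects, already-treated units never contaminate the comparison, which is the core of the estimator's robustness.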
Pre-trends: useful, but do not over-trust
Roth (2022) emphasizes two issues:
- Conventional pre-trend tests can have low power.
- Conditioning on passing a pre-trend test can distort inference and may worsen bias/coverage.
Practical implication: “didn’t reject pre-trend” is not proof of parallel trends.
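A toy power calculation makes the point (all magnitudes hypothetical): a differential pre-trend of 0.05 per period would bias a simple post-minus-pre DiD, yet at this sample size a standard lead-slope test almost never detects it.

```r
# Toy power check: how often does a pre-trend test reject when a real
# (bias-relevant) differential trend of 0.05 per period is present?
set.seed(3)
reject <- replicate(500, {
  pre <- expand.grid(id = 1:40, t = 1:4)       # 4 pre-periods, 20 units per arm
  pre$treated_arm <- pre$id <= 20
  pre$y <- ifelse(pre$treated_arm, 0.05 * pre$t, 0) + rnorm(nrow(pre))
  fit <- lm(y ~ t * treated_arm, data = pre)
  summary(fit)$coefficients["t:treated_armTRUE", "Pr(>|t|)"] < 0.05
})
mean(reject)   # rejection rate is far below 1: the violation usually goes undetected
```

"Failed to reject" here is mostly a statement about power, not about parallel trends.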
A practical analysis workflow
1) Define the estimand first
- Overall ATT?
- Dynamic effects by event time?
- Cohort-specific effects?
2) Map the treatment timing structure
- Does a never-treated group exist?
- Are all units eventually treated?
- Is anticipation likely?
3) Start with decomposition diagnostics
- If using TWFE as a baseline comparison, inspect how much weight comes from each 2x2 contrast (e.g., a Goodman-Bacon decomposition).
4) Estimate with a heterogeneity-robust method
- C&S group-time ATT or BJS imputation (or Sun-Abraham for a dynamic/event-study focus).
5) Aggregate transparently
- Report exactly how the ATT(g, t) estimates were combined.
- Report overall ATT and event-time ATT separately.
6) Inference and sensitivity
- Use simultaneous confidence bands where appropriate.
- Test robustness to the control-group definition (never-treated vs not-yet-treated), anticipation windows, covariate specification, and sample windows.
7) Communicate assumptions clearly
- Parallel trends (possibly conditional)
- No interference/spillovers (or discuss violations)
- Plausibility of treatment-timing exogeneity
Minimal reporting checklist (copy/paste)
- Treatment timing histogram by cohort
- Clear estimand definition (overall ATT / dynamic ATT / cohort ATT)
- Control group definition explicitly stated
- Whether anticipation periods are allowed and how handled
- Method choice justified (C&S / Sun-Abraham / BJS)
- Event-study figure with method-compatible estimator
- Simultaneous confidence bands (not only pointwise) when relevant
- Sensitivity table (covariates, sample window, control group, anticipation)
- Discussion of external validity and potential spillovers
Common failure modes
- Using TWFE event-study by default in staggered adoption and reading every lead/lag literally.
- Treating pre-trend p-value as a pass/fail truth test.
- Reporting one pooled ATT when heterogeneity is the story.
- Hiding aggregation rules for cohort-time effects.
- No design plot (you need to show who gets treated when).
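The last point takes only a couple of lines. A minimal base-R design check on a toy panel (column names like first_treat are illustrative; 0 codes never-treated here):

```r
# Minimal design check: how many units first get treated in each period,
# including never-treated units (coded 0 in this toy panel).
df <- data.frame(id = rep(1:6, each = 3),
                 t = rep(1:3, times = 6),
                 first_treat = rep(c(2, 2, 3, 3, 3, 0), each = 3))
cohort_sizes <- table(df$first_treat[!duplicated(df$id)])
cohort_sizes
barplot(cohort_sizes, xlab = "first treatment period (0 = never)", ylab = "units")
```

If one cohort dominates, or the never-treated group is tiny, that shapes which comparisons your estimates actually rest on.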
Tiny R starter (C&S via did)
library(did)

att <- att_gt(
  yname   = "y",
  tname   = "year",
  idname  = "id",
  gname   = "first_treat",
  xformla = ~ x1 + x2,
  data    = df,
  est_method = "dr"   # "dr" = doubly robust; alternatives: "reg", "ipw"
)

# dynamic / event-time aggregation
es <- aggte(att, type = "dynamic")
summary(es)
Bottom line
In staggered-adoption DiD, the biggest upgrade is conceptual:
- stop treating TWFE as automatically causal,
- define the estimand first,
- estimate cohort-time effects with heterogeneity-robust methods,
- aggregate transparently,
- and treat pre-trend tests as diagnostics, not guarantees.
That turns DiD from a convenience regression into a defensible causal design.
References
- Goodman-Bacon, A. (2018/2021), "Difference-in-Differences with Variation in Treatment Timing," NBER w25018 / Journal of Econometrics. https://www.nber.org/papers/w25018
- Callaway, B. and Sant'Anna, P. H. C. (2021), "Difference-in-Differences with Multiple Time Periods," Journal of Econometrics. https://arxiv.org/abs/1803.09015
- Sun, L. and Abraham, S. (2021), "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects," Journal of Econometrics. https://arxiv.org/abs/1804.05785
- Borusyak, K., Jaravel, X., and Spiess, J. (2024), "Revisiting Event Study Designs: Robust and Efficient Estimation." https://arxiv.org/abs/2108.12419
- Roth, J. (2022), "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends," American Economic Review: Insights. https://www.aeaweb.org/articles?id=10.1257/aeri.20210236
- Wing, C., Freedman, S., and Hollingsworth, A. (2023), "Designing Difference in Difference Studies With Staggered Treatment Adoption: Key Concepts and Practical Guidelines," NBER w31842. https://www.nber.org/papers/w31842
- did package (Callaway et al.): documentation and examples. https://github.com/bcallaway11/did