Modern Difference-in-Differences for Staggered Adoption: A Practical Playbook
Date: 2026-03-08
Category: knowledge (causal inference / policy evaluation)
Why this note exists
Classic DiD intuition is simple: compare before/after changes in treated vs control groups.
But in real datasets, treatment often rolls out at different times (staggered adoption), and effects can vary by cohort and event time. In that setup, naive two-way fixed effects (TWFE) can produce hard-to-interpret averages and even misleading dynamics.
This playbook summarizes what changed in the modern DiD literature and how to operationalize it.
Core problem with naive TWFE in staggered designs
When treatment timing varies across units, TWFE does not generally estimate one clean ATT.
- Goodman-Bacon shows that the TWFE coefficient is a weighted average of many 2x2 DiD comparisons.
- Some of those comparisons use already-treated units as controls for later-treated units.
- With heterogeneous effects, the weights can be unintuitive and even negative, so the pooled coefficient can be hard to interpret or outright misleading.
Practical implication: a single TWFE coefficient may mix together comparisons that are not your target estimand.
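To make this concrete, here is a toy base-R simulation (the DGP and all numbers are illustrative, not from any real dataset): two cohorts with staggered timing, where the early cohort's effect grows with event time. The pooled TWFE coefficient comes out negative even though every true effect is positive.

```r
# Toy simulation (hypothetical DGP): two cohorts, staggered adoption,
# effects that grow with event time for the early cohort.
set.seed(1)
df <- expand.grid(id = 1:40, t = 1:10)
df$g <- ifelse(df$id <= 20, 3, 7)            # first treatment period by cohort
df$post <- as.numeric(df$t >= df$g)
# early cohort: effect 2, 4, 6, ... by event time; late cohort: flat 0.5
df$tau <- df$post * ifelse(df$g == 3, 2 * (df$t - df$g + 1), 0.5)
df$y <- df$tau + rnorm(nrow(df), sd = 0.1)   # unit/time effects omitted for clarity

twfe <- lm(y ~ post + factor(id) + factor(t), data = df)
unname(coef(twfe)["post"])                   # pooled TWFE estimate: about -3.3
mean(df$tau[df$post == 1])                   # true average effect on the treated: about 6.2
```

The sign flip is driven exactly by the mechanism above: periods where already-treated early units serve as "controls" for the late cohort receive negative implicit weight.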
Event-study gotcha (the common trap)
Researchers often run TWFE event-study regressions with leads/lags.
Sun & Abraham show that with treatment-effect heterogeneity, a coefficient at event time ℓ can be contaminated by effects from other event times and cohorts. Spurious pre-trends can appear as artifacts of heterogeneity even when parallel trends actually holds.
Practical implication: a pretty lead/lag plot can still be wrong.
Modern alternatives (what to use instead)
1) Group-time ATT framework (Callaway & Sant’Anna)
Estimate ATT(g, t): the average effect for the cohort first treated at time g, evaluated at time t.
Advantages:
- Transparent building blocks (who, when)
- Flexible aggregation (overall ATT, dynamic/event-time ATT, cohort-specific ATT)
- Supports regression-adjustment, IPW, and doubly robust estimation
- Implemented in the R package did
Use when: you want interpretable cohort/time heterogeneity and robust aggregation.
2) Interaction-weighted event-study (Sun & Abraham)
Build event-study effects robust to heterogeneity by using interaction structure rather than naive TWFE lead/lag pooling.
Use when: dynamic treatment timing effects are central and you need clean event-time interpretation.
3) Imputation-style estimators (Borusyak, Jaravel, Spiess)
Estimate untreated potential outcomes for treated observations using untreated/not-yet-treated data, then aggregate treatment effects.
Use when: you want robust and efficient estimation under staggered adoption with heterogeneous effects.
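The imputation idea is simple enough to sketch in base R. This is a minimal illustration of the logic (a simulated DGP with made-up numbers, not the BJS estimator or its inference): fit two-way fixed effects on untreated observations only, impute Y(0) for treated observations, and average the differences.

```r
# Minimal imputation-style sketch (illustrative DGP; not the BJS package).
set.seed(2)
df <- expand.grid(id = 1:60, t = 1:8)
df$g <- c(4, 6, Inf)[(df$id - 1) %% 3 + 1]   # cohorts: first treated at 4, 6, or never
df$treated <- is.finite(df$g) & df$t >= df$g
unit_fe <- rnorm(60)[df$id]
time_fe <- 0.3 * df$t
df$tau <- ifelse(df$treated, 1 + 0.5 * (df$t - df$g), 0)   # heterogeneous dynamic effect
df$y <- unit_fe + time_fe + df$tau + rnorm(nrow(df), sd = 0.1)

# 1) fit a two-way FE model on untreated (never- and not-yet-treated) observations only
fit0 <- lm(y ~ factor(id) + factor(t), data = df[!df$treated, ])
# 2) impute the untreated potential outcome for each treated observation
y0_hat <- predict(fit0, newdata = df[df$treated, ])
# 3) average the differences to get an overall ATT (other aggregations are analogous)
att_hat <- mean(df$y[df$treated] - y0_hat)
att_hat   # close to the true ATT of 1.8125 in this DGP
```

Because only untreated observations identify the fixed effects, already-treated units never contaminate the comparison, which is the core of the estimator's robustness.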
Pre-trends: useful, but do not over-trust
Roth (2022) emphasizes two issues:
- Conventional pre-trend tests can have low power.
- Conditioning on passing a pre-trend test can distort inference and may worsen bias/coverage.
Practical implication: “didn’t reject pre-trend” is not proof of parallel trends.
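A toy power calculation makes the point (all magnitudes hypothetical): a differential pre-trend of 0.05 per period would bias a simple post-minus-pre DiD, yet at this sample size a standard lead-slope test almost never detects it.

```r
# Toy power check: how often does a pre-trend test reject when a real
# (bias-relevant) differential trend of 0.05 per period is present?
set.seed(3)
reject <- replicate(500, {
  pre <- expand.grid(id = 1:40, t = 1:4)       # 4 pre-periods, 20 units per arm
  pre$treated_arm <- pre$id <= 20
  pre$y <- ifelse(pre$treated_arm, 0.05 * pre$t, 0) + rnorm(nrow(pre))
  fit <- lm(y ~ t * treated_arm, data = pre)
  summary(fit)$coefficients["t:treated_armTRUE", "Pr(>|t|)"] < 0.05
})
mean(reject)   # rejection rate is far below 1: the violation usually goes undetected
```

"Failed to reject" here is mostly a statement about power, not about parallel trends.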
A practical analysis workflow
1) Define the estimand first
- Overall ATT?
- Dynamic effects by event time?
- Cohort-specific effects?
2) Map the treatment timing structure
- Does a never-treated group exist?
- Are all units eventually treated?
- Is anticipation likely?
3) Start with decomposition diagnostics
- If using TWFE as a baseline comparison, inspect how much weight comes from each 2x2 contrast (e.g., a Goodman-Bacon decomposition).
4) Estimate with a heterogeneity-robust method
- C&S group-time ATT or BJS imputation (or Sun-Abraham for a dynamic/event-study focus).
5) Aggregate transparently
- Report exactly how the ATT(g, t) estimates were combined.
- Report overall ATT and event-time ATT separately.
6) Inference and sensitivity
- Use simultaneous confidence bands where appropriate.
- Test robustness to the control-group definition (never-treated vs not-yet-treated), anticipation windows, covariate specification, and sample windows.
7) Communicate assumptions clearly
- Parallel trends (possibly conditional)
- No interference/spillovers (or discuss violations)
- Plausibility of treatment-timing exogeneity
Minimal reporting checklist (copy/paste)
- Treatment timing histogram by cohort
- Clear estimand definition (overall ATT / dynamic ATT / cohort ATT)
- Control group definition explicitly stated
- Whether anticipation periods are allowed and how handled
- Method choice justified (C&S / Sun-Abraham / BJS)
- Event-study figure with method-compatible estimator
- Simultaneous confidence bands (not only pointwise) when relevant
- Sensitivity table (covariates, sample window, control group, anticipation)
- Discussion of external validity and potential spillovers
Common failure modes
- Using TWFE event-study by default in staggered adoption and reading every lead/lag literally.
- Treating pre-trend p-value as a pass/fail truth test.
- Reporting one pooled ATT when heterogeneity is the story.
- Hiding aggregation rules for cohort-time effects.
- No design plot (you need to show who gets treated when).
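The last point takes only a couple of lines. A minimal base-R design check on a toy panel (column names like first_treat are illustrative; 0 codes never-treated here):

```r
# Minimal design check: how many units first get treated in each period,
# including never-treated units (coded 0 in this toy panel).
df <- data.frame(id = rep(1:6, each = 3),
                 t = rep(1:3, times = 6),
                 first_treat = rep(c(2, 2, 3, 3, 3, 0), each = 3))
cohort_sizes <- table(df$first_treat[!duplicated(df$id)])
cohort_sizes
barplot(cohort_sizes, xlab = "first treatment period (0 = never)", ylab = "units")
```

If one cohort dominates, or the never-treated group is tiny, that shapes which comparisons your estimates actually rest on.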
Tiny R starter (C&S via did)
library(did)

att <- att_gt(
  yname   = "y",
  tname   = "year",
  idname  = "id",
  gname   = "first_treat",
  xformla = ~ x1 + x2,
  data    = df,
  est_method = "dr"   # "dr" = doubly robust; alternatives: "reg", "ipw"
)

# dynamic / event-time aggregation
es <- aggte(att, type = "dynamic")
summary(es)
Bottom line
In staggered-adoption DiD, the biggest upgrade is conceptual:
- stop treating TWFE as automatically causal,
- define the estimand first,
- estimate cohort-time effects with heterogeneity-robust methods,
- aggregate transparently,
- and treat pre-trend tests as diagnostics, not guarantees.
That turns DiD from a convenience regression into a defensible causal design.
References
- Goodman-Bacon, A. (2018/2021), "Difference-in-Differences with Variation in Treatment Timing," NBER w25018 / Journal of Econometrics. https://www.nber.org/papers/w25018
- Callaway, B. and Sant'Anna, P. H. C. (2021), "Difference-in-Differences with Multiple Time Periods," Journal of Econometrics. https://arxiv.org/abs/1803.09015
- Sun, L. and Abraham, S. (2021), "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects," Journal of Econometrics. https://arxiv.org/abs/1804.05785
- Borusyak, K., Jaravel, X., and Spiess, J. (2024), "Revisiting Event Study Designs: Robust and Efficient Estimation." https://arxiv.org/abs/2108.12419
- Roth, J. (2022), "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends," American Economic Review: Insights. https://www.aeaweb.org/articles?id=10.1257/aeri.20210236
- Wing, C., Freedman, S., and Hollingsworth, A. (2023), "Designing Difference in Difference Studies With Staggered Treatment Adoption: Key Concepts and Practical Guidelines," NBER w31842. https://www.nber.org/papers/w31842
- did package (Callaway et al.): documentation and examples. https://github.com/bcallaway11/did