Cache Stampede Prevention: Singleflight + SWR Playbook (Practical)
Date: 2026-02-23
Category: knowledge
Domain: distributed systems / caching reliability
Why this matters
A cache is usually your shock absorber. During a stampede (a.k.a. dogpile), it becomes an amplifier:
- many requests miss at once,
- all hit origin simultaneously,
- origin latency spikes,
- more requests time out and retry,
- failure cascades.
In production, stampedes are less about cache correctness and more about load synchronization risk.
Core principle
Don’t rely on one trick. Use a layered defense:
- Request collapsing (singleflight) per key
- Serve stale while revalidating (SWR)
- TTL jitter + probabilistic early refresh to avoid synchronized expiry
- Stale-if-error fallback during origin incidents
- Hard caps / circuit logic when cache regeneration becomes toxic
Mental model: three timelines per key
For each hot key, reason with three windows:
- Fresh window: serve cached value normally
- Soft-stale window: serve stale immediately, refresh in background
- Hard-stale boundary: block (or fail over) if no safe stale policy remains
This avoids “all callers block at TTL boundary” behavior.
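The three windows above can be sketched as a small classifier over per-key timestamps. A minimal sketch; `Entry` and `classify` are illustrative names, not a real API:

```python
from dataclasses import dataclass

@dataclass
class Entry:
    value: object
    fresh_until: float        # serve normally before this
    soft_stale_until: float   # serve stale + refresh in background before this
    error_stale_until: float  # last-resort stale horizon during origin errors

def classify(entry: Entry, now: float) -> str:
    """Map a cache entry onto the three per-key windows."""
    if now < entry.fresh_until:
        return "fresh"
    if now < entry.soft_stale_until:
        return "soft_stale"   # serve immediately, revalidate off the request path
    return "hard_stale"       # no safe non-blocking policy left
```

The point of making the windows explicit fields (rather than one TTL) is that each serving path can be metered and alerted on separately.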
Pattern 1) Request collapsing (singleflight)
When 100 identical requests arrive for key K, allow only one loader to run.
- leader: executes load
- followers: wait and share leader result
This is the highest-leverage first step because it directly cuts duplicated origin work.
Practical rules
- Namespace by normalized cache key (watch query param ordering/casing).
- Enforce a loader deadline: without a timeout, one stuck load pins the latency of every waiting follower.
- Emit `shared=true/false` metrics to observe the real suppression ratio.
- Add a forget/evict path for stuck keys after timeout.
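A minimal in-process singleflight along these lines, sketched in Python with `threading` (Go's `golang.org/x/sync/singleflight` is the canonical version; all names here are illustrative):

```python
import threading

class Singleflight:
    """Per-key request collapsing: one leader loads, followers share its result."""

    def __init__(self):
        self._mu = threading.Lock()
        self._calls = {}  # key -> (done_event, result_holder)

    def do(self, key, loader, timeout=None):
        """Return (value, shared). Raises on loader error or follower timeout."""
        with self._mu:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = (threading.Event(), {})
                self._calls[key] = call
        event, holder = call
        if leader:
            try:
                holder["value"] = loader(key)
            except Exception as exc:
                holder["error"] = exc
            finally:
                with self._mu:
                    self._calls.pop(key, None)  # "forget": next miss starts fresh
                event.set()
            if "error" in holder:
                raise holder["error"]
            return holder["value"], False
        # follower: wait with a deadline so a stuck leader can't pin our latency
        if not event.wait(timeout):
            raise TimeoutError(f"singleflight wait timed out for {key!r}")
        if "error" in holder:
            raise holder["error"]
        return holder["value"], True
```

The `shared` flag returned to followers is exactly what the `shared=true/false` metric above should count.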
Pattern 2) SWR (serve stale while revalidating)
If entry is stale but still within a soft-stale budget, return stale immediately and refresh asynchronously.
Benefits:
- protects user latency during refresh,
- avoids synchronized blocking spikes,
- reduces thundering-herd probability.
Guardrails
- Keep SWR window explicit per endpoint class (e.g., feed > profile > pricing).
- Never apply blindly to high-integrity data (e.g., balances, irreversible states).
- Include staleness age in response metadata for observability/debug.
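The serve-stale-then-revalidate decision, including the staleness-age metadata, can be sketched as follows. Assumptions: `cache[key]` holds a `(value, stored_at, fresh_ttl, soft_stale_ttl)` tuple and `refresh(key)` reloads the entry; both are hypothetical names for illustration:

```python
import threading
import time

def serve_with_swr(key, cache, refresh, now=None):
    """Return (value, age_seconds, path); raise KeyError past the soft-stale budget."""
    now = time.monotonic() if now is None else now
    value, stored_at, fresh_ttl, soft_stale_ttl = cache[key]
    age = now - stored_at
    if age < fresh_ttl:
        return value, age, "fresh"
    if age < fresh_ttl + soft_stale_ttl:
        # serve stale immediately; revalidate off the request path
        threading.Thread(target=refresh, args=(key,), daemon=True).start()
        return value, age, "stale_while_revalidate"
    raise KeyError(f"{key}: past soft-stale budget, caller must block or fail over")
```

In practice the background `refresh` should itself go through singleflight so that many stale hits still trigger only one origin load.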
Pattern 3) TTL jitter + probabilistic early refresh
If all keys expire at deterministic boundaries, stampedes are inevitable for hot keys.
Use both:
- TTL jitter: randomize expiry (e.g., ±10–20%)
- Probabilistic early refresh: as key nears expiry, only some requests trigger refresh
This de-synchronizes regeneration events without centralized locking.
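Both pieces fit in a few lines. For the probabilistic part, one published scheme is XFetch-style early expiration, where refresh probability rises smoothly as expiry nears; a sketch, with `recompute_cost` standing in for the observed regeneration time and `beta` tuning aggressiveness (names are illustrative):

```python
import math
import random

def ttl_with_jitter(base_ttl, spread=0.15, rng=random):
    """Randomize expiry by +/- spread so hot keys don't expire in lockstep."""
    return base_ttl * rng.uniform(1.0 - spread, 1.0 + spread)

def should_refresh_early(now, expiry, recompute_cost, beta=1.0, rng=random):
    """XFetch-style check: a request triggers refresh with probability that
    grows exponentially as `now` approaches `expiry`."""
    return now - recompute_cost * beta * math.log(rng.random()) >= expiry
```

Far from expiry almost no request refreshes; at expiry every request does, and the transition in between is what prevents a synchronized herd.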
Pattern 4) Stale-if-error fallback
When origin returns transient 5xx/timeout, prefer stale (within bounded limit) over hard failure.
- improves availability,
- prevents retry storms from compounding incident load.
But keep a strict max stale horizon to avoid serving dangerously old data forever.
Recommended architecture (minimal robust stack)
L1. In-process coalescing
- singleflight map keyed by normalized key
- protects each app instance from local duplicate loads
L2. Cross-instance protection (optional, high scale)
- short-lived distributed lock or lease for ultra-hot keys
- back off to stale response if lease unavailable
L3. Cache policy windows
- `fresh_ttl`
- `soft_stale_ttl` (SWR)
- `error_stale_ttl` (stale-if-error)
- jittered expiry schedule
L4. Safety controls
- regeneration concurrency cap
- origin QPS budget for refreshes
- kill switch: disable refresh and serve stale for bounded interval during incident
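The first two safety controls can be sketched as one guard object combining a semaphore (regeneration concurrency cap) and a token bucket (origin QPS budget). A sketch under those assumptions; the class name and defaults are illustrative, not recommendations:

```python
import threading
import time

class RefreshBudget:
    """Gate cache regenerations: bounded concurrency plus bounded origin QPS."""

    def __init__(self, max_concurrent=8, qps=50.0):
        self._slots = threading.Semaphore(max_concurrent)
        self._qps = qps
        self._tokens = qps          # token bucket, refilled at `qps` per second
        self._last = time.monotonic()
        self._mu = threading.Lock()

    def try_acquire(self):
        """Return True if a refresh may proceed; on False, serve stale instead."""
        with self._mu:
            now = time.monotonic()
            self._tokens = min(self._qps, self._tokens + (now - self._last) * self._qps)
            self._last = now
            if self._tokens < 1.0 or not self._slots.acquire(blocking=False):
                return False
            self._tokens -= 1.0
            return True

    def release(self):
        """Call when a granted refresh finishes (success or failure)."""
        self._slots.release()
```

The key behavior is that exceeding either budget degrades to serving stale rather than queueing more origin work, which is exactly the incident-mode posture described above.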
Pseudocode (operational sketch)
```
onRequest(key):
  entry = cache.get(key)
  now = clock()

  if entry.exists and now < entry.freshUntil:
    return entry.value                        // fresh hit

  if entry.exists and now < entry.softStaleUntil:
    triggerBackgroundRefreshWithSingleflight(key)
    return entry.value                        // stale served fast

  // hard-stale or miss: block on one collapsed load
  v, err, shared = singleflight.Do(key, loadFromOriginWithDeadline)
  if err == nil:
    cache.put(key, v, ttlWithJitter())
    return v

  if entry.exists and now < entry.errorStaleUntil:
    markFallback("stale_if_error")
    return entry.value                        // bounded stale-if-error

  return error
```
Metrics that actually matter
Track these per endpoint + key cohort (especially top hot keys):
- `singleflight_shared_ratio`
- `origin_load_qps_from_regeneration`
- `stale_served_ratio` (normal vs incident)
- `refresh_success_rate`
- p95/p99 latency (hit vs stale vs miss paths)
- `hard_miss_block_rate`
- `retry_rate` and `5xx_rate`
Alert examples
- shared ratio drops sharply on hot-key cohort
- refresh-origin QPS breaches budget for N windows
- stale-if-error ratio spikes + 5xx simultaneously
- hard-miss block rate exceeds threshold during deployment window
Rollout plan (safe)
- Enable singleflight for one high-traffic read endpoint.
- Add SWR window with conservative stale bounds.
- Add TTL jitter.
- Add stale-if-error only for approved data classes.
- Canary probabilistic early refresh on hottest 1–5% keys.
- Wire auto-halt if origin budget or error budget is violated.
Success criteria (example):
- 30%+ reduction in regeneration-origin QPS at expiry edges
- p99 latency improvement without increased 5xx
- no sustained stale-over-age violations
Common failure modes
- Singleflight only in one layer: multi-instance stampede still happens.
- No timeout on loader: follower requests inherit pathological waits.
- SWR without bounds: stale becomes silent data corruption.
- Jitter too small: still synchronized enough to herd.
- No incident mode: retries + regeneration overwhelm origin simultaneously.
Decision cheat sheet
- Need fastest win? -> implement per-key singleflight first.
- Latency SLO pain at expiry edges? -> add SWR.
- Periodic synchronized spikes? -> add jitter + early refresh.
- Frequent transient origin failures? -> bounded stale-if-error.
The core idea: cache policy is a control system, not a TTL constant.
References (researched)
- RFC 5861: stale-while-revalidate / stale-if-error semantics
  https://datatracker.ietf.org/doc/html/rfc5861
- Go `singleflight` package docs (duplicate suppression)
  https://pkg.go.dev/golang.org/x/sync/singleflight
- Cloudflare: request collapsing and revalidation behavior
  https://developers.cloudflare.com/cache/concepts/revalidation/
- Cloudflare engineering write-up on probabilistic revalidation
  https://blog.cloudflare.com/sometimes-i-cache/
- Caffeine refresh semantics (async refresh, query-triggered eligibility)
  https://github.com/ben-manes/caffeine/wiki/Refresh