Cache Stampede Prevention: Singleflight + SWR Playbook (Practical)

2026-02-23 · software

Category: knowledge
Domain: distributed systems / caching reliability

Why this matters

A cache is usually your shock absorber. During a stampede (a.k.a. dogpile), it becomes an amplifier: many callers miss the same key at once, each regenerates the same value, and the duplicated work multiplies load on the origin exactly when it is least able to absorb it.

In production, stampedes are less about cache correctness and more about load synchronization risk.

Core principle

Don’t rely on one trick. Use a layered defense:

  1. Request collapsing (singleflight) per key
  2. Serve stale while revalidating (SWR)
  3. TTL jitter + probabilistic early refresh to avoid synchronized expiry
  4. Stale-if-error fallback during origin incidents
  5. Hard caps / circuit logic when cache regeneration becomes toxic

Mental model: three timelines per key

For each hot key, reason with three windows:

  1. Fresh window: serve the cached value directly.
  2. Soft-stale window: serve stale immediately, revalidate in the background.
  3. Error-stale window: serve stale only when the origin is failing.

This avoids “all callers block at TTL boundary” behavior.


Pattern 1) Request collapsing (singleflight)

When 100 identical requests arrive for key K, allow only one loader to run.

This is the highest-leverage first step because it directly cuts duplicated origin work.

Practical rules

  1. Put a deadline on the loader; followers inherit its wait, so a hung loader stalls every collapsed caller.
  2. Collapse per key, not globally; unrelated keys must not serialize behind each other.
  3. Collapse in-process first; cross-instance coordination is a separate, optional layer.
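A minimal in-process sketch of request collapsing, using only the Go standard library (production code would typically reach for golang.org/x/sync/singleflight, whose `Do` has the same shape; `Group` and `stampede` here are illustrative names):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call is one in-flight load, shared by every concurrent caller of a key.
type call struct {
	wg  sync.WaitGroup
	val string
	err error
}

// Group collapses concurrent loads per key: the first caller runs fn,
// later callers for the same key wait and share the result.
type Group struct {
	mu sync.Mutex
	m  map[string]*call
}

// Do returns fn's result for key; shared reports whether the caller
// piggybacked on another caller's in-flight load.
func (g *Group) Do(key string, fn func() (string, error)) (val string, err error, shared bool) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok { // loader already running: wait and share
		g.mu.Unlock()
		c.wg.Wait()
		return c.val, c.err, true
	}
	c := new(call)
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn() // only this goroutine hits the origin
	c.wg.Done()

	g.mu.Lock()
	delete(g.m, key) // a later request starts a fresh flight
	g.mu.Unlock()
	return c.val, c.err, false
}

// stampede simulates n concurrent requests for one key and returns how
// many of them actually reached the (slow) origin.
func stampede(n int) int32 {
	var g Group
	var loads int32
	start := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // release all callers at once
			g.Do("K", func() (string, error) {
				atomic.AddInt32(&loads, 1)
				time.Sleep(200 * time.Millisecond) // simulate a slow origin
				return "value", nil
			})
		}()
	}
	close(start)
	wg.Wait()
	return loads
}

func main() {
	fmt.Println("origin loads for 100 concurrent callers:", stampede(100))
}
```

With 100 concurrent callers, only the first reaches the origin; the other 99 block briefly and share its result, which is exactly the duplicated-work cut described above.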

Pattern 2) SWR (serve stale while revalidate)

If entry is stale but still within a soft-stale budget, return stale immediately and refresh asynchronously.

Benefits:

  1. Latency stays flat across TTL boundaries: callers never wait on regeneration while a stale copy exists.
  2. The origin sees at most one refresh per key per window instead of a thundering herd.

Guardrails

  1. Bound the soft-stale window; unbounded SWR turns staleness into silent data corruption.
  2. Cap concurrent background refreshes so revalidation cannot itself overload the origin.
  3. Deduplicate the background refresh (singleflight again) so simultaneous stale hits don't each spawn a loader.

Pattern 3) TTL jitter + probabilistic early refresh

If all keys expire at deterministic boundaries, stampedes are inevitable for hot keys.

Use both:

  1. TTL jitter: add a small random offset to each key's TTL at write time.
  2. Probabilistic early refresh: let a request volunteer to regenerate slightly before expiry, with probability rising as expiry approaches.

This de-synchronizes regeneration events without centralized locking.

Pattern 4) Stale-if-error fallback

When origin returns transient 5xx/timeout, prefer stale (within bounded limit) over hard failure.

But keep a strict max stale horizon to avoid serving dangerously old data forever.


Recommended architecture (minimal robust stack)

L1. In-process coalescing: singleflight per key inside each instance.

L2. Cross-instance protection (optional, high scale): a short-lived distributed lock or lease so only one instance regenerates a given hot key.

L3. Cache policy windows: fresh, soft-stale, and error-stale horizons per key, with jittered TTLs.

L4. Safety controls: loader deadlines, background-refresh concurrency caps, and circuit logic for incident mode.
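One way to keep the windows and safety controls explicit is a single policy struct per data class (a sketch with illustrative field names, not a prescribed API):

```go
package main

import (
	"fmt"
	"time"
)

// Policy gathers the knobs from the four layers above in one place.
type Policy struct {
	FreshTTL       time.Duration // L3: serve directly
	SoftStale      time.Duration // L3: serve stale + background revalidate
	ErrorStale     time.Duration // L3: serve stale only on origin failure
	JitterFraction float64       // L3: e.g. 0.1-0.2 of FreshTTL
	LoaderDeadline time.Duration // L4: caps follower wait inside singleflight
	MaxBgRefreshes int           // L4: concurrency cap on revalidation
}

func main() {
	p := Policy{
		FreshTTL:       60 * time.Second,
		SoftStale:      5 * time.Minute,
		ErrorStale:     30 * time.Minute,
		JitterFraction: 0.15,
		LoaderDeadline: 2 * time.Second,
		MaxBgRefreshes: 8,
	}
	fmt.Printf("%+v\n", p)
}
```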


Pseudocode (operational sketch)

onRequest(key):
  entry = cache.get(key)
  now = clock()

  if entry.exists and now < entry.freshUntil:
    return entry.value  // fresh hit

  if entry.exists and now < entry.softStaleUntil:
    triggerBackgroundRefreshWithSingleflight(key)
    return entry.value  // stale served fast

  // hard-stale or miss
  v, err, shared = singleflight.Do(key, loadFromOriginWithDeadline)

  if err == nil:
    cache.put(key, v, ttlWithJitter())
    return v

  if entry.exists and now < entry.errorStaleUntil:
    markFallback("stale_if_error")
    return entry.value

  return error

Metrics that actually matter

Track these per endpoint + key cohort (especially top hot keys):

  1. Hit ratio split by freshness class (fresh / soft-stale / error-stale).
  2. Collapse ratio: collapsed callers ÷ origin loads per key.
  3. Background refresh success rate and duration.
  4. Origin QPS and error rate around TTL boundaries.

Alert examples

  1. Sustained stale-if-error serving on a key cohort (the origin is degraded).
  2. Origin QPS spiking at TTL boundaries (jitter or collapsing is ineffective).
  3. Background refresh failure rate climbing (stale data is aging toward its horizon).


Rollout plan (safe)

  1. Enable singleflight for one high-traffic read endpoint.
  2. Add SWR window with conservative stale bounds.
  3. Add TTL jitter.
  4. Add stale-if-error only for approved data classes.
  5. Canary probabilistic early refresh on hottest 1–5% keys.
  6. Wire auto-halt if origin budget or error budget is violated.

Success criteria (example):

  1. Origin load at TTL boundaries drops to roughly one regeneration per hot key per window.
  2. p99 latency stays flat through cache expiry.
  3. Stale-if-error servings are rare and always within the bounded horizon.


Common failure modes

  1. Singleflight only in one layer: multi-instance stampede still happens.
  2. No timeout on loader: follower requests inherit pathological waits.
  3. SWR without bounds: stale becomes silent data corruption.
  4. Jitter too small: still synchronized enough to herd.
  5. No incident mode: retries + regeneration overwhelm origin simultaneously.

Decision cheat sheet

  1. Duplicated origin loads for the same key → singleflight.
  2. Latency spikes at expiry → SWR.
  3. Synchronized expiries across keys → TTL jitter + probabilistic early refresh.
  4. Origin incidents causing hard failures → stale-if-error (bounded).

The core idea: cache policy is a control system, not a TTL constant.

