Cache Stampede Prevention: Singleflight + SWR Playbook (Practical)

2026-02-23 · software

Category: knowledge
Domain: distributed systems / caching reliability

Why this matters

A cache is usually your shock absorber. During a stampede (a.k.a. dogpile), it becomes an amplifier: many callers miss the same key at once, each regenerates the same value, and the duplicated work multiplies load on the origin exactly when it is least able to absorb it.

In production, stampedes are less about cache correctness and more about load synchronization risk.

Core principle

Don’t rely on one trick. Use a layered defense:

  1. Request collapsing (singleflight) per key
  2. Serve stale while revalidating (SWR)
  3. TTL jitter + probabilistic early refresh to avoid synchronized expiry
  4. Stale-if-error fallback during origin incidents
  5. Hard caps / circuit logic when cache regeneration becomes toxic

Mental model: three timelines per key

For each hot key, reason with three windows:

  1. Fresh window: serve the cached value directly.
  2. Soft-stale window: serve stale immediately, revalidate in the background.
  3. Error-stale window: serve stale only when the origin is failing.

This avoids “all callers block at TTL boundary” behavior.


Pattern 1) Request collapsing (singleflight)

When 100 identical requests arrive for key K, allow only one loader to run.

This is the highest-leverage first step because it directly cuts duplicated origin work.

Practical rules

  1. Put a deadline on the loader; followers inherit its wait, so a hung loader stalls every collapsed caller.
  2. Collapse per key, not globally; unrelated keys must not serialize behind each other.
  3. Collapse in-process first; cross-instance coordination is a separate, optional layer.
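A minimal in-process sketch of request collapsing, using only the Go standard library (production code would typically reach for golang.org/x/sync/singleflight, whose `Do` has the same shape; `Group` and `stampede` here are illustrative names):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call is one in-flight load, shared by every concurrent caller of a key.
type call struct {
	wg  sync.WaitGroup
	val string
	err error
}

// Group collapses concurrent loads per key: the first caller runs fn,
// later callers for the same key wait and share the result.
type Group struct {
	mu sync.Mutex
	m  map[string]*call
}

// Do returns fn's result for key; shared reports whether the caller
// piggybacked on another caller's in-flight load.
func (g *Group) Do(key string, fn func() (string, error)) (val string, err error, shared bool) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok { // loader already running: wait and share
		g.mu.Unlock()
		c.wg.Wait()
		return c.val, c.err, true
	}
	c := new(call)
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn() // only this goroutine hits the origin
	c.wg.Done()

	g.mu.Lock()
	delete(g.m, key) // a later request starts a fresh flight
	g.mu.Unlock()
	return c.val, c.err, false
}

// stampede simulates n concurrent requests for one key and returns how
// many of them actually reached the (slow) origin.
func stampede(n int) int32 {
	var g Group
	var loads int32
	start := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // release all callers at once
			g.Do("K", func() (string, error) {
				atomic.AddInt32(&loads, 1)
				time.Sleep(200 * time.Millisecond) // simulate a slow origin
				return "value", nil
			})
		}()
	}
	close(start)
	wg.Wait()
	return loads
}

func main() {
	fmt.Println("origin loads for 100 concurrent callers:", stampede(100))
}
```

With 100 concurrent callers, only the first reaches the origin; the other 99 block briefly and share its result, which is exactly the duplicated-work cut described above.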

Pattern 2) SWR (serve stale while revalidate)

If entry is stale but still within a soft-stale budget, return stale immediately and refresh asynchronously.

Benefits:

  1. Latency stays flat across TTL boundaries: callers never wait on regeneration while a stale copy exists.
  2. The origin sees at most one refresh per key per window instead of a thundering herd.

Guardrails

  1. Bound the soft-stale window; unbounded SWR turns staleness into silent data corruption.
  2. Cap concurrent background refreshes so revalidation cannot itself overload the origin.
  3. Deduplicate the background refresh (singleflight again) so simultaneous stale hits don't each spawn a loader.

Pattern 3) TTL jitter + probabilistic early refresh

If all keys expire at deterministic boundaries, stampedes are inevitable for hot keys.

Use both:

  1. TTL jitter: add a small random offset to each key's TTL at write time.
  2. Probabilistic early refresh: let a request volunteer to regenerate slightly before expiry, with probability rising as expiry approaches.

This de-synchronizes regeneration events without centralized locking.

Pattern 4) Stale-if-error fallback

When origin returns transient 5xx/timeout, prefer stale (within bounded limit) over hard failure.

But keep a strict max stale horizon to avoid serving dangerously old data forever.


Recommended architecture (minimal robust stack)

L1. In-process coalescing: singleflight per key inside each instance.

L2. Cross-instance protection (optional, high scale): a short-lived distributed lock or lease so only one instance regenerates a given hot key.

L3. Cache policy windows: fresh, soft-stale, and error-stale horizons per key, with jittered TTLs.

L4. Safety controls: loader deadlines, background-refresh concurrency caps, and circuit logic for incident mode.
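One way to keep the windows and safety controls explicit is a single policy struct per data class (a sketch with illustrative field names, not a prescribed API):

```go
package main

import (
	"fmt"
	"time"
)

// Policy gathers the knobs from the four layers above in one place.
type Policy struct {
	FreshTTL       time.Duration // L3: serve directly
	SoftStale      time.Duration // L3: serve stale + background revalidate
	ErrorStale     time.Duration // L3: serve stale only on origin failure
	JitterFraction float64       // L3: e.g. 0.1-0.2 of FreshTTL
	LoaderDeadline time.Duration // L4: caps follower wait inside singleflight
	MaxBgRefreshes int           // L4: concurrency cap on revalidation
}

func main() {
	p := Policy{
		FreshTTL:       60 * time.Second,
		SoftStale:      5 * time.Minute,
		ErrorStale:     30 * time.Minute,
		JitterFraction: 0.15,
		LoaderDeadline: 2 * time.Second,
		MaxBgRefreshes: 8,
	}
	fmt.Printf("%+v\n", p)
}
```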


Pseudocode (operational sketch)

onRequest(key):
  entry = cache.get(key)
  now = clock()

  if entry.exists and now < entry.freshUntil:
    return entry.value  // fresh hit

  if entry.exists and now < entry.softStaleUntil:
    triggerBackgroundRefreshWithSingleflight(key)
    return entry.value  // stale served fast

  // hard-stale or miss
  v, err, shared = singleflight.Do(key, loadFromOriginWithDeadline)

  if err == nil:
    cache.put(key, v, ttlWithJitter())
    return v

  if entry.exists and now < entry.errorStaleUntil:
    markFallback("stale_if_error")
    return entry.value

  return error

Metrics that actually matter

Track these per endpoint + key cohort (especially top hot keys):

  1. Hit ratio split by freshness class (fresh / soft-stale / error-stale).
  2. Collapse ratio: collapsed callers ÷ origin loads per key.
  3. Background refresh success rate and duration.
  4. Origin QPS and error rate around TTL boundaries.

Alert examples

  1. Sustained stale-if-error serving on a key cohort (the origin is degraded).
  2. Origin QPS spiking at TTL boundaries (jitter or collapsing is ineffective).
  3. Background refresh failure rate climbing (stale data is aging toward its horizon).


Rollout plan (safe)

  1. Enable singleflight for one high-traffic read endpoint.
  2. Add SWR window with conservative stale bounds.
  3. Add TTL jitter.
  4. Add stale-if-error only for approved data classes.
  5. Canary probabilistic early refresh on hottest 1–5% keys.
  6. Wire auto-halt if origin budget or error budget is violated.

Success criteria (example):

  1. Origin load at TTL boundaries drops to roughly one regeneration per hot key per window.
  2. p99 latency stays flat through cache expiry.
  3. Stale-if-error servings are rare and always within the bounded horizon.


Common failure modes

  1. Singleflight only in one layer: multi-instance stampede still happens.
  2. No timeout on loader: follower requests inherit pathological waits.
  3. SWR without bounds: stale becomes silent data corruption.
  4. Jitter too small: still synchronized enough to herd.
  5. No incident mode: retries + regeneration overwhelm origin simultaneously.

Decision cheat sheet

  1. Duplicated origin loads for the same key → singleflight.
  2. Latency spikes at expiry → SWR.
  3. Synchronized expiries across keys → TTL jitter + probabilistic early refresh.
  4. Origin incidents causing hard failures → stale-if-error (bounded).

The core idea: cache policy is a control system, not a TTL constant.

