HTTP Cache Freshness: `stale-while-revalidate` and `stale-if-error` Production Playbook

2026-04-09 · software

Date: 2026-04-09
Category: knowledge
Domain: web performance / CDN / API platform

Why this matters

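Most cache incidents are not about "cache miss vs cache hit." They are about what happens right after freshness expires: does the next request block on the origin, and what does the client see if the origin is failing at that moment?

stale-while-revalidate and stale-if-error give you two different stale-serving windows:

  • stale-while-revalidate: serve the stale response immediately while the cache revalidates in the background
  • stale-if-error: serve the stale response only when the origin or the revalidation attempt fails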

Used well, they smooth traffic and improve availability. Used carelessly, they silently serve data that is too old.


What the directives actually mean

stale-while-revalidate

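From RFC 5861: the directive indicates that a cache may serve the response after it becomes stale, for up to the indicated number of seconds, and should revalidate it in the background without blocking the request.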

Example:

Cache-Control: max-age=60, stale-while-revalidate=30

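Meaning:

  • 0 to 60 s: the response is fresh and is served normally
  • 60 to 90 s: the cache may serve the stale response immediately and revalidate in the background
  • after 90 s: the request waits for a normal fetch or revalidation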

stale-if-error

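Also from RFC 5861: the directive indicates that when an error is encountered, a cache may use a stale response to satisfy the request for up to the indicated number of seconds after the response became stale.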

Example:

Cache-Control: max-age=60, stale-if-error=600

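Meaning:

  • 0 to 60 s: the response is fresh and is served normally
  • 60 to 660 s: if the origin returns an error or cannot be reached, the cache may serve the stale response instead of the error
  • after 660 s: errors surface to the client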

RFC 5861 frames error handling around situations that result in 500, 502, 503, or 504, and also motivates the directive with upstream/network failure cases.

Important compatibility fact

Unknown cache directives are generally ignored. So unsupported caches fall back to the rest of the policy (for example, max-age), which makes these directives relatively safe to add incrementally.
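
To make the fallback concrete, here is a minimal sketch of how a cache that predates RFC 5861 might see a policy. The parser is deliberately naive (it splits on commas and does not handle quoted arguments that contain commas), and the names parse_cache_control, LEGACY_KNOWN, and as_seen_by_legacy_cache are hypothetical, not part of any library.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: int argument or True}.

    Naive sketch: does not handle quoted arguments containing commas.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        arg = arg.strip().strip('"')
        directives[name.strip().lower()] = int(arg) if sep and arg.isdigit() else True
    return directives


# Assumption: the directive set a hypothetical older cache understands.
LEGACY_KNOWN = {
    "public", "private", "max-age", "s-maxage",
    "no-store", "no-cache", "must-revalidate",
}


def as_seen_by_legacy_cache(value: str) -> dict:
    """Drop directives the legacy cache does not recognize."""
    parsed = parse_cache_control(value)
    return {name: arg for name, arg in parsed.items() if name in LEGACY_KNOWN}


print(as_seen_by_legacy_cache("max-age=60, stale-while-revalidate=30"))
# {'max-age': 60}
```

The legacy cache simply applies max-age=60 and never serves stale, which is the strict behavior you had before adding the directive.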


Mental model: four windows, not one TTL

Think of a cached object as moving through four states:

  1. Fresh window
    • serve normally
  2. Soft-stale window (stale-while-revalidate)
    • serve immediately, refresh in background
  3. Error-stale window (stale-if-error)
    • only serve stale when origin/revalidation fails
  4. Hard-expired
    • fetch/revalidate normally and let errors surface

Timeline example:

0----------------60----------------90----------------660
|     fresh      |   SWR window    |   error-only     |
|                | serve stale +   | stale allowed    |
|                | async refresh   | only on failure  |
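
The four states can be written as a small classifier. This is a sketch of the mental model only, not of any particular cache implementation; the function name and state labels are hypothetical, and exact boundary handling (strictly less than versus less than or equal) is an assumption.

```python
def cache_state(age, max_age, swr=0, sie=0):
    """Classify a cached response by its age in seconds.

    Both stale windows are measured from the moment freshness expires,
    so they overlap; SWR behavior takes precedence inside the overlap.
    """
    if age < max_age:
        return "fresh"
    if age < max_age + swr:
        return "soft-stale"    # serve stale, revalidate in background
    if age < max_age + sie:
        return "error-stale"   # serve stale only if the origin fails
    return "hard-expired"      # fetch normally, let errors surface


# Cache-Control: max-age=60, stale-while-revalidate=30, stale-if-error=600
for age in (10, 75, 300, 700):
    print(age, cache_state(age, max_age=60, swr=30, sie=600))
# 10 fresh
# 75 soft-stale
# 300 error-stale
# 700 hard-expired
```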

Two consequences operators often miss:


The most useful production pattern: split browser vs CDN policy

max-age applies to every cache, including browsers; s-maxage overrides it for shared caches such as CDNs. Setting both lets you keep browsers stricter while allowing the CDN to absorb more origin pressure.

Example: public HTML or JSON that can tolerate brief staleness

Cache-Control: public, max-age=30, s-maxage=300, stale-while-revalidate=30, stale-if-error=600
ETag: "home-v42"

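Interpretation:

  • browsers treat the response as fresh for 30 seconds
  • shared caches/CDNs treat it as fresh for 300 seconds
  • once it is stale, a cache that supports the directive may serve it for 30 more seconds while revalidating in the background
  • if the origin is failing, a cache that supports the directive may serve it for up to 600 seconds past freshness
  • the ETag lets revalidation end in a cheap 304 when nothing changed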

This is usually a much better operator default than trying to make every layer share one TTL.
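
The per-layer split can be checked mechanically. The sketch below assumes the directives are already parsed into a dict and applies the standard rule that s-maxage overrides max-age for shared caches only; the function name freshness_lifetime is hypothetical.

```python
def freshness_lifetime(directives: dict, shared: bool) -> int:
    """Seconds a response stays fresh for a shared or private cache."""
    if shared and "s-maxage" in directives:
        return directives["s-maxage"]
    return directives.get("max-age", 0)


# public, max-age=30, s-maxage=300, stale-while-revalidate=30, stale-if-error=600
policy = {
    "max-age": 30,
    "s-maxage": 300,
    "stale-while-revalidate": 30,
    "stale-if-error": 600,
}

browser_fresh = freshness_lifetime(policy, shared=False)
cdn_fresh = freshness_lifetime(policy, shared=True)

print("browser fresh for", browser_fresh, "s")
print("CDN fresh for", cdn_fresh, "s")
print("CDN may serve stale on error until age", cdn_fresh + policy["stale-if-error"], "s")
# browser fresh for 30 s
# CDN fresh for 300 s
# CDN may serve stale on error until age 900 s
```

The last line is the number worth reviewing with product owners: it is the oldest copy the CDN is allowed to hand out during an outage.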


Safe policy templates by content class

1) Versioned static assets (JS/CSS with content hash)

Cache-Control: public, max-age=31536000, immutable

Use this for assets whose URL changes on deploy. You usually do not need SWR here. The better pattern is immutable + cache-busting.

2) Public HTML shell / marketing pages

Cache-Control: public, max-age=60, s-maxage=300, stale-while-revalidate=30, stale-if-error=600
ETag: "landing-v18"

Good when:

3) Public list APIs / catalog endpoints

Cache-Control: public, max-age=15, s-maxage=60, stale-while-revalidate=15, stale-if-error=120
ETag: "products-page-1-v9"

Good when:

4) Personalized/authenticated responses

Cache-Control: private, no-store

Or, if local browser caching is intentionally allowed:

Cache-Control: private, max-age=30

Do not lean on SWR/stale-if-error in shared caches for personalized content unless you have extremely explicit safety boundaries. Most of the time, that is how private data leaks happen.

5) High-integrity state (balances, checkout totals, irreversible actions)

Prefer strict freshness:

Cache-Control: no-store

or, if storage is okay but stale reuse is not:

Cache-Control: max-age=0, must-revalidate

This is not where you get clever.
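
One way to keep these templates from drifting is a single lookup table in the application or edge config. The sketch below reuses the header values from the templates above; the content-class names and the fail-closed default of no-store for unknown classes are assumptions for illustration.

```python
# Content-class names are hypothetical; header values come from the templates above.
CACHE_POLICIES = {
    "versioned-static": "public, max-age=31536000, immutable",
    "public-html": "public, max-age=60, s-maxage=300, "
                   "stale-while-revalidate=30, stale-if-error=600",
    "public-list-api": "public, max-age=15, s-maxage=60, "
                       "stale-while-revalidate=15, stale-if-error=120",
    "personalized": "private, no-store",
    "high-integrity": "no-store",
}


def cache_control_for(content_class: str) -> str:
    """Return the Cache-Control value for a content class.

    Unknown classes fail closed to no-store (a design choice of this sketch).
    """
    return CACHE_POLICIES.get(content_class, "no-store")


def uses_stale_directives(header_value: str) -> bool:
    """True if the policy opts into either stale-serving window."""
    return "stale-while-revalidate" in header_value or "stale-if-error" in header_value


# Guard: classes that must never be served stale stay free of stale directives.
for name in ("personalized", "high-integrity"):
    assert not uses_stale_directives(cache_control_for(name))
```

Running the guard in CI catches the case where someone copies a public template onto a personalized route.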


Validators matter more than people think

If you enable stale windows, pair them with strong revalidation signals:

  • ETag
  • Last-Modified

Why:

Cloudflare’s revalidation docs explicitly describe serving stale during async revalidation and then using If-Modified-Since / ETag to validate cheaply.
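
As a rough illustration of why validators make background revalidation cheap, the sketch below shows an origin handler answering a conditional request. It covers only the ETag / If-None-Match path; the body-hash ETag scheme and the function names are assumptions, not a description of any specific framework or CDN.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (hypothetical scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, payload) for a GET with an optional If-None-Match."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a W/ prefix is ignored.
        candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in candidates:
            return 304, {"ETag": etag}, b""   # unchanged: no body transferred
    return 200, {"ETag": etag}, body


status, headers, payload = respond(b"<html>home v42</html>")
revalidated = respond(b"<html>home v42</html>", if_none_match=headers["ETag"])
print(status, revalidated[0], len(revalidated[2]))
# 200 304 0
```

Without a validator, every background refresh in the SWR window is a full-body fetch, which removes much of the origin relief the stale window was meant to provide.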

Practical rule:


CDN behavior is not identical to browser behavior

Do not assume every cache layer behaves the same way.

Browsers

Shared caches / CDNs

Cloudflare’s current docs state that asynchronous SWR serves the stale response immediately with cache status UPDATING while revalidation happens in the background, and that must-revalidate or no-cache can prevent stale serving when Origin Cache Control is enabled.

Operator takeaway:

Test the policy at the actual cache layer that matters. Do not reason from browser behavior alone.


Common design mistakes

1) Treating SWR as a magic freshness extender

It is not. It is a bounded non-blocking revalidation window. If traffic is sparse and no request lands during that window, the next request can still block.

2) Oversizing stale-if-error

stale-if-error=86400 may look comforting, but it can hide major incidents for a full day. Use long windows only when:

3) Combining stale goals with strict revalidation directives carelessly

If you want stale reuse, be careful with:

  • must-revalidate
  • no-cache

Some shared-cache/CDN setups will honor those in ways that effectively disable stale serving. If your goal is "serve stale during revalidation," verify that your final policy does not negate itself.

4) Using shared-cache stale policies on personalized content

If a response depends on cookies, account state, or per-user authorization, shared stale serving is dangerous unless you have airtight cache key segmentation and privacy rules.

5) Forgetting purge/version strategy

If you allow objects to stay usable past freshness, you must also know how to invalidate them quickly when needed:

6) Assuming stale windows solve herd behavior by themselves

They help, but they do not replace request collapsing or singleflight. For very hot keys, use stale controls together with cache stampede mitigation.
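
For reference, request collapsing can be sketched in a few lines. This is a minimal in-process singleflight using threads, with a hypothetical SingleFlight class; real CDNs and proxies implement collapsing at the cache layer, and this sketch does not add timeouts or result caching.

```python
import threading


class SingleFlight:
    """Collapse concurrent calls for the same key into one upstream call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> in-flight call record

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "value": None, "error": None}
                self._calls[key] = call
        if leader:
            try:
                call["value"] = fn()          # only the leader hits the origin
            except Exception as exc:          # followers see the same failure
                call["error"] = exc
            finally:
                with self._lock:
                    del self._calls[key]
                call["done"].set()
        else:
            call["done"].wait()               # followers wait for the leader
        if call["error"] is not None:
            raise call["error"]
        return call["value"]


flights = SingleFlight()
body = flights.do("/products?page=1", lambda: "fresh body")
print(body)
# fresh body
```

Inside the SWR window most requests never reach this path because they are served stale; collapsing matters for the requests that arrive after the window closes on a hot key.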


How to size the windows

A good starting heuristic:

max-age

Set to the amount of staleness you accept in normal operation.

stale-while-revalidate

Set to slightly more than:

stale-if-error

Set to the outage-absorption interval you can safely tolerate for that content class.

Examples:

A useful RFC 5861 framing:


Rollout plan

Phase 0 — establish strict baseline

Before adding stale directives, make sure you already know:

Phase 1 — add validators

For your candidate endpoints, ensure ETag or Last-Modified exists.

Phase 2 — add small stale-while-revalidate

Start with low-risk public content. Example:

Cache-Control: public, max-age=30, s-maxage=60, stale-while-revalidate=15

Phase 3 — add bounded stale-if-error

Only for endpoints where availability beats perfect freshness. Keep the first window conservative.

Phase 4 — expand by content class

Different endpoints deserve different stale budgets. Do not copy one header across the whole platform.


Observability checklist

Track these per endpoint family:

If your CDN exposes cache statuses, break them out. For example, Cloudflare’s UPDATING can reveal active async revalidation behavior.

The key anti-blindness rule:

If you use stale-if-error, alert on stale fallback itself — not just on visible 5xx.

Otherwise, the cache can hide an outage until the stale window runs out.
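
A minimal sketch of that alert, assuming your CDN logs expose a per-request cache status and that one label marks stale-on-error serving. The label STALE and the 1% threshold are placeholders; check your CDN's documentation for the actual status values and tune the threshold per endpoint family.

```python
from collections import Counter

# Assumption: the status label your CDN uses for "served stale because the origin failed".
STALE_FALLBACK_STATUS = "STALE"


def stale_fallback_ratio(statuses) -> float:
    """Share of requests in the sample that were served via stale fallback."""
    counts = Counter(status.upper() for status in statuses)
    total = sum(counts.values())
    return counts[STALE_FALLBACK_STATUS] / total if total else 0.0


def should_alert(statuses, threshold: float = 0.01) -> bool:
    """Alert when stale fallback reaches the threshold, even with zero visible 5xx."""
    return stale_fallback_ratio(statuses) >= threshold


sample = ["HIT"] * 90 + ["UPDATING"] * 8 + ["STALE"] * 2
print(stale_fallback_ratio(sample), should_alert(sample))
# 0.02 True
```

Note that UPDATING traffic is not counted: background revalidation is normal operation, while stale fallback means the origin is already failing.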


Decision cheat sheet

Use stale-while-revalidate when:

Use stale-if-error when:

Avoid both when:


Practical recommendation

For most teams, the safest pattern is:

  1. version static assets aggressively,
  2. use max-age + s-maxage to separate browser and CDN policy,
  3. add a small stale-while-revalidate window first,
  4. add a carefully bounded stale-if-error window only for content classes that can tolerate it,
  5. pair stale policies with validators and explicit monitoring.

The goal is not “serve stale more often.” The goal is:


References