HTTP Cache Freshness: `stale-while-revalidate` and `stale-if-error` Production Playbook

Date: 2026-04-09
Category: knowledge
Domain: web performance / CDN / API platform

Why this matters

Most cache incidents are not about "cache miss vs cache hit." They are about what happens right after freshness expires:

the first request blocks on origin,
a burst of traffic dogpiles the same key,
origin is already sick, so revalidation makes things worse,
users see either avoidable latency or avoidable errors.

stale-while-revalidate and stale-if-error give you two different stale-serving windows:

one for hiding revalidation latency,
one for masking bounded origin failures.

Used well, they smooth traffic and improve availability. Used carelessly, they silently serve data that is too old.

What the directives actually mean

`stale-while-revalidate`

From RFC 5861:

the response becomes stale after its freshness lifetime ends,
the cache MAY still serve it stale for an additional window,
and the cache SHOULD revalidate in the background without blocking.

Example:

Cache-Control: max-age=60, stale-while-revalidate=30

Meaning:

0-60s: fresh
60-90s: stale is still allowed while async revalidation happens
90s+: normal stale behavior resumes; next request may block or fetch anew

`stale-if-error`

Also from RFC 5861:

if the cached response is stale,
and revalidation/origin access encounters an error,
the cache MAY serve stale instead of returning a hard error,
but only up to the configured additional window.

Example:

Cache-Control: max-age=60, stale-if-error=600

Meaning:

0-60s: fresh
60-660s: if origin returns an error during reuse/revalidation, stale may be served
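660s+: the stale copy is no longer usable; the error is passed through to the client

RFC 5861 defines an error here as any situation that would result in a 500, 502, 503, or 504 response. That covers upstream and network failures, not only an origin that answers with one of those codes.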
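
The error rule above can be sketched as a small decision function. This is a minimal illustration, not any particular cache's implementation: the function name, the half-open window boundary, and the convention that a missing status means a network-level failure are all assumptions made for the example.

```python
from typing import Optional

# Status codes RFC 5861 names when describing errors for stale-if-error.
ERROR_STATUSES = frozenset({500, 502, 503, 504})


def may_serve_stale_on_error(
    age: int,
    max_age: int,
    stale_if_error: int,
    origin_status: Optional[int],
) -> bool:
    """Return True if a stale cached copy may stand in for a failed fetch.

    origin_status is None when the fetch failed before any status arrived
    (timeout, connection refused); this sketch treats that as an error too.
    """
    is_error = origin_status is None or origin_status in ERROR_STATUSES
    staleness = age - max_age  # seconds past the end of the freshness lifetime
    # A fresh response (staleness < 0) is served normally, so this returns False.
    return is_error and 0 <= staleness < stale_if_error
```

With max-age=60 and stale-if-error=600, a 503 at age 120 falls inside the window, while the same 503 at age 700 does not, and a 404 never qualifies.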

Important compatibility fact

The HTTP caching specification requires caches to ignore cache directives they do not recognize. A cache without support for these two directives therefore falls back to the rest of the policy (for example, max-age), which makes them relatively safe to add incrementally.
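
To make the fallback concrete, here is a simplified sketch of how a cache that only understands max-age would see a header carrying the newer directives. The parser and the `understood_by` helper are hypothetical, and the parser splits on commas naively, so it does not handle quoted values that contain commas.

```python
from typing import Dict, Optional, Set


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Parse a Cache-Control value into {directive: value-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directive names are case-insensitive; valueless directives map to None.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def understood_by(
    directives: Dict[str, Optional[str]], supported: Set[str]
) -> Dict[str, Optional[str]]:
    """Keep only the directives a given cache implements; the rest are ignored."""
    return {name: value for name, value in directives.items() if name in supported}


header = "max-age=60, stale-while-revalidate=30, stale-if-error=600"
legacy_view = understood_by(parse_cache_control(header), {"max-age"})
```

Here `legacy_view` contains only max-age, so the older cache behaves as if the stale directives were never sent.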

Mental model: four windows, not one TTL

Think of a cached object as moving through four states:

Fresh window
- serve normally
Soft-stale window (stale-while-revalidate)
- serve immediately, refresh in background
Error-stale window (stale-if-error)
- only serve stale when origin/revalidation fails
Hard-expired
- fetch/revalidate normally and let errors surface

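Timeline example for max-age=60, stale-while-revalidate=30, stale-if-error=600 (seconds since the response was stored; SWR = stale-while-revalidate):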

0----------------60----------------90----------------660
|     fresh      |   SWR window    |   error-only     |
|                | serve stale +   | stale allowed    |
|                | async refresh   | only on failure  |
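
The same four windows can be written as a classifier. This is a sketch under two assumptions: both stale windows are measured from the moment freshness ends, as in RFC 5861, and each boundary is treated as half-open. The `Window` and `classify` names are made up for the example.

```python
from enum import Enum


class Window(Enum):
    FRESH = "fresh"
    SOFT_STALE = "soft-stale"      # stale-while-revalidate applies
    ERROR_STALE = "error-stale"    # only stale-if-error applies
    HARD_EXPIRED = "hard-expired"


def classify(age: int, max_age: int, swr: int, sie: int) -> Window:
    """Map a cached object's age in seconds to one of the four windows."""
    if age < max_age:
        return Window.FRESH
    staleness = age - max_age  # both stale windows start when freshness ends
    if staleness < swr:
        return Window.SOFT_STALE
    if staleness < sie:
        return Window.ERROR_STALE
    return Window.HARD_EXPIRED
```

For the values in the timeline (60, 30, 600), an age of 75 lands in the soft-stale window, 300 in the error-stale window, and 660 is hard-expired.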

Two consequences operators often miss:

stale-while-revalidate does not make content indefinitely self-refreshing. It only helps if a request arrives during that window.
stale-if-error does not increase freshness. It is an availability fallback, not a normal serving mode.
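
The first consequence is easy to see in a toy simulation of one cache key. The model below is deliberately simplified and hypothetical: origin fetches and background refreshes complete instantly, and errors are not modeled.

```python
from typing import List, Optional


def simulate(request_times: List[int], max_age: int, swr: int) -> List[str]:
    """Replay request arrival times (seconds) against a single cached object."""
    outcomes: List[str] = []
    stored_at: Optional[int] = None
    for t in request_times:
        if stored_at is None or t - stored_at >= max_age + swr:
            # Cold cache or past the SWR window: the request waits on origin.
            outcomes.append("blocking-fetch")
            stored_at = t
        elif t - stored_at < max_age:
            outcomes.append("fresh-hit")
        else:
            # Inside the SWR window: answer now, refresh in the background.
            outcomes.append("stale-hit+async-refresh")
            stored_at = t
    return outcomes


busy = simulate([0, 50, 70, 100, 140], max_age=60, swr=30)
sparse = simulate([0, 100, 200], max_age=60, swr=30)
```

With steady traffic (`busy`), only the very first request blocks. With one request every 100 seconds (`sparse`), every request arrives after the 90-second mark and blocks, so the SWR window never helps.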

The most useful production pattern: split browser vs CDN policy

max-age applies to every cache that stores the response; s-maxage overrides it for shared caches (CDNs, proxies) only. Setting both lets you keep browsers stricter while allowing the CDN to absorb more origin pressure.

Example: public HTML or JSON that can tolerate brief staleness

Cache-Control: public, max-age=30, s-maxage=300, stale-while-revalidate=30, stale-if-error=600
ETag: "home-v42"

Interpretation:

browsers treat it as fresh for 30s,
shared caches treat it as fresh for 300s,
after shared-cache freshness ends, stale may still be served for 30s while revalidating,
if origin is failing, stale may be used for up to 10 more minutes,
ETag makes revalidation cheap.

This is usually a much better operator default than trying to make every layer share one TTL.
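
A minimal sketch of how the two layers read the same header differently. It assumes the directives are already parsed into a mapping and ignores Expires and heuristic freshness; the function name is hypothetical.

```python
from typing import Mapping


def freshness_lifetime(directives: Mapping[str, str], shared_cache: bool) -> int:
    """Seconds a response stays fresh for the given kind of cache."""
    # s-maxage only applies to shared caches and takes precedence there.
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])
    return int(directives.get("max-age", "0"))


policy = {
    "max-age": "30",
    "s-maxage": "300",
    "stale-while-revalidate": "30",
    "stale-if-error": "600",
}
browser_fresh = freshness_lifetime(policy, shared_cache=False)
cdn_fresh = freshness_lifetime(policy, shared_cache=True)
```

For the example header, the browser sees 30 seconds of freshness and the CDN sees 300.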

Safe policy templates by content class

1) Versioned static assets (JS/CSS with content hash)

Cache-Control: public, max-age=31536000, immutable

Use this for assets whose URL changes on deploy. You usually do not need SWR here. The better pattern is immutable + cache-busting.

2) Public HTML shell / marketing pages

Cache-Control: public, max-age=60, s-maxage=300, stale-while-revalidate=30, stale-if-error=600
ETag: "landing-v18"

Good when:

content changes a few times a day,
up to several minutes of staleness is acceptable (with the header above, a shared cache alone can serve a copy that is roughly 330s old),
protecting origin latency matters more than instant freshness.

3) Public list APIs / catalog endpoints

Cache-Control: public, max-age=15, s-maxage=60, stale-while-revalidate=15, stale-if-error=120
ETag: "products-page-1-v9"

Good when:

reads are hot,
data changes regularly,
some stale tolerance exists,
you want CDN shielding during deploys or brief origin wobble.

4) Personalized/authenticated responses

Cache-Control: private, no-store

Or, if local browser caching is intentionally allowed:

Cache-Control: private, max-age=30

Do not lean on SWR/stale-if-error in shared caches for personalized content unless you have extremely explicit safety boundaries. Most of the time, that is how private data leaks happen.

5) High-integrity state (balances, checkout totals, irreversible actions)

Prefer strict freshness:

Cache-Control: no-store

or, if storage is okay but stale reuse is not:

Cache-Control: max-age=0, must-revalidate

This is not where you get clever.
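
One way to keep these templates from drifting is a single lookup keyed by content class. The class names and the `cache_control_for` helper below are hypothetical; the header values are the ones listed above.

```python
POLICIES = {
    "versioned-static": "public, max-age=31536000, immutable",
    "public-html": (
        "public, max-age=60, s-maxage=300, "
        "stale-while-revalidate=30, stale-if-error=600"
    ),
    "public-list-api": (
        "public, max-age=15, s-maxage=60, "
        "stale-while-revalidate=15, stale-if-error=120"
    ),
    "personalized": "private, no-store",
    "high-integrity": "no-store",
}


def cache_control_for(content_class: str) -> str:
    """Return the Cache-Control value for a content class."""
    # Unclassified endpoints get the strictest policy, never a cacheable one.
    return POLICIES.get(content_class, "no-store")
```

Defaulting unknown classes to no-store is a design choice for this sketch: a forgotten endpoint ends up uncached rather than accidentally served stale.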

Validators matter more than people think

If you enable stale windows, pair them with strong revalidation signals:

ETag
Last-Modified

Why:

SWR works best when background validation is cheap,
validators let caches ask “did this change?” instead of re-downloading the full object,
CDNs can collapse conditional revalidation better than blind refetches.

Cloudflare’s revalidation docs explicitly describe serving stale during async revalidation and then using If-Modified-Since / ETag to validate cheaply.

Practical rule:

SWR without validators is still useful,
but SWR with validators is what turns it into an efficient operator tool.
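
As an illustration of why validators make revalidation cheap, here is a sketch of an origin handler answering a conditional request. It is framework-free and hypothetical: deriving the ETag from a hash of the body is just one option, and `respond` returns a plain (status, headers, body) tuple.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return 304 with no body when the cache's validator still matches."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [c.strip() for c in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a W/ prefix is ignored.
        candidates = [c[2:] if c.startswith("W/") else c for c in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""
    return 200, headers, body
```

A background revalidation that sends the stored ETag gets an empty 304 when nothing changed, and the full 200 body only when the content differs.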

CDN behavior is not identical to browser behavior

Do not assume every cache layer behaves the same way.

Browsers

stale-while-revalidate is supported in modern browsers, and unsupported clients generally ignore it.
Browser behavior is still bounded by local cache rules and normal request patterns.
If a service worker intercepts requests, it gets first crack; HTTP cache directives only apply if the request path still goes through normal fetch/cache handling.

Shared caches / CDNs

CDNs are where stale-if-error is usually most valuable.
CDN implementations may add request collapsing, async revalidation, or vendor-specific controls.
Behavior around directives like must-revalidate, no-cache, and stale serving can be vendor-specific.

Cloudflare’s current docs are explicit that asynchronous SWR serves stale immediately with UPDATING while revalidation happens in the background, and that must-revalidate / no-cache can prevent stale serving when Origin Cache Control is enabled.

Operator takeaway:

Test policy at the actual cache layer that matters. Do not reason from browser behavior alone.

Common design mistakes

1) Treating SWR as a magic freshness extender

It is not. It is a bounded non-blocking revalidation window. If traffic is sparse and no request lands during that window, the next request can still block.

2) Oversizing `stale-if-error`

stale-if-error=86400 may look comforting, but it can hide major incidents for a full day. Use long windows only when:

content risk is low,
stale data is clearly acceptable,
observability will tell you stale fallback is happening.

3) Combining stale goals with strict revalidation directives carelessly

If you want stale reuse, be careful with:

must-revalidate
no-cache

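Some shared-cache/CDN setups will honor those in ways that effectively disable stale serving. The HTTP caching specification also gives s-maxage the semantics of proxy-revalidate for shared caches, so a strictly conforming shared cache may decline to serve stale once s-maxage expires; CDN behavior here varies, so confirm it on yours. If your goal is "serve stale during revalidation," verify that your final policy does not negate itself.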
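
A simple lint can catch the obvious self-negating combinations before a header ships. This is a hypothetical sketch: it only knows the two strict directives named above, and whether a given cache really disables stale serving for them is vendor-specific.

```python
from typing import List

STALE_DIRECTIVES = {"stale-while-revalidate", "stale-if-error"}
STRICT_DIRECTIVES = {"must-revalidate", "no-cache"}


def stale_conflicts(header: str) -> List[str]:
    """List stale directives that a strict directive in the same header may negate."""
    names = {
        part.strip().split("=", 1)[0].strip().lower()
        for part in header.split(",")
        if part.strip()
    }
    return [
        f"{stale} may be negated by {strict}"
        for stale in sorted(names & STALE_DIRECTIVES)
        for strict in sorted(names & STRICT_DIRECTIVES)
    ]
```

Running this in CI against every configured Cache-Control value gives an early warning; it does not replace testing against the real cache layer.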

4) Using shared-cache stale policies on personalized content

If a response depends on cookies, account state, or per-user authorization, shared stale serving is dangerous unless you have airtight cache key segmentation and privacy rules.

5) Forgetting purge/version strategy

If you allow objects to stay usable past freshness, you must also know how to invalidate them quickly when needed:

content-hash versioning for static assets,
targeted purge for hot objects,
emergency bypass for bad cache fills.

6) Assuming stale windows solve herd behavior by themselves

They help, but they do not replace request collapsing or singleflight. For very hot keys, use stale controls together with cache stampede mitigation.
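
For reference, a minimal single-flight helper in the spirit of request collapsing: concurrent callers for the same key share one origin fetch. This is a thread-based sketch with hypothetical names, not a description of how any CDN implements collapsing.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared by the leader and any followers waiting on the same key."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[Exception] = None


class SingleFlight:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Call] = {}

    def do(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Run fetch once per key at a time; concurrent callers share the result."""
        with self._lock:
            call = self._in_flight.get(key)
            is_leader = call is None
            if is_leader:
                call = _Call()
                self._in_flight[key] = call
        if is_leader:
            try:
                call.result = fetch()
            except Exception as exc:
                call.error = exc
            finally:
                # Remove the entry first so later callers start a new fetch.
                with self._lock:
                    del self._in_flight[key]
                call.done.set()
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

A cache fill path would wrap its origin fetch in `do(cache_key, fetch)`, so a burst of requests at expiry produces one origin request instead of one per caller.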

How to size the windows

A good starting heuristic:

`max-age`

Set to the age after which a cached copy should be revalidated. This is your normal-operation staleness budget before any stale window is added.

`stale-while-revalidate`

Set to slightly more than:

expected revalidation latency,
plus some burst cushion,
but not so large that you stop noticing freshness problems.

`stale-if-error`

Set to the outage-absorption interval you can safely tolerate for that content class.

Examples:

homepage cards: max-age=60, stale-while-revalidate=30, stale-if-error=600
product list: max-age=15, stale-while-revalidate=15, stale-if-error=120
docs page: max-age=300, stale-while-revalidate=60, stale-if-error=3600

A useful RFC 5861 framing:

max-age + stale-while-revalidate should fit inside the longest total freshness lifetime you can tolerate for normal non-error serving.
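
The arithmetic behind that framing, as a small sketch. It considers a single cache layer only (stacked browser and CDN caches add up) and ignores the time a revalidation itself takes; the function names are hypothetical.

```python
from typing import Tuple


def worst_case_age(max_age: int, swr: int, sie: int) -> Tuple[int, int]:
    """Upper bounds, in seconds, on the age of a served response.

    Returns (normal operation, during an origin outage).
    """
    normal = max_age + swr
    # Both stale windows start when freshness ends, so the longer one wins.
    during_outage = max_age + max(swr, sie)
    return normal, during_outage


def fits_budget(max_age: int, swr: int, normal_budget: int) -> bool:
    """Check max-age + stale-while-revalidate against a tolerated lifetime."""
    return max_age + swr <= normal_budget
```

For the homepage cards example this gives bounds of 90 seconds in normal operation and 660 seconds during an outage; for the docs page, 360 and 3900.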

Rollout plan

Phase 0 — establish strict baseline

Before adding stale directives, make sure you already know:

current cache hit ratio,
p95/p99 origin latency on cache miss,
conditional revalidation rate,
which endpoints are safe to serve stale.

Phase 1 — add validators

For your candidate endpoints, ensure ETag or Last-Modified exists.

Phase 2 — add small `stale-while-revalidate`

Start with low-risk public content. Example:

Cache-Control: public, max-age=30, s-maxage=60, stale-while-revalidate=15

Phase 3 — add bounded `stale-if-error`

Only for endpoints where availability beats perfect freshness. Keep the first window conservative.

Phase 4 — expand by content class

Different endpoints deserve different stale budgets. Do not copy one header across the whole platform.

Observability checklist

Track these per endpoint family:

cache hit ratio
conditional revalidation rate (304 share, validator usage)
origin fetch rate after expiry
stale serve ratio during SWR
stale serve ratio during errors
masked error rate (how often stale-if-error hid a failure)
content age distribution for served responses
purge frequency and emergency invalidation events

If your CDN exposes cache statuses, break them out per status. For example, Cloudflare’s UPDATING status can reveal active async revalidation behavior.

The key anti-blindness rule:

If you use stale-if-error, alert on stale fallback itself — not just on visible 5xx.

Otherwise, the cache can hide an outage until the stale window runs out.
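
A sketch of that alert over a batch of request log records. The record shape, the "stale-error" status label, and the 1% threshold are all assumptions for illustration; real CDNs use their own status names and log formats.

```python
from collections import Counter
from typing import Dict, Iterable


def stale_fallback_report(
    records: Iterable[Dict[str, str]], threshold: float = 0.01
) -> Dict[str, object]:
    """Compute how often stale-if-error masked a failure, and whether to alert."""
    counts = Counter(record["cache_status"] for record in records)
    total = sum(counts.values())
    masked = counts["stale-error"]  # hypothetical label for a stale-if-error serve
    rate = masked / total if total else 0.0
    return {
        "total": total,
        "masked_errors": masked,
        "masked_error_rate": rate,
        "alert": rate > threshold,
    }
```

The point of the design is that the alert fires on the fallback itself, so an outage that users never see as a 5xx still pages someone before the stale window runs out.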

Decision cheat sheet

Use stale-while-revalidate when:

you want to hide revalidation latency,
the content is read-heavy and public,
a short stale window is acceptable,
you have validators.

Use stale-if-error when:

uptime matters more than perfect freshness,
the content can safely remain visible during brief origin failures,
you have monitoring for masked failures.

Avoid both when:

the response is personalized,
stale data creates legal/financial/user-trust risk,
the system lacks good purge or observability.

Practical recommendation

For most teams, the safest pattern is:

version static assets aggressively,
use max-age + s-maxage to separate browser and CDN policy,
add a small stale-while-revalidate window first,
add a carefully bounded stale-if-error window only for content classes that can tolerate it,
pair stale policies with validators and explicit monitoring.

The goal is not “serve stale more often.” The goal is:

fresh enough in normal times,
fast at expiry boundaries,
graceful during short origin failures.

References

RFC 5861 — HTTP Cache-Control Extensions for Stale Content
https://datatracker.ietf.org/doc/html/rfc5861
MDN — Cache-Control header
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Cache-Control
web.dev — Keeping things fresh with stale-while-revalidate
https://web.dev/articles/stale-while-revalidate
Cloudflare Docs — Revalidation
https://developers.cloudflare.com/cache/concepts/revalidation/
