HTTP Cache Freshness: stale-while-revalidate and stale-if-error Production Playbook
Date: 2026-04-09
Category: knowledge
Domain: web performance / CDN / API platform
Why this matters
Most cache incidents are not about "cache miss vs cache hit." They are about what happens right after freshness expires:
- the first request blocks on origin,
- a burst of traffic dogpiles the same key,
- origin is already sick, so revalidation makes things worse,
- users see either avoidable latency or avoidable errors.
stale-while-revalidate and stale-if-error give you two different stale-serving windows:
- one for hiding revalidation latency,
- one for masking bounded origin failures.
Used well, they smooth traffic and improve availability. Used carelessly, they silently serve data that is too old.
What the directives actually mean
stale-while-revalidate
From RFC 5861:
- the response becomes stale after its freshness lifetime ends,
- the cache MAY still serve it stale for an additional window,
- and the cache SHOULD revalidate in the background without blocking.
Example:
Cache-Control: max-age=60, stale-while-revalidate=30
Meaning:
- 0-60s: fresh
- 60-90s: stale is still allowed while async revalidation happens
- 90s+: the stale-while-revalidate allowance has ended; the next request blocks on revalidation or a full fetch
stale-if-error
Also from RFC 5861:
- if the cached response is stale,
- and revalidation/origin access encounters an error,
- the cache MAY serve stale instead of returning a hard error,
- but only up to the configured additional window.
Example:
Cache-Control: max-age=60, stale-if-error=600
Meaning:
- 0-60s: fresh
- 60-660s: if origin returns an error during reuse/revalidation, stale may be served
- 660s+: the error is passed through to the client
RFC 5861 frames error handling around situations that result in 500, 502, 503, or 504, and also motivates the directive with upstream/network failure cases.
Important compatibility fact
Unknown cache directives are generally ignored.
So unsupported caches fall back to the rest of the policy (for example, max-age), which makes these directives relatively safe to add incrementally.
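To make the fallback concrete, here is a minimal sketch of how a cache that only implements max-age would see a policy carrying the stale extensions. The function names parse_cache_control and understood_by are illustrative, not taken from any library, and the parser is deliberately simplified.

```python
from __future__ import annotations


def parse_cache_control(value: str) -> dict[str, int | bool]:
    """Parse a Cache-Control value into {directive: seconds or True}.

    Simplification: splits on commas, so quoted arguments that
    themselves contain commas are not handled.
    """
    directives: dict[str, int | bool] = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        name = name.strip().lower()
        if not name:
            continue
        arg = arg.strip().strip('"')
        directives[name] = int(arg) if arg.isdecimal() else True
    return directives


def understood_by(directives, supported):
    """Keep only the directives a given cache implements; drop the rest."""
    return {k: v for k, v in directives.items() if k in supported}


policy = parse_cache_control(
    "max-age=60, stale-while-revalidate=30, stale-if-error=600"
)
# A hypothetical cache that predates RFC 5861 sees only max-age=60.
legacy_view = understood_by(policy, {"max-age"})
```

The legacy cache ends up with a plain 60-second TTL, which is the behavior you had before adding the extensions.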
Mental model: four windows, not one TTL
Think of a cached object as moving through four states:
- Fresh window: serve normally
- Soft-stale window (stale-while-revalidate): serve immediately, refresh in background
- Error-stale window (stale-if-error): only serve stale when origin/revalidation fails
- Hard-expired: fetch/revalidate normally and let errors surface
Timeline example:
0----------------60----------------90----------------660
| fresh | SWR window | error-only |
| | serve stale + | stale allowed |
| | async refresh | only on failure |
Two consequences operators often miss:
- stale-while-revalidate does not make content indefinitely self-refreshing. It only helps if a request arrives during that window.
- stale-if-error does not increase freshness. It is an availability fallback, not a normal serving mode.
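The four-window model can be sketched as a small classifier. This is an illustration under assumptions, not how any particular cache implements it: the names Window, classify, and may_serve_stale are made up here, and treating the exact boundary second as already outside the window is a choice the RFC wording leaves open.

```python
from enum import Enum


class Window(Enum):
    FRESH = "fresh"
    SOFT_STALE = "soft-stale"      # stale-while-revalidate applies
    ERROR_STALE = "error-stale"    # only stale-if-error applies
    HARD_EXPIRED = "hard-expired"


def classify(age: int, max_age: int, swr: int = 0, sie: int = 0) -> Window:
    """Map a response age (seconds) to one of the four windows."""
    if age < max_age:
        return Window.FRESH
    staleness = age - max_age  # both stale windows start when freshness ends
    if staleness < swr:
        return Window.SOFT_STALE
    if staleness < sie:
        return Window.ERROR_STALE
    return Window.HARD_EXPIRED


def may_serve_stale(age, max_age, swr=0, sie=0, origin_failed=False) -> bool:
    """True if a stale copy may be returned without waiting on origin."""
    window = classify(age, max_age, swr, sie)
    if window is Window.SOFT_STALE:
        return True
    if window is Window.ERROR_STALE:
        return origin_failed
    return False
```

With max-age=60, stale-while-revalidate=30, stale-if-error=600, an object aged 120s is in the error-stale window: it is served only when the origin attempt fails.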
The most useful production pattern: split browser vs CDN policy
max-age sets the freshness lifetime for every cache, browsers included; s-maxage overrides it for shared caches/CDNs only.
That lets you keep browsers stricter while allowing the CDN to absorb more origin pressure.
Example: public HTML or JSON that can tolerate brief staleness
Cache-Control: public, max-age=30, s-maxage=300, stale-while-revalidate=30, stale-if-error=600
ETag: "home-v42"
Interpretation:
- browsers treat it as fresh for 30s,
- shared caches treat it as fresh for 300s,
- after shared-cache freshness ends, stale may still be served for 30s while revalidating,
- if origin is failing, stale may be used for up to 10 minutes after freshness ends,
- ETag makes revalidation cheap.
This is usually a much better operator default than trying to make every layer share one TTL.
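As a rough illustration of the split, the sketch below computes where each window ends for a browser versus a shared cache, using the example header above. The helper name stale_windows is hypothetical, and it ignores Expires and heuristic freshness for brevity.

```python
def stale_windows(directives, shared):
    """Return (fresh_until, swr_until, sie_until) in seconds of response age.

    Shared caches prefer s-maxage when present; private caches use max-age.
    Both stale windows are measured from the end of freshness.
    """
    if shared and "s-maxage" in directives:
        fresh = directives["s-maxage"]
    else:
        fresh = directives.get("max-age", 0)
    swr = directives.get("stale-while-revalidate", 0)
    sie = directives.get("stale-if-error", 0)
    return fresh, fresh + swr, fresh + sie


policy = {
    "max-age": 30,
    "s-maxage": 300,
    "stale-while-revalidate": 30,
    "stale-if-error": 600,
}
browser = stale_windows(policy, shared=False)  # (30, 60, 630)
cdn = stale_windows(policy, shared=True)       # (300, 330, 900)
```

The same header therefore produces two different timelines, which is the point of splitting the policy.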
Safe policy templates by content class
1) Versioned static assets (JS/CSS with content hash)
Cache-Control: public, max-age=31536000, immutable
Use this for assets whose URL changes on deploy. You usually do not need SWR here. The better pattern is immutable + cache-busting.
2) Public HTML shell / marketing pages
Cache-Control: public, max-age=60, s-maxage=300, stale-while-revalidate=30, stale-if-error=600
ETag: "landing-v18"
Good when:
- content changes a few times a day,
- a minute or two of staleness is acceptable,
- protecting origin latency matters more than instant freshness.
3) Public list APIs / catalog endpoints
Cache-Control: public, max-age=15, s-maxage=60, stale-while-revalidate=15, stale-if-error=120
ETag: "products-page-1-v9"
Good when:
- reads are hot,
- data changes regularly,
- some stale tolerance exists,
- you want CDN shielding during deploys or brief origin wobble.
4) Personalized/authenticated responses
Cache-Control: private, no-store
Or, if local browser caching is intentionally allowed:
Cache-Control: private, max-age=30
Do not lean on SWR/stale-if-error in shared caches for personalized content unless you have extremely explicit safety boundaries. Most of the time, that is how private data leaks happen.
5) High-integrity state (balances, checkout totals, irreversible actions)
Prefer strict freshness:
Cache-Control: no-store
or, if storage is okay but stale reuse is not:
Cache-Control: max-age=0, must-revalidate
This is not where you get clever.
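One way to keep these templates from drifting is to generate headers from a small policy table. The sketch below is only an illustration: CachePolicy, POLICIES, and the class names in the table are hypothetical, and the values simply mirror the templates above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachePolicy:
    visibility: Optional[str] = "public"
    max_age: Optional[int] = None
    s_maxage: Optional[int] = None
    swr: Optional[int] = None   # stale-while-revalidate, seconds
    sie: Optional[int] = None   # stale-if-error, seconds
    extra: tuple = ()           # valueless directives, e.g. ("immutable",)

    def header(self) -> str:
        """Render the Cache-Control value, skipping unset directives."""
        parts = [self.visibility] if self.visibility else []
        timed = (
            ("max-age", self.max_age),
            ("s-maxage", self.s_maxage),
            ("stale-while-revalidate", self.swr),
            ("stale-if-error", self.sie),
        )
        parts += [f"{name}={value}" for name, value in timed if value is not None]
        parts += list(self.extra)
        return ", ".join(parts)


POLICIES = {
    "static_asset": CachePolicy(max_age=31536000, extra=("immutable",)),
    "html_shell": CachePolicy(max_age=60, s_maxage=300, swr=30, sie=600),
    "list_api": CachePolicy(max_age=15, s_maxage=60, swr=15, sie=120),
    "personalized": CachePolicy(visibility="private", extra=("no-store",)),
    "high_integrity": CachePolicy(visibility=None, extra=("no-store",)),
}
```

A handler would then set the header with something like POLICIES["list_api"].header(), so each content class has exactly one place where its stale budget is defined.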
Validators matter more than people think
If you enable stale windows, pair them with strong revalidation signals:
- ETag
- Last-Modified
Why:
- SWR works best when background validation is cheap,
- validators let caches ask “did this change?” instead of re-downloading the full object,
- CDNs can collapse conditional revalidation better than blind refetches.
Cloudflare’s revalidation docs explicitly describe serving stale during async revalidation and then using If-Modified-Since / ETag to validate cheaply.
Practical rule:
- SWR without validators is still useful,
- but SWR with validators is what turns it into an efficient operator tool.
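To show why validators make revalidation cheap, here is a minimal origin-side sketch of a conditional GET using ETag. The function names are illustrative, the weak comparison follows the usual If-None-Match semantics, and splitting on commas is a simplification that assumes entity tags contain no commas.

```python
from typing import Optional


def _opaque(tag: str) -> str:
    """Strip whitespace and a weak prefix so tags compare weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """Weak comparison of the current ETag against an If-None-Match list."""
    if if_none_match.strip() == "*":
        return True
    candidates = {_opaque(t) for t in if_none_match.split(",")}
    return _opaque(current_etag) in candidates


def conditional_get(if_none_match: Optional[str], current_etag: str, body: bytes):
    """Return (status, headers, body); 304 carries no body."""
    headers = {"ETag": current_etag}
    if if_none_match is not None and etag_matches(if_none_match, current_etag):
        return 304, headers, b""
    return 200, headers, body


status, headers, payload = conditional_get('"home-v42"', '"home-v42"', b"<html>...</html>")
```

When nothing changed, the background revalidation costs a header exchange instead of a full transfer.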
CDN behavior is not identical to browser behavior
Do not assume every cache layer behaves the same way.
Browsers
- stale-while-revalidate is supported in most modern browsers, though not all; clients that do not support it ignore it.
- Browser behavior is still bounded by local cache rules and normal request patterns.
- If a service worker intercepts requests, it gets first crack; HTTP cache directives only apply if the request path still goes through normal fetch/cache handling.
Shared caches / CDNs
- CDNs are where stale-if-error is usually most valuable.
- CDN implementations may add request collapsing, async revalidation, or vendor-specific controls.
- Behavior around directives like must-revalidate, no-cache, and stale serving can be vendor-specific.
Cloudflare’s current docs are explicit that asynchronous SWR serves stale immediately with UPDATING while revalidation happens in the background, and that must-revalidate / no-cache can prevent stale serving when Origin Cache Control is enabled.
Operator takeaway:
test policy at the actual cache layer that matters. Do not reason from browser behavior alone.
Common design mistakes
1) Treating SWR as a magic freshness extender
It is not. It is a bounded non-blocking revalidation window. If traffic is sparse and no request lands during that window, the next request can still block.
2) Oversizing stale-if-error
stale-if-error=86400 may look comforting, but it can hide major incidents for a full day.
Use long windows only when:
- content risk is low,
- stale data is clearly acceptable,
- observability will tell you stale fallback is happening.
3) Combining stale goals with strict revalidation directives carelessly
If you want stale reuse, be careful with:
- must-revalidate
- no-cache
Some shared-cache/CDN setups will honor those in ways that effectively disable stale serving. If your goal is "serve stale during revalidation," verify that your final policy does not negate itself.
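A cheap guard is a lint step that flags headers combining stale directives with strict revalidation directives. The sketch below is an assumption-laden illustration: lint_stale_policy is a made-up name, and whether the combination actually disables stale serving depends on the cache layer, so treat a warning as a prompt to test, not a verdict.

```python
from __future__ import annotations

STALE_DIRECTIVES = {"stale-while-revalidate", "stale-if-error"}
STRICT_DIRECTIVES = {"must-revalidate", "no-cache"}


def lint_stale_policy(cache_control: str) -> list[str]:
    """Warn when a header asks for stale serving and strict revalidation."""
    names = {part.partition("=")[0].strip().lower() for part in cache_control.split(",")}
    names.discard("")
    stale = sorted(names & STALE_DIRECTIVES)
    strict = sorted(names & STRICT_DIRECTIVES)
    if not stale or not strict:
        return []
    return [
        f"{directive} may prevent the stale serving requested by {', '.join(stale)}"
        for directive in strict
    ]


warnings = lint_stale_policy(
    "public, max-age=60, stale-while-revalidate=30, must-revalidate"
)
```

Running this in CI against the headers your services emit catches the self-negating case before it reaches the CDN.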
4) Using shared-cache stale policies on personalized content
If a response depends on cookies, account state, or per-user authorization, shared stale serving is dangerous unless you have airtight cache key segmentation and privacy rules.
5) Forgetting purge/version strategy
If you allow objects to stay usable past freshness, you must also know how to invalidate them quickly when needed:
- content-hash versioning for static assets,
- targeted purge for hot objects,
- emergency bypass for bad cache fills.
6) Assuming stale windows solve herd behavior by themselves
They help, but they do not replace request collapsing or singleflight. For very hot keys, use stale controls together with cache stampede mitigation.
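For reference, a minimal in-process singleflight can be sketched as below. This is not a CDN feature or a library API; SingleFlight and its do method are illustrative names, and a real deployment would also need timeouts and cross-process coordination.

```python
import threading


class _Call:
    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Collapse concurrent calls for the same key into one execution."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}

    def do(self, key, fn):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = fn()      # only the leader hits origin
            except Exception as exc:    # followers see the same failure
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            call.done.wait()            # followers wait for the leader
        if call.error is not None:
            raise call.error
        return call.result
```

A cache fill path would wrap the origin fetch, for example flight.do(cache_key, fetch_from_origin), so a burst on an expired hot key produces one origin request while stale windows keep the waiting short or unnecessary.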
How to size the windows
A good starting heuristic:
max-age
Set to the amount of staleness you accept in normal operation.
stale-while-revalidate
Set to slightly more than:
- expected revalidation latency,
- plus some burst cushion,
- but not so large that you stop noticing freshness problems.
stale-if-error
Set to the outage-absorption interval you can safely tolerate for that content class.
Examples:
- homepage cards: max-age=60, swr=30, sie=600
- product list: max-age=15, swr=15, sie=120
- docs page: max-age=300, swr=60, sie=3600
A useful RFC 5861 framing:
max-age + stale-while-revalidate should fit inside the longest total freshness lifetime you can tolerate for normal non-error serving.
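The arithmetic behind that framing can be sketched for the three example budgets. The helper names are hypothetical, and the numbers assume a single cache layer; stacked layers (browser on top of CDN) can serve older content than this suggests.

```python
def worst_case_age(max_age: int, swr: int, sie: int) -> dict:
    """Approximate oldest age (seconds) a single cache may serve."""
    return {
        "normal": max_age + swr,                   # end of the SWR window
        "origin_failing": max_age + max(swr, sie), # end of the widest stale window
    }


def fits_budget(max_age, swr, sie, normal_budget, outage_budget) -> bool:
    """Check a policy against the staleness you are willing to tolerate."""
    ages = worst_case_age(max_age, swr, sie)
    return ages["normal"] <= normal_budget and ages["origin_failing"] <= outage_budget


homepage = worst_case_age(60, 30, 600)     # {'normal': 90, 'origin_failing': 660}
products = worst_case_age(15, 15, 120)     # {'normal': 30, 'origin_failing': 135}
docs = worst_case_age(300, 60, 3600)       # {'normal': 360, 'origin_failing': 3900}
```

If the product list can tolerate at most 30 seconds of staleness in normal operation, the example policy fits exactly, with no headroom.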
Rollout plan
Phase 0 — establish strict baseline
Before adding stale directives, make sure you already know:
- current cache hit ratio,
- p95/p99 origin latency on cache miss,
- conditional revalidation rate,
- which endpoints are safe to serve stale.
Phase 1 — add validators
For your candidate endpoints, ensure ETag or Last-Modified exists.
Phase 2 — add small stale-while-revalidate
Start with low-risk public content. Example:
Cache-Control: public, max-age=30, s-maxage=60, stale-while-revalidate=15
Phase 3 — add bounded stale-if-error
Only for endpoints where availability beats perfect freshness. Keep the first window conservative.
Phase 4 — expand by content class
Different endpoints deserve different stale budgets. Do not copy one header across the whole platform.
Observability checklist
Track these per endpoint family:
- cache hit ratio
- conditional revalidation rate (304 share, validator usage)
- origin fetch rate after expiry
- stale serve ratio during SWR
- stale serve ratio during errors
- masked error rate (how often stale-if-error hid a failure)
- content age distribution for served responses
- purge frequency and emergency invalidation events
If your CDN exposes cache statuses, break them out.
For example, Cloudflare’s UPDATING can reveal active async revalidation behavior.
The key anti-blindness rule:
If you use stale-if-error, alert on stale fallback itself — not just on visible 5xx.
Otherwise, the cache can hide an outage until the stale window runs out.
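As a sketch of what alerting on stale fallback could look like, the snippet below computes a masked error rate from per-request log records. The record fields (served_stale, origin_status, client_status) and the 1% threshold are hypothetical; map them to whatever your CDN or proxy logs actually expose.

```python
ORIGIN_ERRORS = {500, 502, 503, 504}


def stale_fallback_stats(records):
    """Summarize how often stale content hid an origin failure."""
    total = len(records)
    masked = sum(
        1 for r in records
        if r["served_stale"] and r["origin_status"] in ORIGIN_ERRORS
    )
    visible = sum(1 for r in records if r["client_status"] >= 500)
    return {
        "total": total,
        "masked_errors": masked,
        "visible_5xx": visible,
        "masked_error_rate": masked / total if total else 0.0,
    }


def should_alert(stats, threshold=0.01):
    """Alert on masked failures even when clients see no 5xx."""
    return stats["masked_error_rate"] >= threshold


records = [
    {"served_stale": True, "origin_status": 503, "client_status": 200},
    {"served_stale": True, "origin_status": 502, "client_status": 200},
    {"served_stale": False, "origin_status": 200, "client_status": 200},
    {"served_stale": False, "origin_status": None, "client_status": 200},  # cache hit
]
stats = stale_fallback_stats(records)
```

In this sample every client saw a 200, yet half the requests were answered from stale content because origin was failing, which is exactly the case a 5xx-only alert would miss.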
Decision cheat sheet
Use stale-while-revalidate when:
- you want to hide revalidation latency,
- the content is read-heavy and public,
- a short stale window is acceptable,
- you have validators.
Use stale-if-error when:
- uptime matters more than perfect freshness,
- the content can safely remain visible during brief origin failures,
- you have monitoring for masked failures.
Avoid both when:
- the response is personalized,
- stale data creates legal/financial/user-trust risk,
- the system lacks good purge or observability.
Practical recommendation
For most teams, the safest pattern is:
- version static assets aggressively,
- use max-age + s-maxage to separate browser and CDN policy,
- add a small stale-while-revalidate window first,
- add a carefully bounded stale-if-error window only for content classes that can tolerate it,
- pair stale policies with validators and explicit monitoring.
The goal is not “serve stale more often.” The goal is:
- fresh enough in normal times,
- fast at expiry boundaries,
- graceful during short origin failures.
References
- RFC 5861: HTTP Cache-Control Extensions for Stale Content
  https://datatracker.ietf.org/doc/html/rfc5861
- MDN: Cache-Control header
  https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Cache-Control
- web.dev: Keeping things fresh with stale-while-revalidate
  https://web.dev/articles/stale-while-revalidate
- Cloudflare Docs: Revalidation
  https://developers.cloudflare.com/cache/concepts/revalidation/