W-TinyLFU Cache Admission Playbook

Date: 2026-03-27
Category: software
Audience: backend/platform engineers running in-memory caches under mixed read patterns

1) Why this matters

Many production caches fail for a simple reason: they optimize eviction, but ignore admission.

Classic LRU admits almost everything on a miss, which is great for recency bursts but fragile under:

  • one-off scans and batch jobs that flood the cache with single-use keys,
  • “one-hit wonders” that evict genuinely hot items,
  • mixed-tenant traffic where bursty readers crowd out steady ones.

W-TinyLFU addresses this by splitting the job:

  • Eviction chooses a victim (who might leave).
  • Admission decides whether the incoming item deserves to replace that victim (who gets in).

Result: better hit-rate stability without heavy metadata overhead.


2) Ground truth: TinyLFU vs W-TinyLFU

TinyLFU (admission policy)

TinyLFU is not a full replacement policy by itself. It is an admission filter that asks:

“Should this incoming item replace an existing victim?”

It estimates recent-item popularity using an approximate frequency structure (typically Count-Min Sketch style counters).
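The counting structure fits in a few lines. The class below is an illustrative Count-Min-style sketch with counters capped at 15, mimicking the 4-bit counters common in TinyLFU implementations; the class name, hash choice, and default sizes are ours, not any library's.

```python
import hashlib

class FrequencySketch:
    """Count-Min-style sketch: `depth` hash rows of `width` counters.
    An estimate is the minimum counter across rows, so collisions can
    only inflate it, never deflate it."""

    def __init__(self, width=1024, depth=4, cap=15):
        self.width, self.depth, self.cap = width, depth, cap
        self.rows = [[0] * width for _ in range(depth)]

    def _indexes(self, key):
        # One independent index per row, derived from a keyed hash.
        for i in range(self.depth):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield i, int.from_bytes(h, "little") % self.width

    def increment(self, key):
        for i, j in self._indexes(key):
            if self.rows[i][j] < self.cap:  # saturating 4-bit-style counter
                self.rows[i][j] += 1

    def estimate(self, key):
        return min(self.rows[i][j] for i, j in self._indexes(key))
```

Memory is `width * depth` small counters regardless of key cardinality, which is what makes per-key exact counting unnecessary.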

W-TinyLFU (full policy shape)

W-TinyLFU adds a small LRU window in front of the main space:

  1. New items enter the window.
  2. Window evictions compete with main-space victims.
  3. TinyLFU estimates both frequencies.
  4. Higher estimated value survives.

This keeps recency bursts from being unfairly rejected while still resisting cache pollution.
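Steps 2–4 reduce to a single comparison. The sketch below uses an exact `Counter` as a stand-in for the approximate frequency structure; the tie-breaking choice (favoring the incumbent) is a common convention to damp churn, not something the paper mandates.

```python
from collections import Counter

freq = Counter()  # exact stand-in for the approximate frequency sketch

def record_access(key):
    freq[key] += 1

def admit(candidate, victim):
    # The window-eviction candidate replaces the main-space victim only
    # if its estimated frequency is strictly higher; ties keep the
    # incumbent, which avoids thrashing between equally popular keys.
    return freq[candidate] > freq[victim]
```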


3) Practical architecture model

Think of the cache as three logical parts:

  1. Window region (LRU-ish):

    • absorbs short-term bursts,
    • protects newly hot items during “probation”.
  2. Main region (often segmented LRU or LFU-biased):

    • stores established survivors,
    • favors longer-term utility.
  3. Frequency sketch (TinyLFU):

    • approximate counts for recent accesses,
    • tiny memory footprint versus exact per-key counters.
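The three parts compose into a small working model. The toy below uses exact counters and a plain-LRU main region in place of the sketch and segmented LRU, so it illustrates the data flow between regions, not production behavior; all names are ours.

```python
from collections import Counter, OrderedDict

class WTinyLFU:
    def __init__(self, window_size=1, main_size=4):
        self.window = OrderedDict()   # window region: absorbs bursts
        self.main = OrderedDict()     # main region: established survivors
        self.freq = Counter()         # exact stand-in for the sketch
        self.window_size = window_size
        self.main_size = main_size

    def get(self, key):
        self.freq[key] += 1
        for region in (self.window, self.main):
            if key in region:
                region.move_to_end(key)  # LRU touch
                return region[key]
        return None

    def put(self, key, value):
        self.freq[key] += 1
        for region in (self.window, self.main):
            if key in region:
                region[key] = value
                region.move_to_end(key)
                return
        self.window[key] = value  # new items always enter the window
        if len(self.window) > self.window_size:
            cand_key, cand_val = self.window.popitem(last=False)
            if len(self.main) < self.main_size:
                self.main[cand_key] = cand_val
            else:
                victim = next(iter(self.main))  # main-region LRU victim
                if self.freq[cand_key] > self.freq[victim]:
                    del self.main[victim]        # candidate admitted
                    self.main[cand_key] = cand_val
                # else: candidate rejected, never enters main
```

Replaying a scan against this model shows the point: a frequently read key survives a flood of one-off keys because each scan key loses the admission comparison.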

In Caffeine’s published design notes, this policy is paired with:

  • a Segmented LRU main region (probation and protected segments),
  • a 4-bit Count-Min Sketch for frequency estimates, plus a “doorkeeper” bloom filter to absorb one-hit wonders,
  • periodic aging that halves all counters so stale popularity decays,
  • adaptive sizing of the window driven by observed hit-rate feedback.


4) Why it usually beats plain LRU in production

W-TinyLFU improves the common failure modes of LRU:

  • Scan resistance: single-use keys rarely build enough frequency to displace established items.
  • Burst tolerance: the window gives newly hot items a probation period before they must win a frequency comparison.
  • Pollution control: low-value items are rejected at admission instead of first evicting useful residents.

In short: it behaves better when your workload is neither purely recency-biased nor purely frequency-biased.


5) What to measure (minimum observability contract)

Do not roll out by “overall hit rate” alone.

Track at least:

  • request hit ratio and byte hit ratio, tracked separately,
  • admission accepts vs rejects,
  • evictions by region (window vs main),
  • origin request rate and origin latency (p95/p99).

Useful derived signals:

  • admission reject rate over time (to spot thrash),
  • hit-ratio delta per tenant or workload class,
  • evictions per hit gained (churn efficiency).
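The derived signals are simple ratios over raw counters. The helper below shows the arithmetic; the counter names are illustrative, not fields from any particular metrics library.

```python
def derived_signals(stats):
    """Derive rollout-gating signals from raw cache counters."""
    attempts = max(1, stats["admission_accepts"] + stats["admission_rejects"])
    total_bytes = max(1, stats["hit_bytes"] + stats["miss_bytes"])
    return {
        # High sustained values suggest admission thrash.
        "reject_rate": stats["admission_rejects"] / attempts,
        # Guards against over-optimizing tiny-object hits.
        "byte_hit_ratio": stats["hit_bytes"] / total_bytes,
        # Churn efficiency: evictions paid per hit earned.
        "evictions_per_hit": stats["evictions"] / max(1, stats["hits"]),
    }
```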


6) Tuning knobs that matter

6.1 Window size share

A common starting point is a small window (Caffeine’s design notes describe roughly 1% of capacity, grown adaptively for recency-biased workloads). Use adaptive tuning if available; otherwise tune the window share per workload class.
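Where adaptive tuning is not built in, a crude hill climber over the window share can stand in: Caffeine's adaptive policy is in this spirit, though the function, constants, and bounds below are invented for illustration.

```python
def step_window_share(share, hit_rate, prev_hit_rate, direction,
                      step=0.01, lo=0.01, hi=0.5):
    """One hill-climbing step: keep moving the window share in the
    current direction while the sampled hit rate improves, reverse
    when it degrades, and clamp to [lo, hi]."""
    if hit_rate < prev_hit_rate:
        direction = -direction
    share = min(hi, max(lo, share + direction * step))
    return share, direction
```

Run it on a coarse cadence (e.g. per sampling epoch), never per request, or the share will chase noise.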

6.2 Counter budget / sketch size

Pick a budget proportional to active key cardinality, not total key universe.
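A back-of-envelope sizing helper makes the distinction concrete: budget from the keys that are actually hot in a sampling period, not every key that could ever appear. The 8-counters-per-key and 4-bit figures are illustrative defaults, not values from a specific library.

```python
def sketch_budget(active_keys, counters_per_key=8, bits_per_counter=4):
    """Rough sketch sizing from *active* cardinality: returns the total
    counter count and the resulting memory footprint in bytes."""
    counters = active_keys * counters_per_key
    return counters, counters * bits_per_counter // 8
```

For a million active keys this yields 8M four-bit counters, about 4 MB, whereas sizing against a billion-key universe would waste three orders of magnitude more memory for no accuracy gain.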

6.3 Aging / decay cadence

Frequency structures need decay (or reset semantics) so ancient popularity doesn’t dominate forever.

If decay is too slow, the cache becomes sticky and stale favorites linger. If too fast, it overreacts to noise.
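The classic reset operation from the TinyLFU paper halves every counter once a sample-size worth of accesses has been recorded, so old popularity decays geometrically. The dict-of-counts form below is a simplification of the packed 4-bit layout real implementations use.

```python
def maybe_age(counters, ops_since_reset, sample_size):
    """Halve all counters once `sample_size` accesses are recorded,
    dropping entries that reach zero; returns the new ops counter."""
    if ops_since_reset < sample_size:
        return ops_since_reset
    for key in list(counters):
        counters[key] //= 2
        if counters[key] == 0:
            del counters[key]
    return 0
```

The `sample_size` is the effective decay cadence knob: doubling it makes the cache roughly twice as sticky.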

6.4 Cost-awareness

If item sizes vary widely, pure key-count admission can be misleading. Prefer a cost-aware cache when possible.
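One simple cost-aware variant compares value density, i.e. estimated frequency per unit cost, instead of raw frequency. This is a generic heuristic for illustration, not the exact rule used by any particular cache.

```python
def admit_cost_aware(cand_freq, cand_cost, victim_freq, victim_cost):
    """Admit when the candidate's frequency-per-byte beats the victim's.
    Cross-multiplied (costs must be positive) to avoid float division."""
    return cand_freq * victim_cost > victim_freq * cand_cost
```

So a rarely read 10 MB blob cannot displace a frequently read 2 KB entry, even though its raw frequency might occasionally win a key-count comparison.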


7) Failure modes and anti-patterns

  1. Treating W-TinyLFU as “set and forget”

    • Symptoms: stale tuning after workload drift.
  2. Ignoring byte hit ratio

    • Easy to over-optimize tiny-object hits while big misses dominate origin load.
  3. No tenant isolation

    • Noisy tenants can dominate admission competition.
  4. Blindly trusting default knobs across services

    • API cache, feature store cache, and metadata cache usually need different tuning.
  5. No warmup strategy after restart

    • Cold sketch + cold data can produce temporary policy instability.

8) Rollout playbook (safe, fast)

  1. Shadow benchmark

    • Replay representative traces through current policy vs W-TinyLFU.
  2. Canary by workload class

    • Separate read-heavy API paths from batch/scan-heavy paths.
  3. Success gates

    • Request hit ratio up,
    • byte hit ratio non-inferior,
    • origin p95/p99 stable or better,
    • no admission-thrash episodes.
  4. Rollback triggers

    • sustained reject spikes + origin latency regression,
    • elevated eviction churn without hit-rate lift,
    • major tenant fairness regressions.
  5. Post-rollout check

    • revalidate after 1 week to capture weekly seasonality.
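The shadow benchmark in step 1 needs little more than a replay loop. The harness below measures request hit ratio for any object exposing `get`/`put`, shown here against a minimal LRU baseline; all names are ours, and a real comparison should also replay against the candidate policy and track byte hit ratio.

```python
from collections import OrderedDict

class LRU:
    """Baseline policy for the shadow comparison (minimal sketch)."""
    def __init__(self, cap):
        self.cap, self.d = cap, OrderedDict()

    def get(self, key):
        if key in self.d:
            self.d.move_to_end(key)
            return self.d[key]
        return None

    def put(self, key, value):
        self.d[key] = value
        self.d.move_to_end(key)
        if len(self.d) > self.cap:
            self.d.popitem(last=False)  # evict LRU entry

def hit_ratio(cache, trace):
    """Replay a key trace; on miss, install the key and count a miss."""
    hits = 0
    for key in trace:
        if cache.get(key) is None:
            cache.put(key, key)
        else:
            hits += 1
    return hits / len(trace)
```

Feed both policies the same recorded trace and compare ratios per workload class before touching production traffic.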

9) Implementation notes by ecosystem

Java (Caffeine)

Caffeine is the reference production implementation many teams use for W-TinyLFU-style behavior, with extensive design docs and simulator support.

Go (Ristretto)

Ristretto pairs TinyLFU admission with SampledLFU eviction and cost-based controls, optimized for high-concurrency scenarios.

Operationally, both emphasize:

  • cost/weight-based capacity limits rather than raw entry counts,
  • built-in hit/miss/eviction statistics that cover the observability contract above,
  • benchmark and simulator tooling for validating policy behavior on your own traces.


10) Bottom line

If your cache lives in the real world (scans, bursts, mixed tenants), plain LRU is often too fragile.

W-TinyLFU is a pragmatic upgrade because it:

  • keeps LRU’s responsiveness to recency bursts via the window,
  • adds frequency-based admission that resists scans and pollution,
  • does both with a small, bounded metadata footprint.

Treat it as a control system (measure -> tune -> verify), not just an algorithm swap.
