RCU vs Hazard Pointers vs Epoch Reclamation Selection Playbook
Date: 2026-03-17
Category: knowledge
Why this matters
Lock-free or low-lock data structures fail in production less often from CAS bugs than from memory-reclamation mistakes:
- freeing too early (use-after-free),
- freeing too late (unbounded memory growth),
- or forcing global stalls to reclaim safely.
If you run read-heavy infra (matching gateways, symbol maps, routing tables, session registries), choosing the right reclamation strategy is a first-order latency and reliability decision.
1) Mental model in one paragraph
All three families solve the same problem: a node is logically removed now, but some thread might still hold a pointer to it.
- RCU: wait for a grace period where pre-existing readers are gone, then reclaim.
- Epoch-based reclamation (EBR): readers “pin” an epoch; retired nodes are reclaimed after enough epoch advances.
- Hazard pointers (HP): readers publish exact pointers they are currently touching; reclaim scans hazard sets and frees only unprotected nodes.
The design tension is always: fast reader path vs reclamation latency guarantees under slow/stuck threads.
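All three families share the same skeleton: removal only *retires* a node, and actual freeing is deferred until a family-specific safety check passes. A minimal sketch of that shared skeleton (illustrative only; `RetireList` and `safe_to_free` are invented names for this model, not any library's API):

```python
class RetireList:
    def __init__(self):
        self.retired = []   # nodes logically removed but not yet freed
        self.freed = []     # stand-in for returning memory to the allocator

    def retire(self, node):
        # Logical removal: unlink from the structure, but do NOT free yet.
        self.retired.append(node)

    def reclaim(self, safe_to_free):
        # safe_to_free(node) is where the three families differ:
        #   RCU: "has a grace period elapsed since retire?"
        #   EBR: "have all threads left the node's retire epoch?"
        #   HP:  "is the node absent from every hazard slot?"
        self.freed += [n for n in self.retired if safe_to_free(n)]
        self.retired = [n for n in self.retired if not safe_to_free(n)]

rl = RetireList()
rl.retire("node-a")
rl.retire("node-b")
rl.reclaim(lambda n: n == "node-a")   # only node-a is unprotected
```

Everything below is about which `safe_to_free` predicate you buy, and what it costs readers and reclaimers.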
2) What changes operationally (not just academically)
| Dimension | RCU / QSBR | Epoch-based reclamation | Hazard pointers |
|---|---|---|---|
| Reader fast path | Usually the cheapest; can be near-zero overhead in QSBR-style read sections | Cheap, but includes pin/unpin protocol | Heavier: protect/retry loops + hazard slot stores |
| Sensitivity to stalled readers | High: grace periods can delay reclamation | High: pinned/stalled thread can block old-epoch reclaim | Lower: stalled thread only blocks nodes it protects |
| Reclamation batching efficiency | Excellent for bulk retire | Excellent when threads progress normally | Moderate; per-retire scans/threshold tuning matter |
| Memory bound behavior | Can inflate under grace-period lag | Can inflate under pinned participants | Better bounded by hazard slots + retire threshold design |
| Implementation complexity | Medium (API discipline + grace-period reasoning) | Medium (epoch discipline + pin lifecycle) | High (pointer protection protocol correctness) |
| Tail-latency risk source | Expedited grace periods may disturb CPUs (IPIs) | Epoch advancement stalls under blocked participants | Retry/scanning overhead on hot read/write paths |
Takeaway: there is no universal winner. You are choosing where to pay: reader overhead, memory headroom, or worst-case reclaim latency.
3) When each strategy tends to win
A) Prefer RCU when
- workload is strongly read-mostly,
- you can enforce short read-side critical sections,
- and delayed reclamation batches are acceptable.
Typical fits:
- config snapshots,
- routing/symbol tables,
- pointer-swapped immutable structures.
Practical note: the Linux kernel docs emphasize that RCU’s performance comes from very cheap reads, and that reclamation must wait for a grace period. Expedited grace periods exist, but they deliberately trade extra disruption (e.g., IPIs) for faster completion.
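The grace-period bookkeeping behind QSBR-flavored RCU can be sketched as a toy single-process model (illustrative only; the real API would be liburcu’s, and `QsbrDomain` is an invented name). Each reader periodically reports a quiescent state; a grace period ends once every registered thread has reported one after the updater’s pointer swap:

```python
class QsbrDomain:
    def __init__(self, thread_ids):
        self.counter = {t: 0 for t in thread_ids}  # quiescent-state counts

    def quiescent_state(self, tid):
        # Reader declares it is outside any read-side critical section.
        self.counter[tid] += 1

    def snapshot(self):
        return dict(self.counter)

    def grace_period_elapsed(self, snap):
        # Every thread has passed a quiescent state since `snap` was taken.
        return all(self.counter[t] > snap[t] for t in snap)

dom = QsbrDomain(["r1", "r2"])
old_config = {"v": 1}
config = {"v": 2}          # updater publishes the new snapshot (pointer swap)
snap = dom.snapshot()      # grace period starts here
assert not dom.grace_period_elapsed(snap)
dom.quiescent_state("r1")
dom.quiescent_state("r2")  # no pre-existing reader can still see old_config
assert dom.grace_period_elapsed(snap)  # now safe to free old_config
```

The model makes the operational hazard obvious: one thread that never calls `quiescent_state` stalls every grace period, and with it the entire retire backlog.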
B) Prefer EBR when
- you want near-RCU reader performance in user-space,
- teams can rigorously manage pin/unpin lifecycle,
- and threads are expected to make progress (few long stalls while pinned).
Typical fits:
- lock-free queues/maps in service cores with disciplined thread model,
- Rust systems using `crossbeam-epoch` patterns.
Practical note: EBR often gives excellent throughput until one pinned participant goes pathological; then retire queues can balloon.
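That pathological case is easy to see in a toy epoch model (illustrative only; real implementations such as `crossbeam-epoch` are more involved, and all names here are invented). Retired nodes are tagged with the global epoch and freed only once no thread is pinned in an older epoch:

```python
class EpochDomain:
    def __init__(self):
        self.global_epoch = 0
        self.pinned = {}        # tid -> epoch observed at pin time
        self.retired = []       # (retire_epoch, node)

    def pin(self, tid):
        self.pinned[tid] = self.global_epoch

    def unpin(self, tid):
        self.pinned.pop(tid, None)

    def retire(self, node):
        self.retired.append((self.global_epoch, node))

    def try_advance_and_collect(self):
        # Advance only if no thread is still pinned in an older epoch.
        if all(e == self.global_epoch for e in self.pinned.values()):
            self.global_epoch += 1
        # Simplified safety rule: free nodes retired in an epoch older
        # than the oldest epoch any thread is still pinned in.
        horizon = min(self.pinned.values(), default=self.global_epoch)
        collected = [n for (e, n) in self.retired if e < horizon]
        self.retired = [(e, n) for (e, n) in self.retired if e >= horizon]
        return collected

dom = EpochDomain()
dom.pin("stalled")                            # this thread never unpins...
dom.retire("node-a")
assert dom.try_advance_and_collect() == []    # ...so the backlog is stuck
dom.unpin("stalled")                          # hostage released
freed = dom.try_advance_and_collect()         # epoch moves, node reclaimed
```

One wedged pinned participant holds the whole retire queue hostage, which is exactly the failure mode described above.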
C) Prefer hazard pointers when
- you need stronger reclaim progress despite arbitrary slow readers,
- memory headroom is tight and reclaim lag must stay bounded,
- or thread scheduling is noisy/untrusted.
Typical fits:
- mixed-priority systems,
- long or unpredictable reader lifetimes,
- components where memory spikes are a direct incident trigger.
Cost: more complicated APIs/protocols and usually higher steady-state reader overhead.
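The protection protocol that buys this robustness can be sketched as a toy model (illustrative only; names are invented, not a real library’s API). A reader publishes the pointer it is about to dereference, then re-validates that the source still points there; reclaim frees only nodes absent from every hazard slot:

```python
class HazardDomain:
    def __init__(self):
        self.hazards = {}    # tid -> protected node (one slot per thread)
        self.retired = []

    def protect(self, tid, load):
        # protect -> validate -> retry: publish the pointer, then check
        # that the source did not change underneath us.
        while True:
            p = load()
            self.hazards[tid] = p    # publish hazard
            if load() == p:          # validate
                return p             # p is now safe to dereference

    def clear(self, tid):
        self.hazards.pop(tid, None)

    def retire(self, node):
        self.retired.append(node)

    def scan_and_free(self):
        protected = set(self.hazards.values())
        freed = [n for n in self.retired if n not in protected]
        self.retired = [n for n in self.retired if n in protected]
        return freed

head = ["node-a"]
dom = HazardDomain()
p = dom.protect("reader", lambda: head[0])  # reader safely holds node-a
head[0] = "node-b"                          # writer swaps the pointer...
dom.retire("node-a")                        # ...and retires the old node
assert dom.scan_and_free() == []            # reader still protects node-a
dom.clear("reader")
assert dom.scan_and_free() == ["node-a"]    # now reclaimable
```

Note how a stalled reader blocks only the specific node in its hazard slot, not the whole backlog; the price is the extra stores and validation retries on every read.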
4) Failure modes you will actually see
A) “Phantom memory leak” under RCU/EBR
Symptom:
- RSS keeps rising, but alloc/free counts look “balanced.”
Root cause:
- reclamation backlog waits on grace period/epoch progress, not allocator bugs.
B) Pinned-thread hostage in EBR
Symptom:
- retire lists grow with one worker wedged in pinned scope.
Root cause:
- missing unpin on uncommon error path, blocking syscall in pinned section, or scheduler starvation.
C) Hazard-pointer retry storms
Symptom:
- CPU rises and tail latency degrades on high-contention read paths.
Root cause:
- repeated `protect -> validate -> retry` loops and aggressive hazard scans.
D) “Fixing latency” with expedited RCU and hurting the rest
Symptom:
- one subsystem’s reclaim latency improves, but system-wide jitter increases.
Root cause:
- expedited grace periods intentionally trade efficiency for faster completion, potentially disturbing many CPUs.
5) Selection heuristic (fast and practical)
Start with these questions:
- Can any reader stall for long/unbounded time?
- Yes → bias toward HP (or isolate that component).
- Is read-path overhead budget ultra-tight (single-digit ns concerns)?
- Yes → bias toward RCU/QSBR or EBR.
- Is memory headroom tight and bursty backlog unacceptable?
- Yes → bias toward HP, or strict watchdog + bounded pin scopes with EBR.
- Can your team reliably enforce API discipline in all code paths?
- No → choose simpler model even if slower; operational correctness beats benchmark wins.
A good default for many teams:
- RCU/EBR for read-mostly maps and metadata,
- HP for high-risk structures where reader stalls are plausible,
- and avoid one-size-fits-all dogma.
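The checklist above can be encoded as a small function (illustrative only; the ordering and return strings are one reasonable reading of the biases, not a formal decision procedure):

```python
def pick_reclamation(readers_may_stall, ns_budget_tight,
                     memory_headroom_tight, api_discipline_reliable):
    # Discipline gates everything: a model the team cannot operate
    # correctly loses regardless of its benchmark numbers.
    if not api_discipline_reliable:
        return "simplest model you can operate (even if slower)"
    if readers_may_stall:
        return "hazard pointers (or isolate the stalling component)"
    if memory_headroom_tight:
        return "hazard pointers, or EBR with bounded pin scopes + watchdog"
    if ns_budget_tight:
        return "RCU/QSBR or EBR"
    return "RCU/EBR for read-mostly structures"

choice = pick_reclamation(readers_may_stall=True, ns_budget_tight=True,
                          memory_headroom_tight=False,
                          api_discipline_reliable=True)
```

Treat the answers as biases to argue about in design review, not as an oracle.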
6) Instrumentation you should have from day 1
Whatever scheme you pick, export:
- retired-node queue length (global + per-thread),
- oldest retire age,
- grace-period/epoch completion latency (p50/p95/p99),
- count of active/pinned readers,
- reclaim batch size and reclaim rate,
- memory high-watermark tied to reclaim backlog.
Alert on trend, not just absolute value:
- backlog slope > 0 for sustained window,
- oldest-retired age exceeding reclaim SLO,
- pinned reader duration outliers.
Without these, incidents become allocator blame games.
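The trend-based alerts above reduce to a small check (illustrative thresholds; a real setup would live in your metrics/alerting stack, and the function names here are invented):

```python
def backlog_slope(samples):
    # samples: [(t_seconds, retired_count)]; least-squares slope per second.
    n = len(samples)
    mt = sum(t for t, _ in samples) / n
    mv = sum(v for _, v in samples) / n
    num = sum((t - mt) * (v - mv) for t, v in samples)
    den = sum((t - mt) ** 2 for t, _ in samples)
    return num / den if den else 0.0

def should_alert(samples, oldest_retire_age_s, reclaim_slo_s,
                 slope_threshold=0.0):
    growing = backlog_slope(samples) > slope_threshold   # sustained growth
    slo_breach = oldest_retire_age_s > reclaim_slo_s     # age beyond SLO
    return growing or slo_breach

# Backlog rising over a sustained window -> alert even if the absolute
# count still looks "small".
window = [(0, 100), (60, 140), (120, 185), (180, 240)]
assert should_alert(window, oldest_retire_age_s=0.5, reclaim_slo_s=2.0)
```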
7) Safe rollout plan
Step 1 — Benchmark with adversarial scenarios
Do not benchmark only steady-state throughput. Include:
- one intentionally stalled reader thread,
- bursty delete workload,
- mixed-priority scheduling noise,
- and memory-pressure conditions.
Step 2 — Define reclaim SLO explicitly
Example:
- “95% of retired nodes reclaimed within 200 ms; 99% within 2 s.”
Then tune to that SLO (thresholds, batch sizes, grace-period policy).
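Checking observed reclaim latencies against the example SLO is mechanical (sketch below uses the simple nearest-rank percentile; use whatever method your metrics system already uses, and treat the function names as invented):

```python
def percentile(values, p):
    # Nearest-rank percentile over raw samples.
    s = sorted(values)
    k = max(0, int(round(p / 100 * len(s))) - 1)
    return s[k]

def meets_reclaim_slo(latencies_ms, p95_ms=200.0, p99_ms=2000.0):
    # "95% of retired nodes reclaimed within 200 ms; 99% within 2 s."
    return (percentile(latencies_ms, 95) <= p95_ms and
            percentile(latencies_ms, 99) <= p99_ms)

# 100 samples: mostly fast reclaims, a few stragglers under the p99 bound.
lat = [20.0] * 94 + [150.0] * 4 + [900.0, 1500.0]
assert meets_reclaim_slo(lat)
```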
Step 3 — Add hard guards
- watchdog for pinned/read-side duration,
- circuit-breaker on retire backlog growth,
- optional mode switch (e.g., conservative reclaim pacing vs aggressive throughput mode).
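The backlog circuit breaker plus mode switch can be combined into one guard (illustrative; the thresholds and mode names are invented, and real pacing changes would hook into your reclaimer):

```python
class ReclaimBreaker:
    def __init__(self, trip_at=10_000, reset_at=2_000):
        self.trip_at = trip_at      # backlog size that trips the breaker
        self.reset_at = reset_at    # drain level that resets it (hysteresis)
        self.mode = "aggressive"

    def observe(self, backlog_len):
        if self.mode == "aggressive" and backlog_len >= self.trip_at:
            # e.g. smaller retire batches, forced scans, paced deletes
            self.mode = "conservative"
        elif self.mode == "conservative" and backlog_len <= self.reset_at:
            self.mode = "aggressive"
        return self.mode

br = ReclaimBreaker()
assert br.observe(500) == "aggressive"
assert br.observe(12_000) == "conservative"
assert br.observe(5_000) == "conservative"   # hysteresis: stay tripped
assert br.observe(1_000) == "aggressive"
```

The hysteresis gap matters: tripping and resetting at the same threshold makes the guard flap exactly when the system is least stable.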
Step 4 — Chaos drills
Quarterly test:
- freeze one reader thread,
- observe memory slope,
- verify alerting and runbook effectiveness.
8) Mapping to common ecosystems
- Linux kernel RCU docs: emphasize grace-period semantics, cheap reads, and the latency/efficiency tradeoff of expedited grace periods.
- Userspace RCU (liburcu): the QSBR flavor gives very fast reads but requires explicit quiescent-state reporting (`rcu_quiescent_state()`), so discipline is mandatory.
- Rust `crossbeam-epoch`: clearly models pinning and deferred reclamation; excellent ergonomics if teams respect pin scope boundaries.
- Concurrency Kit: exposes both `ck_epoch` and `ck_hp`, making side-by-side evaluation possible in C systems.
- Modern C++ direction: `<hazard_pointer>` appears in C++26 references, reducing “roll-your-own HP” pressure over time.
9) 30-minute incident runbook (memory growth in lock-free subsystem)
- Confirm whether growth correlates with the retired-node backlog before blaming generic heap fragmentation.
- Check active readers/pinned participants and longest duration.
- If EBR/RCU backlog is blocked:
- identify offending thread,
- force-safe restart or isolate it,
- shorten critical sections before touching allocator knobs.
- If HP overhead storm:
- inspect retry loops and hazard scan thresholds,
- reduce contention hot spots (sharding, read-copy snapshots).
- Only after reclaim-path diagnosis, revisit allocator tuning.
- Capture pre/post reclaim-latency and backlog metrics; update guardrails.
10) Bottom line
Pick reclamation like an SRE choice, not a paper choice:
- RCU/EBR when reader cost is king and thread progress is trustworthy.
- Hazard pointers when reclaim progress under hostile scheduling matters more than raw fast-path simplicity.
In production, the winning strategy is the one your team can observe, debug, and keep safe at 3 a.m.
References
- Linux Kernel Documentation: RCU Concepts — https://docs.kernel.org/RCU/rcu.html
- Linux Kernel Documentation: TREE_RCU Expedited Grace Periods — https://docs.kernel.org/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
- Userspace RCU project docs (liburcu flavors, QSBR notes) — https://liburcu.org/
- Crossbeam Epoch docs — https://docs.rs/crossbeam-epoch/latest/crossbeam_epoch/
- Concurrency Kit overview (`ck_epoch`, `ck_hp`) — https://concurrencykit.org/
- Debian manpage: `ck_epoch_barrier(3)` — https://manpages.debian.org/testing/libck-dev/ck_epoch_barrier.3.en.html
- M. M. Michael (2004): “Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects” (IEEE TPDS), DOI: 10.1109/TPDS.2004.8
- cppreference: `<hazard_pointer>` (C++26) — https://en.cppreference.com/w/cpp/header/hazard_pointer.html