RCU vs Hazard Pointers vs Epoch Reclamation Selection Playbook
Date: 2026-03-17
Category: knowledge
Why this matters
Lock-free or low-lock data structures fail in production less often from CAS bugs than from memory-reclamation mistakes:
- freeing too early (use-after-free),
- freeing too late (unbounded memory growth),
- or forcing global stalls to reclaim safely.
If you run read-heavy infra (matching gateways, symbol maps, routing tables, session registries), choosing the right reclamation strategy is a first-order latency and reliability decision.
1) Mental model in one paragraph
All three families solve the same problem: a node is logically removed now, but some thread might still hold a pointer to it.
- RCU: wait for a grace period where pre-existing readers are gone, then reclaim.
- Epoch-based reclamation (EBR): readers “pin” an epoch; retired nodes are reclaimed after enough epoch advances.
- Hazard pointers (HP): readers publish exact pointers they are currently touching; reclaim scans hazard sets and frees only unprotected nodes.
The design tension is always: fast reader path vs reclamation latency guarantees under slow/stuck threads.
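All three families share the same skeleton: removal only *retires* a node, and actual freeing is deferred until a family-specific safety check passes. A minimal sketch of that shared skeleton (illustrative only; `RetireList` and `safe_to_free` are invented names for this model, not any library's API):

```python
class RetireList:
    def __init__(self):
        self.retired = []   # nodes logically removed but not yet freed
        self.freed = []     # stand-in for returning memory to the allocator

    def retire(self, node):
        # Logical removal: unlink from the structure, but do NOT free yet.
        self.retired.append(node)

    def reclaim(self, safe_to_free):
        # safe_to_free(node) is where the three families differ:
        #   RCU: "has a grace period elapsed since retire?"
        #   EBR: "have all threads left the node's retire epoch?"
        #   HP:  "is the node absent from every hazard slot?"
        self.freed += [n for n in self.retired if safe_to_free(n)]
        self.retired = [n for n in self.retired if not safe_to_free(n)]

rl = RetireList()
rl.retire("node-a")
rl.retire("node-b")
rl.reclaim(lambda n: n == "node-a")   # only node-a is unprotected
```

Everything below is about which `safe_to_free` predicate you buy, and what it costs readers and reclaimers.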
2) What changes operationally (not just academically)
| Dimension | RCU / QSBR | Epoch-based reclamation | Hazard pointers |
|---|---|---|---|
| Reader fast path | Usually the cheapest; can be near-zero overhead in QSBR-style read sections | Cheap, but includes pin/unpin protocol | Heavier: protect/retry loops + hazard slot stores |
| Sensitivity to stalled readers | High: grace periods can delay reclamation | High: pinned/stalled thread can block old-epoch reclaim | Lower: stalled thread only blocks nodes it protects |
| Reclamation batching efficiency | Excellent for bulk retire | Excellent when threads progress normally | Moderate; per-retire scans/threshold tuning matter |
| Memory bound behavior | Can inflate under grace-period lag | Can inflate under pinned participants | Better bounded by hazard slots + retire threshold design |
| Implementation complexity | Medium (API discipline + grace-period reasoning) | Medium (epoch discipline + pin lifecycle) | High (pointer protection protocol correctness) |
| Tail-latency risk source | Expedited grace periods may disturb CPUs (IPIs) | Epoch advancement stalls under blocked participants | Retry/scanning overhead on hot read/write paths |
Takeaway: there is no universal winner. You are choosing where to pay: reader overhead, memory headroom, or worst-case reclaim latency.
3) When each strategy tends to win
A) Prefer RCU when
- workload is strongly read-mostly,
- you can enforce short read-side critical sections,
- and delayed reclamation batches are acceptable.
Typical fits:
- config snapshots,
- routing/symbol tables,
- pointer-swapped immutable structures.
Practical note: the Linux kernel docs emphasize that RCU’s performance comes from very cheap reads, and that reclamation must wait for a grace period. Expedited grace periods exist, but they deliberately trade extra disruption (e.g., IPIs) for faster completion.
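The grace-period bookkeeping behind QSBR-flavored RCU can be sketched as a toy single-process model (illustrative only; the real API would be liburcu’s, and `QsbrDomain` is an invented name). Each reader periodically reports a quiescent state; a grace period ends once every registered thread has reported one after the updater’s pointer swap:

```python
class QsbrDomain:
    def __init__(self, thread_ids):
        self.counter = {t: 0 for t in thread_ids}  # quiescent-state counts

    def quiescent_state(self, tid):
        # Reader declares it is outside any read-side critical section.
        self.counter[tid] += 1

    def snapshot(self):
        return dict(self.counter)

    def grace_period_elapsed(self, snap):
        # Every thread has passed a quiescent state since `snap` was taken.
        return all(self.counter[t] > snap[t] for t in snap)

dom = QsbrDomain(["r1", "r2"])
old_config = {"v": 1}
config = {"v": 2}          # updater publishes the new snapshot (pointer swap)
snap = dom.snapshot()      # grace period starts here
assert not dom.grace_period_elapsed(snap)
dom.quiescent_state("r1")
dom.quiescent_state("r2")  # no pre-existing reader can still see old_config
assert dom.grace_period_elapsed(snap)  # now safe to free old_config
```

The model makes the operational hazard obvious: one thread that never calls `quiescent_state` stalls every grace period, and with it the entire retire backlog.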
B) Prefer EBR when
- you want near-RCU reader performance in user-space,
- teams can rigorously manage pin/unpin lifecycle,
- and threads are expected to make progress (few long stalls while pinned).
Typical fits:
- lock-free queues/maps in service cores with disciplined thread model,
- Rust systems using `crossbeam-epoch` patterns.
Practical note: EBR often gives excellent throughput until one pinned participant goes pathological; then retire queues can balloon.
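That pathological case is easy to see in a toy epoch model (illustrative only; real implementations such as `crossbeam-epoch` are more involved, and all names here are invented). Retired nodes are tagged with the global epoch and freed only once no thread is pinned in an older epoch:

```python
class EpochDomain:
    def __init__(self):
        self.global_epoch = 0
        self.pinned = {}        # tid -> epoch observed at pin time
        self.retired = []       # (retire_epoch, node)

    def pin(self, tid):
        self.pinned[tid] = self.global_epoch

    def unpin(self, tid):
        self.pinned.pop(tid, None)

    def retire(self, node):
        self.retired.append((self.global_epoch, node))

    def try_advance_and_collect(self):
        # Advance only if no thread is still pinned in an older epoch.
        if all(e == self.global_epoch for e in self.pinned.values()):
            self.global_epoch += 1
        # Simplified safety rule: free nodes retired in an epoch older
        # than the oldest epoch any thread is still pinned in.
        horizon = min(self.pinned.values(), default=self.global_epoch)
        collected = [n for (e, n) in self.retired if e < horizon]
        self.retired = [(e, n) for (e, n) in self.retired if e >= horizon]
        return collected

dom = EpochDomain()
dom.pin("stalled")                            # this thread never unpins...
dom.retire("node-a")
assert dom.try_advance_and_collect() == []    # ...so the backlog is stuck
dom.unpin("stalled")                          # hostage released
freed = dom.try_advance_and_collect()         # epoch moves, node reclaimed
```

One wedged pinned participant holds the whole retire queue hostage, which is exactly the failure mode described above.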
C) Prefer hazard pointers when
- you need stronger reclaim progress despite arbitrary slow readers,
- memory headroom is tight and reclaim lag must stay bounded,
- or thread scheduling is noisy/untrusted.
Typical fits:
- mixed-priority systems,
- long or unpredictable reader lifetimes,
- components where memory spikes are a direct incident trigger.
Cost: more complicated APIs/protocols and usually higher steady-state reader overhead.
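The protection protocol that buys this robustness can be sketched as a toy model (illustrative only; names are invented, not a real library’s API). A reader publishes the pointer it is about to dereference, then re-validates that the source still points there; reclaim frees only nodes absent from every hazard slot:

```python
class HazardDomain:
    def __init__(self):
        self.hazards = {}    # tid -> protected node (one slot per thread)
        self.retired = []

    def protect(self, tid, load):
        # protect -> validate -> retry: publish the pointer, then check
        # that the source did not change underneath us.
        while True:
            p = load()
            self.hazards[tid] = p    # publish hazard
            if load() == p:          # validate
                return p             # p is now safe to dereference

    def clear(self, tid):
        self.hazards.pop(tid, None)

    def retire(self, node):
        self.retired.append(node)

    def scan_and_free(self):
        protected = set(self.hazards.values())
        freed = [n for n in self.retired if n not in protected]
        self.retired = [n for n in self.retired if n in protected]
        return freed

head = ["node-a"]
dom = HazardDomain()
p = dom.protect("reader", lambda: head[0])  # reader safely holds node-a
head[0] = "node-b"                          # writer swaps the pointer...
dom.retire("node-a")                        # ...and retires the old node
assert dom.scan_and_free() == []            # reader still protects node-a
dom.clear("reader")
assert dom.scan_and_free() == ["node-a"]    # now reclaimable
```

Note how a stalled reader blocks only the specific node in its hazard slot, not the whole backlog; the price is the extra stores and validation retries on every read.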
4) Failure modes you will actually see
A) “Phantom memory leak” under RCU/EBR
Symptom:
- RSS keeps rising, but alloc/free counts look “balanced.”
Root cause:
- reclamation backlog waits on grace period/epoch progress, not allocator bugs.
B) Pinned-thread hostage in EBR
Symptom:
- retire lists grow with one worker wedged in pinned scope.
Root cause:
- missing unpin on uncommon error path, blocking syscall in pinned section, or scheduler starvation.
C) Hazard-pointer retry storms
Symptom:
- CPU rises and tail latency degrades on high-contention read paths.
Root cause:
- repeated `protect -> validate -> retry` loops and aggressive hazard scans.
D) “Fixing latency” with expedited RCU and hurting the rest
Symptom:
- one subsystem’s reclaim latency improves, but system-wide jitter increases.
Root cause:
- expedited grace periods intentionally trade efficiency for faster completion, potentially disturbing many CPUs.
5) Selection heuristic (fast and practical)
Start with these questions:
- Can any reader stall for long/unbounded time?
- Yes → bias toward HP (or isolate that component).
- Is read-path overhead budget ultra-tight (single-digit ns concerns)?
- Yes → bias toward RCU/QSBR or EBR.
- Is memory headroom tight and bursty backlog unacceptable?
- Yes → bias toward HP, or strict watchdog + bounded pin scopes with EBR.
- Can your team reliably enforce API discipline in all code paths?
- No → choose simpler model even if slower; operational correctness beats benchmark wins.
A good default for many teams:
- RCU/EBR for read-mostly maps and metadata,
- HP for high-risk structures where reader stalls are plausible,
- and avoid one-size-fits-all dogma.
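The checklist above can be encoded as a small function (illustrative only; the ordering and return strings are one reasonable reading of the biases, not a formal decision procedure):

```python
def pick_reclamation(readers_may_stall, ns_budget_tight,
                     memory_headroom_tight, api_discipline_reliable):
    # Discipline gates everything: a model the team cannot operate
    # correctly loses regardless of its benchmark numbers.
    if not api_discipline_reliable:
        return "simplest model you can operate (even if slower)"
    if readers_may_stall:
        return "hazard pointers (or isolate the stalling component)"
    if memory_headroom_tight:
        return "hazard pointers, or EBR with bounded pin scopes + watchdog"
    if ns_budget_tight:
        return "RCU/QSBR or EBR"
    return "RCU/EBR for read-mostly structures"

choice = pick_reclamation(readers_may_stall=True, ns_budget_tight=True,
                          memory_headroom_tight=False,
                          api_discipline_reliable=True)
```

Treat the answers as biases to argue about in design review, not as an oracle.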
6) Instrumentation you should have from day 1
Whatever scheme you pick, export:
- retired-node queue length (global + per-thread),
- oldest retire age,
- grace-period/epoch completion latency (p50/p95/p99),
- count of active/pinned readers,
- reclaim batch size and reclaim rate,
- memory high-watermark tied to reclaim backlog.
Alert on trend, not just absolute value:
- backlog slope > 0 for sustained window,
- oldest-retired age exceeding reclaim SLO,
- pinned reader duration outliers.
Without these, incidents become allocator blame games.
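The trend-based alerts above reduce to a small check (illustrative thresholds; a real setup would live in your metrics/alerting stack, and the function names here are invented):

```python
def backlog_slope(samples):
    # samples: [(t_seconds, retired_count)]; least-squares slope per second.
    n = len(samples)
    mt = sum(t for t, _ in samples) / n
    mv = sum(v for _, v in samples) / n
    num = sum((t - mt) * (v - mv) for t, v in samples)
    den = sum((t - mt) ** 2 for t, _ in samples)
    return num / den if den else 0.0

def should_alert(samples, oldest_retire_age_s, reclaim_slo_s,
                 slope_threshold=0.0):
    growing = backlog_slope(samples) > slope_threshold   # sustained growth
    slo_breach = oldest_retire_age_s > reclaim_slo_s     # age beyond SLO
    return growing or slo_breach

# Backlog rising over a sustained window -> alert even if the absolute
# count still looks "small".
window = [(0, 100), (60, 140), (120, 185), (180, 240)]
assert should_alert(window, oldest_retire_age_s=0.5, reclaim_slo_s=2.0)
```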
7) Safe rollout plan
Step 1 — Benchmark with adversarial scenarios
Do not benchmark only steady-state throughput. Include:
- one intentionally stalled reader thread,
- bursty delete workload,
- mixed-priority scheduling noise,
- and memory-pressure conditions.
Step 2 — Define reclaim SLO explicitly
Example:
- “95% of retired nodes reclaimed within 200 ms; 99% within 2 s.”
Then tune to that SLO (thresholds, batch sizes, grace-period policy).
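Checking observed reclaim latencies against the example SLO is mechanical (sketch below uses the simple nearest-rank percentile; use whatever method your metrics system already uses, and treat the function names as invented):

```python
def percentile(values, p):
    # Nearest-rank percentile over raw samples.
    s = sorted(values)
    k = max(0, int(round(p / 100 * len(s))) - 1)
    return s[k]

def meets_reclaim_slo(latencies_ms, p95_ms=200.0, p99_ms=2000.0):
    # "95% of retired nodes reclaimed within 200 ms; 99% within 2 s."
    return (percentile(latencies_ms, 95) <= p95_ms and
            percentile(latencies_ms, 99) <= p99_ms)

# 100 samples: mostly fast reclaims, a few stragglers under the p99 bound.
lat = [20.0] * 94 + [150.0] * 4 + [900.0, 1500.0]
assert meets_reclaim_slo(lat)
```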
Step 3 — Add hard guards
- watchdog for pinned/read-side duration,
- circuit-breaker on retire backlog growth,
- optional mode switch (e.g., conservative reclaim pacing vs aggressive throughput mode).
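The backlog circuit breaker plus mode switch can be combined into one guard (illustrative; the thresholds and mode names are invented, and real pacing changes would hook into your reclaimer):

```python
class ReclaimBreaker:
    def __init__(self, trip_at=10_000, reset_at=2_000):
        self.trip_at = trip_at      # backlog size that trips the breaker
        self.reset_at = reset_at    # drain level that resets it (hysteresis)
        self.mode = "aggressive"

    def observe(self, backlog_len):
        if self.mode == "aggressive" and backlog_len >= self.trip_at:
            # e.g. smaller retire batches, forced scans, paced deletes
            self.mode = "conservative"
        elif self.mode == "conservative" and backlog_len <= self.reset_at:
            self.mode = "aggressive"
        return self.mode

br = ReclaimBreaker()
assert br.observe(500) == "aggressive"
assert br.observe(12_000) == "conservative"
assert br.observe(5_000) == "conservative"   # hysteresis: stay tripped
assert br.observe(1_000) == "aggressive"
```

The hysteresis gap matters: tripping and resetting at the same threshold makes the guard flap exactly when the system is least stable.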
Step 4 — Chaos drills
Quarterly test:
- freeze one reader thread,
- observe memory slope,
- verify alerting and runbook effectiveness.
8) Mapping to common ecosystems
- Linux kernel RCU docs: emphasize grace-period semantics, cheap reads, and the latency/efficiency tradeoff of expedited grace periods.
- Userspace RCU (liburcu): the QSBR flavor gives very fast reads but requires explicit quiescent-state reporting (`rcu_quiescent_state()`), so discipline is mandatory.
- Rust `crossbeam-epoch`: clearly models pinning and deferred reclamation; excellent ergonomics if teams respect pin scope boundaries.
- Concurrency Kit: exposes both `ck_epoch` and `ck_hp`, making side-by-side evaluation possible in C systems.
- Modern C++ direction: `<hazard_pointer>` appears in C++26 references, reducing “roll-your-own HP” pressure over time.
9) 30-minute incident runbook (memory growth in lock-free subsystem)
- Confirm whether growth correlates with the retired-node backlog before blaming generic heap fragmentation.
- Check active readers/pinned participants and longest duration.
- If EBR/RCU backlog is blocked:
- identify offending thread,
- force-safe restart or isolate it,
- shorten critical sections before touching allocator knobs.
- If HP overhead storm:
- inspect retry loops and hazard scan thresholds,
- reduce contention hot spots (sharding, read-copy snapshots).
- Only after reclaim-path diagnosis, revisit allocator tuning.
- Capture pre/post reclaim-latency and backlog metrics; update guardrails.
10) Bottom line
Pick reclamation like an SRE choice, not a paper choice:
- RCU/EBR when reader cost is king and thread progress is trustworthy.
- Hazard pointers when reclaim progress under hostile scheduling matters more than raw fast-path simplicity.
In production, the winning strategy is the one your team can observe, debug, and keep safe at 3 a.m.
References
- Linux Kernel Documentation: RCU Concepts — https://docs.kernel.org/RCU/rcu.html
- Linux Kernel Documentation: TREE_RCU Expedited Grace Periods — https://docs.kernel.org/RCU/Design/Expedited-Grace-Periods/Expedited-Grace-Periods.html
- Userspace RCU project docs (liburcu flavors, QSBR notes) — https://liburcu.org/
- Crossbeam Epoch docs — https://docs.rs/crossbeam-epoch/latest/crossbeam_epoch/
- Concurrency Kit overview (`ck_epoch`, `ck_hp`) — https://concurrencykit.org/
- Debian manpage: `ck_epoch_barrier(3)` — https://manpages.debian.org/testing/libck-dev/ck_epoch_barrier.3.en.html
- M. M. Michael (2004): “Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects” (IEEE TPDS), DOI: 10.1109/TPDS.2004.8
- cppreference: `<hazard_pointer>` (C++26) — https://en.cppreference.com/w/cpp/header/hazard_pointer.html