Raft Linearizable Reads in Practice: ReadIndex vs Lease-Read Playbook
Most teams tune Raft writes first, then are surprised when reads become the new bottleneck.
This guide is for the practical question:
How do we keep read latency low without silently downgrading correctness?
One-Line Intuition
Use ReadIndex as your default linearizable read path, use lease-read only with explicit clock-drift budgets, and expose stale/serializable reads as an intentional product mode—not an accidental fallback.
1) Consistency Ladder (Name It Before You Tune It)
Before optimizing, make your read modes explicit:
- Strict/linearizable read: reflects all writes completed before the read started.
- Serializable/stale-capable read: may return older committed data.
- Historical/snapshot read: read at a specific revision/timestamp by contract.
A lot of outages come from “we thought reads were linearizable by default” ambiguity.
2) Why Read Paths Are Tricky in Raft
Writes naturally pass through quorum and commit index advancement.
Reads look easy (“just read leader memory”), but safety depends on proving:
- node is still valid leader for current term,
- read index is at least current committed index,
- local apply index has caught up to safe read index.
Leader changes and clock uncertainty make this non-trivial.
3) Three Read Paths You Should Deliberately Choose From
A. Log-through read (safest, slowest)
- Encode read as a Raft log entry.
- Guarantees linearizability through normal commit/apply pipeline.
- Usually too expensive for high-QPS read-heavy workloads.
Use for: rare admin/metadata reads where simplicity > latency.
B. ReadIndex quorum-confirmed read (recommended default)
Leader asks quorum to confirm leadership (heartbeat round), obtains safe index, then serves read after local apply catches up.
Typical flow:
- receive linearizable read request
- issue ReadIndex with the request context
- wait for the returned safe index ri
- wait until applied_index >= ri
- execute read from state machine
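The flow above can be sketched in Python. Everything here is illustrative: `ReadIndexServer`, `confirm_leadership_and_get_read_index`, and the timeout handling are hypothetical stand-ins for a real Raft library's interfaces, not an actual API.

```python
import threading

class ReadIndexServer:
    """Sketch of the ReadIndex read path; all names are illustrative."""

    def __init__(self):
        self.applied_index = 0
        self.committed_index = 0
        self._cv = threading.Condition()
        self.kv = {}

    def confirm_leadership_and_get_read_index(self):
        # In a real system the leader broadcasts a heartbeat round and
        # waits for quorum acks before trusting committed_index as ri.
        return self.committed_index

    def apply(self, index, key, value):
        # Called by the Raft apply loop as committed entries execute.
        with self._cv:
            self.kv[key] = value
            self.applied_index = index
            self._cv.notify_all()

    def linearizable_read(self, key, timeout=1.0):
        ri = self.confirm_leadership_and_get_read_index()
        with self._cv:
            # Block until the local state machine catches up to ri.
            ok = self._cv.wait_for(lambda: self.applied_index >= ri, timeout)
            if not ok:
                # Fail closed: never fall back to a possibly stale read.
                raise TimeoutError("apply lag exceeded read timeout")
            return self.kv.get(key)
```

Note the fail-closed timeout: the read errors out rather than silently serving data that may predate ri.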
Pros:
- Linearizable without appending log entry per read
- Works for leader/follower mediated patterns
- Strong default safety profile
Cost:
- Extra coordination RTT component versus local stale read
C. Lease-based read (fastest linearizable path when assumptions hold)
Leader serves read locally within valid lease window.
Pros:
- Lowest latency under stable leadership
- No per-read quorum round trip
Risk:
- Safety depends on bounded clock drift/pauses and lease discipline
- Misconfigured clock assumptions can serve stale reads that look fresh
Treat lease-read as a governed acceleration mode, not baseline truth.
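One way to encode that governance is to shrink the lease window by an explicit drift budget and to refuse local reads whenever clock health is uncertain. The constants and the `clock_healthy` signal below are illustrative assumptions, not values from any real system.

```python
# Illustrative constants; tune against your own measured clock SLO.
CLOCK_DRIFT_BOUND = 0.5   # assumed max relative drift, as a fraction of lease
ELECTION_TIMEOUT = 1.0    # seconds

def lease_is_safe(lease_start, now, clock_healthy,
                  election_timeout=ELECTION_TIMEOUT,
                  drift_bound=CLOCK_DRIFT_BOUND):
    """Return True only if a local lease-read is provably fresh.

    The effective lease window is the election timeout scaled down by the
    drift bound, so a slow local clock cannot stretch the lease past the
    point where another node may already have been elected leader.
    """
    if not clock_healthy:
        return False  # route to ReadIndex when clock health is uncertain
    safe_window = election_timeout * (1.0 - drift_bound)
    return (now - lease_start) < safe_window
```

A read that fails this check should fall through to the ReadIndex path, never to a stale local read.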
4) Follower Reads: Throughput Win with Hidden Coordination Cost
Follower-read is often sold as “read scaling.” Correct but incomplete.
For strongly consistent follower reads, follower still needs a safe read point from leader (ReadIndex pattern), then local apply catch-up before serving.
So your real benefit is:
- leader CPU/network relief,
- locality gains (same AZ/region),
- better hotspot distribution,
not “free no-coordination reads.”
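That coordination cost is easy to see in a sketch: the follower still makes one round trip to the leader per strongly consistent read. `leader_rpc` and `local_state` are hypothetical interfaces used only for illustration.

```python
def follower_linearizable_read(key, leader_rpc, local_state):
    """Sketch of a strongly consistent follower read.

    The follower cannot serve from local state alone: it first obtains a
    safe read index from the leader (one cross-node round trip), then
    waits for its own apply loop to catch up before serving locally.
    """
    ri = leader_rpc.read_index()   # ReadIndex round trip to the leader
    local_state.wait_applied(ri)   # local apply catch-up
    return local_state.get(key)    # locality win: data is served here
```

The win is where the data bytes come from (local replica), not the absence of coordination.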
5) Practical Latency Model
For linearizable read via ReadIndex:
[ L_{\text{read}} \approx L_{\text{queue}} + L_{\text{quorum\_confirm}} + L_{\text{apply\_catchup}} + L_{\text{state\_machine}} ]
For lease-read:
[ L_{\text{lease}} \approx L_{\text{queue}} + L_{\text{state\_machine}} ]
for reads inside valid lease.
Operator takeaway: if L_apply_catchup spikes, your bottleneck is often apply lag, not Raft messaging.
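Plugging illustrative, made-up numbers into the model shows how apply lag can dwarf the quorum round trip:

```python
# All numbers are milliseconds and purely illustrative, not measurements.
def readindex_latency(queue, quorum_confirm, apply_catchup, state_machine):
    return queue + quorum_confirm + apply_catchup + state_machine

def lease_latency(queue, state_machine):
    return queue + state_machine

# Healthy cluster: quorum RTT dominates, total ~2.5 ms.
healthy = readindex_latency(queue=0.2, quorum_confirm=1.5,
                            apply_catchup=0.3, state_machine=0.5)

# Apply loop lagging 40 ms: total ~42.2 ms with identical Raft messaging.
lagging = readindex_latency(queue=0.2, quorum_confirm=1.5,
                            apply_catchup=40.0, state_machine=0.5)

# Lease-read inside a valid lease: ~0.7 ms, no quorum component at all.
lease = lease_latency(queue=0.2, state_machine=0.5)
```

Same quorum_confirm in both ReadIndex cases; the 17x tail difference is entirely apply lag, which is why it needs its own metric.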
6) Decision Matrix (Production Defaults)
Control-plane correctness critical (locks, leader election metadata, config)
- default: ReadIndex
- lease-read: only after clock SLO proof
- stale reads: forbid
User-facing read-heavy APIs needing fresh-enough data with tight p99
- split endpoints: /read?consistency=linearizable -> ReadIndex; /read?consistency=serializable -> stale-capable
- expose latency/freshness tradeoff explicitly
Cross-AZ clusters with expensive leader concentration
- enable follower reads with topology-aware selection
- monitor added ReadIndex overhead versus cross-AZ savings
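The consistency-split endpoints from the matrix reduce to a small dispatcher. The mode strings and path names here are assumptions for illustration; the one deliberate choice shown is failing closed, i.e. unknown modes get the linearizable path rather than silently serving stale data.

```python
def route_read(consistency, lease_ok=False):
    """Map a ?consistency= query parameter to a read path name.

    Unknown or missing modes fail closed to the linearizable path
    rather than silently degrading to stale reads.
    """
    if consistency == "serializable":
        return "stale_capable_local_read"
    # Linearizable is both the explicit mode and the fallback.
    if lease_ok:
        return "lease_read"
    return "readindex_read"
```

Tagging the returned path into metrics gives you read_mode_qps by consistency mode for free.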
7) Failure Modes That Recur in Real Systems
Silent fallback to stale reads on timeout
- symptom: “latency great, occasional old data bugs”
- fix: timeout -> fail closed for linearizable endpoint
Lease overtrust under clock anomalies
- symptom: rare stale reads during GC pause/NTP issues
- fix: tighten lease margin; route to ReadIndex when clock-health uncertain
Apply lag blind spot
- symptom: ReadIndex succeeds but tail latency explodes
- fix: make applied_index_gap = read_index - applied_index a first-class metric
Follower-read optimism for point queries
- symptom: expected latency win doesn’t materialize
- fix: keep leader reads for tiny queries; use follower-read for larger/batch/locality-heavy workloads
8) Metrics You Actually Need
Track by consistency mode and endpoint class:
- read_mode_qps{linearizable|serializable|lease}
- readindex_rtt_ms (p50/p95/p99)
- read_apply_wait_ms
- readindex_to_apply_gap
- leader_lease_remaining_ms (and lease safety margin)
- clock_offset_ms, clock_jitter_ms, pause indicators
- follower_read_fallback_to_leader_rate
- stale-read ratio (if exposed intentionally)
Alert examples:
- readindex_to_apply_gap sustained above threshold
- lease-read enabled while clock-health SLO violated
- linearizable endpoint stale fallback count > 0
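The "sustained above threshold" condition is worth making precise, since a single gap spike during a snapshot transfer is normal. A minimal sliding-window sketch (threshold and window size are illustrative, not recommendations):

```python
from collections import deque

class GapAlert:
    """Fire only when readindex_to_apply_gap exceeds a threshold for N
    consecutive samples; a lone spike does not page anyone."""

    def __init__(self, threshold=1000, sustain_samples=3):
        self.threshold = threshold
        self.window = deque(maxlen=sustain_samples)

    def observe(self, read_index, applied_index):
        # Record the current gap and report whether the alert fires.
        self.window.append(read_index - applied_index)
        return (len(self.window) == self.window.maxlen
                and all(g > self.threshold for g in self.window))
```

The same shape works for the clock-health and stale-fallback alerts, with different inputs.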
9) Rollout Plan (Low-Regret)
- Phase 0: Clarify contracts
- Tag every read endpoint with required consistency.
- Phase 1: ReadIndex baseline
- Move critical reads to ReadIndex path first.
- Phase 2: Observe bottleneck
- Measure quorum RTT vs apply lag contribution.
- Phase 3: Controlled lease-read enablement
- Enable for selected endpoints only when clock health is green.
- Phase 4: Follower-read topology tuning
- Introduce AZ-aware replica selection; monitor fallback and CPU overhead.
- Phase 5: Continuous drift governance
- Auto-degrade lease-read -> ReadIndex on clock or pause anomalies.
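The Phase 5 auto-degrade rule reduces to a pure function over health signals, which keeps it testable and auditable. The thresholds and signal names here are illustrative assumptions.

```python
def select_read_mode(clock_offset_ms, pause_detected,
                     max_clock_offset_ms=50, lease_enabled=True):
    """Degrade lease-read to ReadIndex on clock or pause anomalies.

    The degradation is one-way per decision: we may drop from lease to
    ReadIndex, but never from linearizable to stale as a side effect.
    """
    if not lease_enabled:
        return "readindex"
    if pause_detected or clock_offset_ms > max_clock_offset_ms:
        return "readindex"
    return "lease"
```

Because the function is pure, the on-call runbook can replay any past decision from recorded metric samples.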
10) Minimal Pseudocode Pattern
if request.requires_linearizable:
    if lease_read_enabled and lease_is_safe(clock_health, lease_margin):
        return local_read()
    ri = raft.read_index(ctx)
    wait_until(applied_index >= ri)
    return local_read()
else:
    return serializable_or_snapshot_read()
Key point: make the branch explicit and observable.
11) What “Done Right” Looks Like
You know the read path is mature when:
- product/API contracts name consistency directly,
- linearizable reads never silently degrade,
- lease-read is guarded by measurable clock assumptions,
- follower-read is used where it actually wins (not dogma),
- on-call can answer “why was this read stale/slow?” from metrics alone.
References
- Ongaro, D., Ousterhout, J. (2014). In Search of an Understandable Consensus Algorithm (Extended Version). https://raft.github.io/raft.pdf
- etcd Raft README (features incl. linearizable read-only queries and lease-based options). https://github.com/etcd-io/raft/blob/main/README.md
- etcd Raft source (ReadOnlySafe vs ReadOnlyLeaseBased and clock-drift caveat). https://github.com/etcd-io/raft/blob/main/raft.go
- etcd API guarantees (strict serializability, linearizable vs serializable read mode). https://etcd.io/docs/v3.5/learning/api_guarantees/
- TiDB Follower Read docs (ReadIndex-based strong consistency and overhead notes). https://docs.pingcap.com/tidb/stable/follower-read/