Raft Linearizable Read Path Playbook (ReadIndex vs Lease Read)
Date: 2026-03-23
Category: knowledge
Scope: How to design low-latency Raft read paths without silently violating linearizability.
1) Why this matters
Many teams optimize Raft writes first and only later discover that reads dominate their traffic.
If every read goes through full log proposal/commit, latency and throughput suffer. If you “optimize” too aggressively (e.g., local leader read with weak time assumptions), you can return stale state during partitions or clock anomalies.
This playbook gives a practical ladder:
- Log-commit reads (simplest, safest, slowest)
- ReadIndex reads (linearizable, lower write-path overhead)
- Lease reads (fastest, but requires bounded clock/lease assumptions)
- Follower serializable reads (fast but explicitly stale-allowed)
2) Mental model in 90 seconds
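A strict (linearizable) Raft read is safe only if it is ordered after a point at which the serving node was confirmed by a quorum to still be the valid leader for its current term, and only if the state it reads has applied every entry committed as of that point.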
You can establish that in two common ways:
- Quorum confirmation now → ReadIndex-style barrier
- Quorum confirmation recently + unexpired lease → Lease-based read
If neither is true, a local read can be stale.
3) Read path options and tradeoffs
3.1 Log proposal read ("read-as-write")
Mechanism:
- encode read barrier as a no-op proposal,
- wait for commit+apply,
- serve read after apply index passes barrier.
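The read-as-write steps above can be sketched as follows. This is a minimal, single-process toy that assumes hypothetical names (ToyLog, start_applier, read_as_write). Its propose() "commits" instantly. In real Raft, a leader commits an entry only after a quorum in its current term has stored it, and that quorum is what gives the barrier its meaning.

```python
import threading
import time


class ToyLog:
    """Hypothetical single-node stand-in for a replicated log + state machine.

    propose() "commits" immediately; a real Raft leader commits only after a
    quorum in its current term has stored the entry.
    """

    def __init__(self):
        self.entries = []
        self.commit_index = 0
        self.applied_index = 0
        self.kv = {}
        self.cond = threading.Condition()

    def propose(self, entry):
        with self.cond:
            self.entries.append(entry)
            self.commit_index = len(self.entries)  # toy: no replication
            return self.commit_index               # 1-based log index

    def apply_next(self):
        with self.cond:
            if self.applied_index < self.commit_index:
                entry = self.entries[self.applied_index]
                if entry[0] == "put":
                    self.kv[entry[1]] = entry[2]
                # ("noop",) entries change nothing; they only mark a position.
                self.applied_index += 1
                self.cond.notify_all()


def start_applier(log, interval_s=0.001):
    """Background apply loop; returns an Event that stops it."""
    stop = threading.Event()

    def loop():
        while not stop.is_set():
            log.apply_next()
            time.sleep(interval_s)

    threading.Thread(target=loop, daemon=True).start()
    return stop


def read_as_write(log, key, timeout_s=1.0):
    barrier = log.propose(("noop",))               # 1) read barrier as a no-op
    with log.cond:
        # 2) wait for commit+apply to pass the barrier
        if not log.cond.wait_for(lambda: log.applied_index >= barrier, timeout_s):
            raise TimeoutError(f"barrier {barrier} not applied in {timeout_s}s")
        return log.kv.get(key)                     # 3) serve after the barrier
```

Every read appends a log entry here. That per-read entry is the write-path load listed under Cons.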
Pros:
- strongest/easiest reasoning,
- no extra clock assumptions.
Cons:
- unnecessary write-path load,
- worst tail latency under write contention.
Use when:
- correctness first and read QPS is low,
- early-stage system where simplicity matters more than p99.
3.2 ReadIndex (quorum-checked linearizable read)
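Mechanism (leader-side):
- leader records its current commit index as the read index (only once it has committed an entry from its current term, e.g. its start-of-term no-op),
- leader confirms it is still leader via a heartbeat round, sent after recording the read index, that a quorum acknowledges,
- waits until local apply index >= read index,
- serves read locally.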
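A minimal sketch of the leader-side ReadIndex steps follows. Everything here is hypothetical: LeaderState, send_heartbeat(peer, term), and the wait_applied hook. For brevity the heartbeats are sent sequentially, whereas a real implementation would send them in parallel and usually batch many pending reads behind one round. The current-term commit check follows the Raft paper's handling of read-only requests.

```python
from dataclasses import dataclass, field


class NotLeader(Exception):
    """Leadership could not be confirmed; the caller should retry or redirect."""


@dataclass
class LeaderState:
    node_id: str
    term: int
    peers: list                        # other voting members (hypothetical ids)
    commit_index: int
    applied_index: int
    committed_in_current_term: bool    # has this term's no-op committed yet?
    kv: dict = field(default_factory=dict)


def read_index(leader, send_heartbeat):
    """Return a safe read index, or raise NotLeader.

    send_heartbeat(peer, term) is a hypothetical RPC returning True when the
    peer acknowledges this leader's term.
    """
    # A fresh leader does not know the full commit index until it commits
    # an entry from its own term.
    if not leader.committed_in_current_term:
        raise NotLeader("no entry committed in current term yet")
    candidate = leader.commit_index    # record BEFORE confirming leadership
    acks = 1 + sum(1 for p in leader.peers if send_heartbeat(p, leader.term))
    if acks <= (1 + len(leader.peers)) // 2:   # need a strict majority
        raise NotLeader("heartbeat round not acknowledged by a quorum")
    return candidate


def serve_strict_read(leader, key, send_heartbeat, wait_applied):
    idx = read_index(leader, send_heartbeat)
    # wait_applied(idx) is a hypothetical hook that blocks until
    # applied_index >= idx and returns False if the wait budget runs out.
    if not wait_applied(idx):
        raise TimeoutError("apply wait exceeded budget")
    return leader.kv.get(key)
```

The order matters: recording the read index first and confirming afterwards guarantees that the index covers every write acknowledged before the read arrived.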
Pros:
- linearizable without appending log entries per read,
- typically much cheaper and lower-latency than read-as-write.
Cons:
- still pays quorum round/coordination cost,
- can suffer under leader isolation or heartbeat instability.
Use when:
- default choice for strict reads in most production clusters.
3.3 Lease-based linearizable read
Mechanism:
- leader serves read locally if leader lease is considered valid (lease derived from recent quorum contact + bounded clock drift assumptions).
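The lease-validity check above can be sketched as a small class. It is a hedged sketch built on a few assumptions: the hypothetical Lease class uses a monotonic clock, and it anchors the lease at the time a heartbeat round was sent rather than when its acks arrived. Followers reset their election timers no earlier than the send time, so this anchor is conservative. The sketch also assumes followers refuse to vote while they have recently heard from a live leader. etcd-io/raft, for example, requires its CheckQuorum option when lease-based reads are enabled.

```python
import time
from dataclasses import dataclass


@dataclass
class Lease:
    duration_s: float                      # take this from a conservative budget
    round_sent_at: float = float("-inf")   # monotonic SEND time of last quorum-acked round

    def on_quorum_ack(self, round_sent_at):
        # Never move the anchor backwards if acks for an older round arrive late.
        self.round_sent_at = max(self.round_sent_at, round_sent_at)

    def valid(self, now=None):
        now = time.monotonic() if now is None else now
        return now < self.round_sent_at + self.duration_s
```

A read that passes valid() must still wait for the apply gate before it is served; the lease only replaces the per-read quorum round.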
Pros:
- best latency/throughput for strict read API.
Cons:
- safety depends on time assumptions (clock monotonicity/drift bounds),
- risky under NTP steps, VM pause, asymmetric partitions.
Use when:
- infra time discipline is strong and monitored,
- you explicitly accept and guard lease assumptions.
3.4 Serializable follower read
Mechanism:
- follower serves local state without quorum coordination.
Pros:
- very low latency, scales read load horizontally.
Cons:
- may be stale relative to latest committed quorum state.
Use when:
- feature tolerates bounded staleness (dashboards, non-critical listings).
4) Recommended production policy
Default policy:
- strict endpoint: ReadIndex path
- optional ultra-low-latency strict endpoint: lease read behind feature flag
- explicitly stale endpoint: follower/serializable read
This keeps semantics clear instead of mixing hidden consistency modes under one API.
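One way to encode this policy is to map each endpoint mode to an ordered chain of read paths, so that a strict request can never end on the stale path. The mode names and chain strings below are hypothetical labels, not an existing API.

```python
from enum import Enum


class Consistency(Enum):
    STRICT = "strict"               # ReadIndex path
    STRICT_FAST = "strict_fast"     # lease read behind a feature flag
    SERIALIZABLE = "serializable"   # follower read, explicitly stale-allowed


def choose_path(mode, lease_flag_enabled):
    """Return the ordered read paths allowed for a requested mode.

    Every path in the chain satisfies the requested mode; strict modes
    never include "serializable".
    """
    if mode is Consistency.SERIALIZABLE:
        return ["serializable"]
    if mode is Consistency.STRICT_FAST and lease_flag_enabled:
        return ["lease", "readindex"]
    return ["readindex"]
```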
5) Implementation blueprint
5.1 Strict read endpoint (safe baseline)
For each strict read request:
- request a readIndex token from the Raft core,
- record requestedReadIndex,
- block until the state machine's appliedIndex >= requestedReadIndex,
- read from a KV state machine snapshot/view,
- return with metadata (servedBy, appliedIndex, readIndexLagMs).
Operational guard:
- timeout strict reads if read-index wait exceeds budget (e.g., 200ms/500ms by tier),
- fail closed (retry/redirect), not fail open to stale data.
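The strict endpoint steps and operational guard above can be sketched like this. It assumes a hypothetical request_read_index() callable standing in for the Raft core, and a StateMachine holder that signals a condition variable whenever it applies an entry. Here readIndexLagMs is interpreted as the time from request to apply-gate release; that interpretation is an assumption.

```python
import threading
import time
from dataclasses import dataclass


class StrictReadTimeout(Exception):
    """Retriable error: the caller should retry or redirect, never read stale."""


@dataclass
class ReadResult:
    value: object
    served_by: str
    applied_index: int
    read_index_lag_ms: float


class StateMachine:
    """Hypothetical applied-state holder with a condition variable."""

    def __init__(self):
        self.applied_index = 0
        self.kv = {}
        self.cond = threading.Condition()

    def apply(self, index, key, value):
        with self.cond:
            self.kv[key] = value
            self.applied_index = index
            self.cond.notify_all()


def strict_read(sm, node_id, key, request_read_index, budget_s=0.2):
    start = time.monotonic()
    requested = request_read_index()       # hypothetical Raft-core call
    with sm.cond:
        caught_up = sm.cond.wait_for(
            lambda: sm.applied_index >= requested,
            timeout=max(0.0, budget_s - (time.monotonic() - start)),
        )
        if not caught_up:
            # Fail closed: a retriable error instead of possibly stale data.
            raise StrictReadTimeout(f"applied < {requested} after {budget_s}s")
        value = sm.kv.get(key)             # read under the lock: consistent view
        applied = sm.applied_index
    lag_ms = (time.monotonic() - start) * 1000.0
    return ReadResult(value, node_id, applied, lag_ms)
```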
5.2 Lease-read fast path (optional)
Serve strict read from lease path only when all are true:
- leaseValid == true,
- local appliedIndex is not lagging beyond the configured bound,
- no recent clock anomaly signal,
- no leader-transfer in progress.
Fallback chain:
LeaseRead -> ReadIndex -> (timeout/error policy)
Never fall back silently from the strict endpoint to serializable reads.
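The eligibility conditions and fallback chain above can be sketched as two functions. The function names and callables are hypothetical. The reject-reason strings are chosen to match the raft_lease_reject_total labels in the telemetry list. The sketch also assumes that lease_read still waits for applied_index >= commit_index on its own, and that the lag bound only decides whether the fast path is worth attempting.

```python
def lease_reject_reason(lease_valid, applied_index, commit_index,
                        max_apply_lag, clock_anomaly_recent, transfer_in_progress):
    """Return None if the lease fast path may serve, else a reject reason."""
    if not lease_valid:
        return "expired"
    if commit_index - applied_index > max_apply_lag:
        return "apply_lag"
    if clock_anomaly_recent:
        return "clock_anomaly"
    if transfer_in_progress:
        return "transfer"
    return None


def strict_read_with_fallback(key, gate_args, lease_read, readindex_read, on_reject):
    reason = lease_reject_reason(**gate_args)
    if reason is None:
        return lease_read(key), "lease"
    on_reject(reason)                      # e.g. increment the reject counter
    # Fall back to ReadIndex only. If it times out, the error propagates to
    # the caller's error policy; there is deliberately no serializable step.
    return readindex_read(key), "readindex"
```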
5.3 Follower read endpoint (explicitly stale)
Expose a separate API mode or header, for example:
consistency=serializable
Return metadata:
- followerAppliedIndex,
- leaderCommitIndex (if known),
- estimated staleness window.
Make staleness explicit to callers.
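A follower response that carries this metadata could look like the sketch below. Its inputs are hypothetical: leader_commit_hint is the last commit index the follower learned from the leader (for example via AppendEntries), and last_leader_contact is the monotonic time of the leader's last message. The time-based figure is only a rough staleness signal, not a guaranteed bound.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowerReadResponse:
    value: object
    consistency: str                       # always "serializable" on this path
    follower_applied_index: int
    leader_commit_index: Optional[int]     # None if the follower has no hint
    staleness_entries: Optional[int]
    staleness_window_ms: float


def follower_read(kv, key, applied_index, leader_commit_hint,
                  last_leader_contact, now=None):
    now = time.monotonic() if now is None else now
    gap = (None if leader_commit_hint is None
           else max(0, leader_commit_hint - applied_index))
    return FollowerReadResponse(
        value=kv.get(key),
        consistency="serializable",
        follower_applied_index=applied_index,
        leader_commit_index=leader_commit_hint,
        staleness_entries=gap,
        # Time since the leader was last heard from; anything committed
        # after that point is invisible to this follower.
        staleness_window_ms=(now - last_leader_contact) * 1000.0,
    )
```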
6) Lease safety guardrails (practical)
If using lease reads, define a conservative lease budget:
- lease window must be well below election timeout,
- subtract clock skew + pause + network jitter budget.
Example rule-of-thumb:
lease_max <= election_timeout - (2 * worst_rtt) - clock_skew_budget - pause_budget
If computed lease budget is small/negative, disable lease reads and stick to ReadIndex.
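The rule of thumb translates directly into a function. In this sketch, min_useful_ms is a hypothetical operator-chosen floor that turns "small" into a concrete threshold. The example inputs are illustrative numbers, not recommendations.

```python
def lease_budget_ms(election_timeout_ms, worst_rtt_ms, clock_skew_budget_ms,
                    pause_budget_ms, min_useful_ms=50):
    """lease_max = election_timeout - 2*worst_rtt - clock_skew_budget - pause_budget.

    Returns None when the budget is negative or below min_useful_ms, which
    means: disable lease reads and stick to ReadIndex.
    """
    budget = (election_timeout_ms - 2 * worst_rtt_ms
              - clock_skew_budget_ms - pause_budget_ms)
    if budget < min_useful_ms:
        return None
    return budget
```

For example, a 1000 ms election timeout with a 50 ms worst RTT, a 100 ms skew budget, and a 200 ms pause budget leaves 1000 - 100 - 100 - 200 = 600 ms. The same budgets against a 300 ms election timeout come out negative, so lease reads stay off.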
Additional hardening:
- monotonic clock source checks,
- alert on clock step events,
- alert on long GC/stop-the-world or VM suspend,
- disable lease fast path automatically on anomaly.
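A minimal sketch of the anomaly checks follows. The hypothetical ClockAnomalyDetector runs periodically and compares wall-clock progress against monotonic progress to catch steps. It also treats an unexpectedly long gap between checks as a pause. The thresholds are assumptions to tune against your lease budget. Whether a VM suspend shows up in time.monotonic depends on the platform, so the wall-vs-monotonic divergence check may be what catches it instead.

```python
import time


class ClockAnomalyDetector:
    """Flags clock steps and long pauses between periodic checks."""

    def __init__(self, step_threshold_s=0.05, pause_threshold_s=0.2,
                 expected_interval_s=0.1, wall=time.time, mono=time.monotonic):
        self.step_threshold_s = step_threshold_s
        self.pause_threshold_s = pause_threshold_s
        self.expected_interval_s = expected_interval_s   # how often check() runs
        self.wall, self.mono = wall, mono
        self.last_wall, self.last_mono = wall(), mono()
        self.lease_enabled = True

    def check(self):
        w, m = self.wall(), self.mono()
        d_wall, d_mono = w - self.last_wall, m - self.last_mono
        self.last_wall, self.last_mono = w, m
        reasons = []
        if abs(d_wall - d_mono) > self.step_threshold_s:
            reasons.append("clock_step")     # e.g. NTP step or manual change
        if d_mono - self.expected_interval_s > self.pause_threshold_s:
            reasons.append("pause")          # e.g. stop-the-world GC
        if reasons:
            self.lease_enabled = False       # auto-disable the lease fast path
        return reasons
```

Re-enabling is left to an explicit operator or cooldown policy, so a single anomaly cannot flap the fast path back on.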
7) What to measure (must-have telemetry)
For read safety/perf, track at least:
- raft_read_requests_total{mode=readindex|lease|serializable}
- raft_read_latency_ms{mode,...} (p50/p95/p99)
- raft_readindex_roundtrip_ms (quorum confirmation cost)
- raft_read_apply_wait_ms (read index to applied index wait)
- raft_applied_commit_gap (commit index - applied index)
- raft_lease_reject_total{reason=expired|clock_anomaly|transfer|apply_lag}
- raft_strict_read_timeout_total
If you cannot separate latency by read mode, you cannot tune safely.
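To make per-mode separation concrete, here is a stdlib-only stand-in for a metrics library. The ReadMetrics class is hypothetical; a production system would use its real metrics client. It keys latency samples by mode and derives p50/p95/p99 with statistics.quantiles.

```python
import statistics
from collections import defaultdict


class ReadMetrics:
    """Minimal in-process stand-in for a metrics library (hypothetical)."""

    def __init__(self):
        self.requests_total = defaultdict(int)       # raft_read_requests_total{mode}
        self.latency_ms = defaultdict(list)          # raft_read_latency_ms{mode}
        self.lease_reject_total = defaultdict(int)   # raft_lease_reject_total{reason}
        self.strict_read_timeout_total = 0

    def observe_read(self, mode, latency_ms):
        self.requests_total[mode] += 1
        self.latency_ms[mode].append(latency_ms)

    def percentiles(self, mode):
        samples = self.latency_ms[mode]
        if len(samples) < 2:
            return None                              # not enough data yet
        q = statistics.quantiles(samples, n=100)     # 99 cut points
        return {"p50": q[49], "p95": q[94], "p99": q[98]}


def applied_commit_gap(commit_index, applied_index):
    # raft_applied_commit_gap: how far the state machine trails the commit index.
    return commit_index - applied_index
```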
8) Common failure modes
A) Hidden consistency downgrade
Symptom:
- “strict” API occasionally returns stale values during stress.
Root cause:
- automatic fallback strict -> serializable on timeout.
Fix:
- remove silent downgrade; return retriable error or redirect to leader.
B) Lease reads during clock anomalies
Symptom:
- rare stale reads around NTP corrections/VM pause.
Root cause:
- lease validity checked without robust time anomaly guardrails.
Fix:
- anomaly detector disables lease path; force ReadIndex.
C) Applied index lag not enforced
Symptom:
- leader has safe read index but serves before state machine catches up.
Fix:
- enforce the appliedIndex >= readIndex gate for every strict read.
D) ReadIndex latency spikes under quorum turbulence
Symptom:
- p99 strict read latency jumps during partial packet loss.
Fix:
- heartbeat stability tuning, network QoS for Raft traffic, tighter timeout tiers, and caller retries with jitter.
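Caller retries with jitter might look like the sketch below. It uses a common "full jitter" exponential backoff, which is one choice among several. Retriable is a hypothetical error type for strict-read timeouts and not-leader redirects. When attempts run out, the sketch surfaces the error rather than downgrading the read.

```python
import random
import time


class Retriable(Exception):
    """Hypothetical error for strict-read timeouts / not-leader redirects."""


def backoff_delay(attempt, base_s=0.01, cap_s=0.5, rng=random.random):
    # "Full jitter": uniform in [0, min(cap, base * 2**attempt)).
    return rng() * min(cap_s, base_s * (2 ** attempt))


def strict_read_with_retries(do_read, max_attempts=4, sleep=time.sleep,
                             rng=random.random):
    for attempt in range(max_attempts):
        try:
            return do_read()
        except Retriable:
            if attempt == max_attempts - 1:
                raise            # surface the error; never fall back to a stale read
            sleep(backoff_delay(attempt, rng=rng))
```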
9) Rollout plan (low-risk)
- Start with ReadIndex-only strict reads.
- Add full telemetry and strict timeout/error semantics.
- Introduce lease reads for a small traffic slice.
- Continuously compare correctness/perf vs ReadIndex baseline.
- Auto-disable lease path on clock/leadership anomalies.
- Keep serializable mode explicit and opt-in.
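For the small-slice step, one option is deterministic hashing rather than random sampling. With hashing, a given key always takes the same path, which makes lease-vs-ReadIndex comparisons easier to attribute. The function name and the 0.01% granularity are assumptions for illustration.

```python
import hashlib


def in_lease_slice(request_key, percent):
    """Deterministically place roughly percent% of keys in the lease-read slice."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000   # 0.01% granularity
    return bucket < percent * 100
```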
10) Practical takeaway
For most teams, ReadIndex is the best default strict-read path: linearizable, free of lease timing assumptions, and cheaper than read-as-write.
Use lease reads as an optimization layer, not as the foundation. When in doubt, pay a little latency to keep semantics honest.
References
- Raft paper (Ongaro & Ousterhout): https://raft.github.io/raft.pdf
- etcd-io/raft README (ReadOnlySafe / lease-based read features): https://github.com/etcd-io/raft
- etcd API guarantees (linearizable vs serializable): https://etcd.io/docs/v3.5/learning/api_guarantees/