Raft Linearizable Read Path Playbook (ReadIndex vs Lease Read)

Date: 2026-03-23
Category: knowledge
Scope: How to design low-latency Raft read paths without silently violating linearizability.

1) Why this matters

Most teams optimize Raft writes first and discover too late that reads dominate traffic.

If every read goes through a full log proposal and commit, latency and throughput suffer. If you “optimize” too aggressively (e.g., a local leader read that rests on weak time assumptions), you can return stale state during partitions or clock anomalies.

This playbook gives a practical ladder:

Log-commit reads (simplest, safest, slowest)
ReadIndex reads (linearizable, lower write-path overhead)
Lease reads (fastest, but requires bounded clock/lease assumptions)
Follower serializable reads (fast but explicitly stale-allowed)

2) Mental model in 90 seconds

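A Raft read is linearizable only if it reflects every write committed before the read arrived. That requires evidence that the serving node was still the leader, for a term a quorum recognizes, at some point after the read was received.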

You can establish that in two common ways:

Quorum confirmation now → ReadIndex-style barrier
Quorum confirmation recently + unexpired lease → Lease-based read

If neither is true, a local read can be stale.

3) Read path options and tradeoffs

3.1 Log proposal read ("read-as-write")

Mechanism:

encode read barrier as a no-op proposal,
wait for commit+apply,
serve read after apply index passes barrier.

Pros:

strongest/easiest reasoning,
no extra clock assumptions.

Cons:

unnecessary write-path load,
worst tail latency under write contention.

Use when:

correctness first and read QPS is low,
early-stage system where simplicity matters more than p99.

3.2 ReadIndex (quorum-checked linearizable read)

Mechanism (leader-side):

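leader records its current commit index as the read index (trustworthy only once it has committed an entry in its current term),
leader confirms it is still leader via a heartbeat round acknowledged by a quorum,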
waits until local apply index >= read index,
serves read locally.
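
The leader-side steps above can be sketched with a toy in-memory model. This is a minimal illustration, not the API of any Raft library: ToyLeader, heartbeat_acks, and committed_in_current_term are hypothetical names, and the heartbeat round is reduced to a dictionary of the terms each peer has acknowledged.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLeader:
    """In-memory stand-in for a Raft leader; every name here is hypothetical."""
    term: int
    commit_index: int
    applied_index: int
    cluster_size: int
    committed_in_current_term: bool = True
    # peer_id -> term that peer acknowledged in its latest heartbeat reply
    heartbeat_acks: dict = field(default_factory=dict)

    def has_quorum(self) -> bool:
        # The leader counts itself plus every peer that acked the current term.
        acks = 1 + sum(1 for t in self.heartbeat_acks.values() if t == self.term)
        return acks > self.cluster_size // 2

    def read_index(self):
        """Return a safe read index, or None if it cannot be established."""
        if not self.committed_in_current_term:
            return None  # commit index is not yet trustworthy for this term
        candidate = self.commit_index  # record before confirming leadership
        if not self.has_quorum():
            return None  # leadership not confirmed; do not serve locally
        return candidate

    def can_serve(self, read_index: int) -> bool:
        # Serve only once the state machine has applied up to the read index.
        return self.applied_index >= read_index


leader = ToyLeader(term=3, commit_index=10, applied_index=8, cluster_size=3,
                   heartbeat_acks={"b": 3, "c": 2})
idx = leader.read_index()          # quorum is leader + "b", so idx is 10
ready_now = leader.can_serve(idx)  # False until applied_index reaches 10
```

Returning None rather than a best-effort index keeps the caller from serving a read it cannot prove is fresh.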

Pros:

linearizable without appending log entries per read,
typically much better than read-as-write.

Cons:

still pays a quorum round trip for leadership confirmation (this can be amortized across a batch of reads),
can suffer under leader isolation or heartbeat instability.

Use when:

default choice for strict reads in most production clusters.

3.3 Lease-based linearizable read

Mechanism:

the leader serves the read locally while its leader lease is considered valid; the lease is derived from recent quorum contact plus a bounded clock-drift assumption.
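
A minimal sketch of the lease check, assuming the lease is anchored at the send time of the last heartbeat round that a quorum acknowledged and is shortened by a drift margin. LeaderLease, on_quorum_ack, and both duration parameters are hypothetical names chosen for illustration; the clock is injectable so the logic can be exercised without waiting.

```python
import time


class LeaderLease:
    """Tracks whether a leader may still serve reads locally (illustrative)."""

    def __init__(self, lease_duration_s, max_clock_drift_s, clock=time.monotonic):
        self._duration = lease_duration_s
        self._drift = max_clock_drift_s
        self._clock = clock  # monotonic by default, so wall-clock steps are ignored
        self._round_started_at = None

    def on_quorum_ack(self, round_started_at):
        """Record the send time of a heartbeat round that a quorum acknowledged."""
        if self._round_started_at is None or round_started_at > self._round_started_at:
            self._round_started_at = round_started_at

    def is_valid(self):
        if self._round_started_at is None:
            return False  # no quorum contact yet, so no lease
        # Anchor at the send time (not the ack time) and subtract the drift margin.
        expires_at = self._round_started_at + self._duration - self._drift
        return self._clock() < expires_at


now = [100.0]  # fake clock for the example
lease = LeaderLease(lease_duration_s=2.0, max_clock_drift_s=0.5, clock=lambda: now[0])
lease.on_quorum_ack(round_started_at=100.0)
now[0] = 101.4
still_valid = lease.is_valid()  # True: 101.4 is before 100.0 + 2.0 - 0.5
```

Anchoring at the send time is the conservative choice: followers reset their election timers no earlier than that moment, so the leader never credits itself with more lease than a follower could have granted.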

Pros:

best latency/throughput for strict read API.

Cons:

safety depends on time assumptions (clock monotonicity/drift bounds),
risky under NTP steps, VM pause, asymmetric partitions.

Use when:

infra time discipline is strong and monitored,
you explicitly accept and guard lease assumptions.

3.4 Serializable follower read

Mechanism:

follower serves local state without quorum coordination.

Pros:

very low latency, scales read load horizontally.

Cons:

may be stale relative to latest committed quorum state.

Use when:

feature tolerates stale data (dashboards, non-critical listings); staleness is only bounded if you enforce a lag limit yourself.

4) Recommended production policy

Default policy:

strict endpoint: ReadIndex path
optional ultra-low-latency strict endpoint: lease read behind feature flag
explicitly stale endpoint: follower/serializable read

This keeps semantics clear instead of mixing hidden consistency modes under one API.

5) Implementation blueprint

5.1 Strict read endpoint (safe baseline)

For each strict read request:

request readIndex token from Raft core,
record requestedReadIndex,
block until state machine appliedIndex >= requestedReadIndex,
read KV state machine snapshot/view,
return with metadata (servedBy, appliedIndex, readIndexLagMs).

Operational guard:

time out strict reads if the read-index wait exceeds its budget (e.g., 200 ms or 500 ms depending on tier),
fail closed (retry/redirect), not fail open to stale data.
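
The strict read steps and the fail-closed guard can be sketched as a single function. This is an illustration under stated assumptions: get_read_index, get_applied_index, and read_state are hypothetical callables standing in for the Raft core and the state machine, and the polling loop stands in for whatever apply notification the real system provides.

```python
import time


class StrictReadUnavailable(Exception):
    """Retriable failure: raised instead of serving possibly stale data."""


def strict_read(node_id, get_read_index, get_applied_index, read_state, key,
                budget_s=0.2, poll_s=0.001):
    started = time.monotonic()
    requested_read_index = get_read_index()  # quorum-confirmed index, or None
    if requested_read_index is None:
        raise StrictReadUnavailable("leadership not confirmed")
    # Block until the state machine has applied up to the read index.
    while get_applied_index() < requested_read_index:
        if time.monotonic() - started >= budget_s:
            # Fail closed: the caller retries or is redirected.
            raise StrictReadUnavailable("apply wait exceeded budget")
        time.sleep(poll_s)
    return {
        "value": read_state(key),
        "servedBy": node_id,
        "appliedIndex": get_applied_index(),
        "readIndexLagMs": (time.monotonic() - started) * 1000.0,
    }


state = {"user:1": "alice"}
result = strict_read("node-a", lambda: 5, lambda: 5, state.get, "user:1")
```

There is deliberately no code path here that returns a value when the applied index is behind the read index.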

5.2 Lease-read fast path (optional)

Serve strict read from lease path only when all are true:

leaseValid == true,
local appliedIndex not lagging beyond configured bound,
no recent clock anomaly signal,
no leader-transfer in progress.

Fallback chain:

LeaseRead -> ReadIndex -> (timeout/error policy)

Never fall back from the strict endpoint to a serializable read silently.
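
One way to express the four preconditions and the fallback chain is sketched below. LeaseGuards, serve_strict, and StrictReadError are hypothetical names; the reject reasons are illustrative labels, and the ReadIndex path is assumed to signal its timeout with Python's built-in TimeoutError.

```python
from dataclasses import dataclass


class StrictReadError(Exception):
    """Retriable error returned to the caller; never replaced by a stale read."""


@dataclass
class LeaseGuards:
    lease_valid: bool
    apply_lag_within_bound: bool
    clock_anomaly_recent: bool
    leader_transfer_in_progress: bool

    def reject_reason(self):
        """Return None when the lease path may be used, else a reason label."""
        if not self.lease_valid:
            return "expired"
        if not self.apply_lag_within_bound:
            return "apply_lag"
        if self.clock_anomaly_recent:
            return "clock_anomaly"
        if self.leader_transfer_in_progress:
            return "transfer"
        return None


def serve_strict(key, guards, lease_read, read_index_read):
    """LeaseRead -> ReadIndex -> error. There is no serializable branch."""
    if guards.reject_reason() is None:
        return "lease", lease_read(key)
    try:
        return "readindex", read_index_read(key)
    except TimeoutError as exc:
        raise StrictReadError("strict read unavailable; retry or redirect") from exc


guards = LeaseGuards(lease_valid=True, apply_lag_within_bound=True,
                     clock_anomaly_recent=True, leader_transfer_in_progress=False)
mode, value = serve_strict("k", guards, lambda k: "from-lease", lambda k: "from-readindex")
# mode is "readindex" because the clock anomaly guard rejected the lease path
```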

5.3 Follower read endpoint (explicitly stale)

Expose separate API mode/header (example):

consistency=serializable

Return metadata:

followerAppliedIndex,
leaderCommitIndex (if known),
estimated staleness window.

Make staleness explicit to callers.
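
A small sketch of a follower read response that carries its own staleness metadata. The function name and the estimatedStalenessEntries field are assumptions made for illustration: the staleness estimate is expressed here as a log-entry gap, which is only one possible way to report a staleness window, and it is omitted when the leader's commit index is unknown.

```python
def follower_read(state, key, follower_applied_index, leader_commit_index=None):
    """Serve a local, possibly stale read and say so in the response."""
    lag = None
    if leader_commit_index is not None:
        # The follower may lag the last commit index it heard from the leader.
        lag = max(0, leader_commit_index - follower_applied_index)
    return {
        "consistency": "serializable",
        "value": state.get(key),
        "followerAppliedIndex": follower_applied_index,
        "leaderCommitIndex": leader_commit_index,
        "estimatedStalenessEntries": lag,
    }


response = follower_read({"k": "v"}, "k", follower_applied_index=97,
                         leader_commit_index=100)
# response["estimatedStalenessEntries"] is 3
```

The leader commit index a follower knows is itself only as fresh as the last message from the leader, so the reported gap is a lower bound rather than an exact figure.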

6) Lease safety guardrails (practical)

If using lease reads, define a conservative lease budget:

lease window must be well below election timeout,
subtract clock skew + pause + network jitter budget.

Example rule-of-thumb:

lease_max <= election_timeout - (2 * worst_rtt) - clock_skew_budget - pause_budget

If computed lease budget is small/negative, disable lease reads and stick to ReadIndex.
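
The rule of thumb can be checked with plain arithmetic. The numbers below are illustrative, not recommendations, and min_useful_ms is a hypothetical threshold for deciding that a lease budget is too small to be worth the risk.

```python
def lease_budget_ms(election_timeout_ms, worst_rtt_ms,
                    clock_skew_budget_ms, pause_budget_ms):
    """Upper bound on the lease window, per the rule of thumb."""
    return (election_timeout_ms - 2 * worst_rtt_ms
            - clock_skew_budget_ms - pause_budget_ms)


def lease_reads_allowed(budget_ms, min_useful_ms=100):
    # A small or negative budget means: stay on ReadIndex.
    return budget_ms >= min_useful_ms


healthy = lease_budget_ms(1000, 50, 100, 300)     # 1000 - 100 - 100 - 300 = 500
degraded = lease_budget_ms(1000, 200, 250, 500)   # 1000 - 400 - 250 - 500 = -150
use_lease_healthy = lease_reads_allowed(healthy)    # True
use_lease_degraded = lease_reads_allowed(degraded)  # False
```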

Additional hardening:

monotonic clock source checks,
alert on clock step events,
alert on long GC/stop-the-world or VM suspend,
disable lease fast path automatically on anomaly.
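
One possible anomaly detector, sketched under simplifying assumptions: it is sampled periodically, flags a wall-clock step when wall time and monotonic time disagree about how much time passed, and flags a pause when the gap between samples is much longer than expected. ClockAnomalyDetector and its parameters are hypothetical; a production detector would likely combine several signals (runtime GC events, hypervisor notifications, NTP daemon state).

```python
import time


class ClockAnomalyDetector:
    """Disables the lease fast path when time behaves suspiciously (sketch)."""

    def __init__(self, step_tolerance_s=0.05, max_gap_s=1.0,
                 wall=time.time, mono=time.monotonic):
        self._step_tolerance = step_tolerance_s
        self._max_gap = max_gap_s
        self._wall = wall
        self._mono = mono
        self._last_wall = wall()
        self._last_mono = mono()
        self.lease_path_enabled = True

    def sample(self):
        """Return an anomaly label, or None if this interval looked normal."""
        wall_now, mono_now = self._wall(), self._mono()
        wall_delta = wall_now - self._last_wall
        mono_delta = mono_now - self._last_mono
        self._last_wall, self._last_mono = wall_now, mono_now
        reason = None
        if abs(wall_delta - mono_delta) > self._step_tolerance:
            reason = "clock_step"  # wall clock jumped relative to monotonic
        elif mono_delta > self._max_gap:
            reason = "pause"       # the sampler itself was stalled (GC, VM suspend)
        if reason is not None:
            self.lease_path_enabled = False  # stays off until re-enabled explicitly
        return reason
```

The detector never re-enables the lease path on its own; turning it back on is left as a deliberate operator or policy decision.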

7) What to measure (must-have telemetry)

For read safety/perf, track at least:

raft_read_requests_total{mode=readindex|lease|serializable}
raft_read_latency_ms{mode,...} (p50/p95/p99)
raft_readindex_roundtrip_ms (quorum confirmation cost)
raft_read_apply_wait_ms (read index to applied index wait)
raft_applied_commit_gap (commit index - applied index)
raft_lease_reject_total{reason=expired|clock_anomaly|transfer|apply_lag}
raft_strict_read_timeout_total

If you cannot separate latency by read mode, you cannot tune safely.
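
To illustrate why the mode label matters, here is a tiny in-memory recorder that keeps counts and latency samples per read mode and reports nearest-rank percentiles. ReadMetrics is a hypothetical stand-in for a real metrics client; it only shows the shape of the data.

```python
import math
from collections import defaultdict


class ReadMetrics:
    """Per-mode request counts and latency samples (illustrative only)."""

    def __init__(self):
        self.requests = defaultdict(int)     # mode -> request count
        self.latency_ms = defaultdict(list)  # mode -> latency samples

    def observe(self, mode, latency_ms):
        self.requests[mode] += 1
        self.latency_ms[mode].append(latency_ms)

    def percentile(self, mode, pct):
        """Nearest-rank percentile for one mode; pct is an integer 1..100."""
        samples = sorted(self.latency_ms[mode])
        if not samples:
            return None
        rank = max(1, math.ceil(pct * len(samples) / 100))
        return samples[rank - 1]


metrics = ReadMetrics()
for ms in (4, 5, 6, 40):
    metrics.observe("readindex", ms)
for ms in (1, 1, 2, 2):
    metrics.observe("lease", ms)
readindex_p99 = metrics.percentile("readindex", 99)  # 40
lease_p99 = metrics.percentile("lease", 99)          # 2
```

A single blended histogram over these eight samples would hide the fact that the slow tail belongs entirely to the ReadIndex path.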

8) Common failure modes

A) Hidden consistency downgrade

Symptom:

“strict” API occasionally returns stale values during stress.

Root cause:

automatic fallback strict -> serializable on timeout.

Fix:

remove silent downgrade; return retriable error or redirect to leader.

B) Lease reads during clock anomalies

Symptom:

rare stale reads around NTP corrections/VM pause.

Root cause:

lease validity checked without robust time anomaly guardrails.

Fix:

anomaly detector disables lease path; force ReadIndex.

C) Applied index lag not enforced

Symptom:

leader has safe read index but serves before state machine catches up.

Fix:

enforce appliedIndex >= readIndex gate for every strict read.

D) ReadIndex latency spikes under quorum turbulence

Symptom:

p99 strict read latency jumps during partial packet loss.

Fix:

heartbeat stability tuning, network QoS for Raft traffic, tighter timeout tiers, and caller retries with jitter.

9) Rollout plan (low-risk)

Start with ReadIndex-only strict reads.
Add full telemetry and strict timeout/error semantics.
Introduce lease reads for a small traffic slice.
Continuously compare correctness/perf vs ReadIndex baseline.
Auto-disable lease path on clock/leadership anomalies.
Keep serializable mode explicit and opt-in.
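
The "small traffic slice" and "auto-disable" steps can be combined in one routing decision. This sketch assumes requests carry a stable identifier and that an anomaly signal is available as a boolean; in_lease_slice, choose_strict_path, and the percentage knob are hypothetical names.

```python
import hashlib


def in_lease_slice(request_id, lease_percent):
    """Deterministically place a request in one of 100 buckets."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < lease_percent


def choose_strict_path(request_id, lease_percent, anomaly_active):
    # Any clock or leadership anomaly overrides the rollout percentage.
    if anomaly_active:
        return "readindex"
    return "lease" if in_lease_slice(request_id, lease_percent) else "readindex"


path = choose_strict_path("req-123", lease_percent=5, anomaly_active=False)
```

Hashing the request identifier keeps the same request on the same path across retries, which makes side-by-side comparison against the ReadIndex baseline easier to interpret.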

10) Practical takeaway

For most teams, ReadIndex is the best default strict-read path: it is linearizable, easier to reason about than lease reads, and cheaper than read-as-write.

Use lease reads as an optimization layer, not as the foundation. When in doubt, pay a little latency to keep semantics honest.

References

Raft paper (Ongaro & Ousterhout): https://raft.github.io/raft.pdf
etcd-io/raft README (ReadOnlySafe / lease-based read features): https://github.com/etcd-io/raft
etcd API guarantees (linearizable vs serializable): https://etcd.io/docs/v3.5/learning/api_guarantees/