Lease-Based Distributed Locks Without Illusions: Fencing Token Playbook
Date: 2026-02-24
Category: knowledge
Domain: distributed systems / correctness engineering
Why this matters
Many teams treat a "distributed lock" as if it gave the same mutual exclusion as a local mutex.
In production, that assumption fails under exactly the conditions you care about:
- process pauses (GC, CPU starvation, SIGSTOP),
- network delays/reordering,
- lease expiry races,
- delayed writes arriving after ownership changed.
A lock lease can expire while the old holder is still alive, and the old holder may still write stale state later.
If your system has correctness requirements (not just efficiency), a lock alone is not enough.
Core principle
For correctness-critical work:
- Use lock/lease only to reduce overlap probability.
- Attach a monotonic fencing token to each critical operation.
- Enforce token monotonicity at the resource being protected.
If the protected resource cannot reject stale tokens, your safety story is incomplete.
First decision: efficiency lock or correctness lock?
Borrow this practical split:
- Efficiency lock: duplicate work is acceptable (extra cost, occasional duplicate email/job).
- Correctness lock: overlap causes wrong state, data loss, or irreversible side-effects.
If this is efficiency-only, a simple lease is often enough.
If this is correctness-critical, move to fencing-token design immediately.
Failure mode in one timeline
- Client A acquires lease + lock.
- Client A pauses for longer than lease TTL.
- Client B acquires new lease + lock and performs update.
- Client A resumes and writes stale update.
Without downstream token checks, both writes may succeed in bad order.
What a fencing token is
A fencing token is a strictly increasing value issued on lock acquisition (or ownership epoch change).
- newer owner => larger token
- protected resource tracks max accepted token
- any request with token <= max is rejected
This converts "who thinks it holds the lock" into "which writes are admissible".
Safety invariant
For each protected resource R:
- Let Tmax(R) be the highest token already accepted.
- Accept operation (R, t) iff t > Tmax(R).
- On accept: set Tmax(R) = t atomically with the write.
Anything weaker (e.g., checking token outside write transaction) can still race.
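The invariant above can be sketched as a minimal in-memory gate. This is illustrative only (class and method names are made up here); a real deployment must enforce the same check inside the authoritative store, not in application memory:

```python
# Minimal sketch of the safety invariant: a per-resource max-token gate.
import threading

class FencedResource:
    """Accepts a write only if its fencing token exceeds the max seen so far."""
    def __init__(self):
        self._lock = threading.Lock()   # local atomicity for token check + write
        self._max_token = 0             # Tmax(R): highest token accepted so far
        self.state = None

    def write(self, token: int, value) -> bool:
        with self._lock:                # token check and state update are one step
            if token <= self._max_token:
                return False            # stale or duplicate token: reject
            self._max_token = token     # advance Tmax(R) atomically with the write
            self.state = value
            return True

r = FencedResource()
assert r.write(2, "from new owner") is True
assert r.write(1, "late write from paused old owner") is False  # fenced out
assert r.state == "from new owner"
```

Note that the rejection path is exactly the "Client A resumes" step in the timeline above: A's write carries the smaller token and bounces.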
Implementation patterns (practical)
Pattern A) SQL row/resource with token column
Add columns:
- lock_epoch BIGINT NOT NULL
- business fields...
Write with compare condition:
UPDATE account_positions
SET qty = ?, avg_px = ?, lock_epoch = ?
WHERE account_id = ?
AND lock_epoch < ?;
Interpretation:
- affected rows = 1 -> accepted
- affected rows = 0 -> stale token rejected (or missing row)
Use one transaction boundary for state + epoch.
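A minimal sketch of Pattern A, here using SQLite for self-containment. Table and column names follow the snippet above; the schema details and values are illustrative assumptions:

```python
# Pattern A sketch: fenced UPDATE where state and epoch move together
# in one statement, and rowcount tells us accept vs. reject.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account_positions (
        account_id TEXT PRIMARY KEY,
        qty        REAL NOT NULL,
        avg_px     REAL NOT NULL,
        lock_epoch INTEGER NOT NULL
    )""")
conn.execute("INSERT INTO account_positions VALUES ('acct-1', 10, 100.0, 5)")

def fenced_update(qty, avg_px, token, account_id):
    cur = conn.execute(
        "UPDATE account_positions "
        "SET qty = ?, avg_px = ?, lock_epoch = ? "
        "WHERE account_id = ? AND lock_epoch < ?",
        (qty, avg_px, token, account_id, token))
    conn.commit()
    return cur.rowcount == 1  # 1 -> accepted, 0 -> stale token (or missing row)

assert fenced_update(12, 101.0, 6, "acct-1") is True   # newer epoch accepted
assert fenced_update(11, 99.0, 6, "acct-1") is False   # same/older epoch rejected
```

The same shape applies to any SQL engine; only the placeholder syntax and transaction API change.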
Pattern B) KV store / document store CAS
Store {value, maxToken} together and update via CAS/precondition:
- read current version (or etag/revision)
- write only if both:
- version unchanged, and
- incoming token > maxToken
If DB supports conditional updates directly, encode both conditions server-side.
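Pattern B can be sketched as a CAS retry loop. VersionedKV below is a stand-in for any store that exposes read-with-version plus compare-and-swap; all names are illustrative:

```python
# Pattern B sketch: {value, max_token} stored together, updated via CAS.
class VersionedKV:
    def __init__(self):
        self._data = {}   # key -> (version, {"value": ..., "max_token": ...})

    def get(self, key):
        return self._data.get(key, (0, {"value": None, "max_token": 0}))

    def cas(self, key, expected_version, doc) -> bool:
        version, _ = self.get(key)
        if version != expected_version:       # someone else wrote in between
            return False
        self._data[key] = (version + 1, doc)  # bump revision on success
        return True

def fenced_put(kv, key, token, value) -> bool:
    while True:
        version, doc = kv.get(key)
        if token <= doc["max_token"]:
            return False                      # stale epoch: reject, do not retry
        if kv.cas(key, version, {"value": value, "max_token": token}):
            return True                       # value + max_token stored in one CAS
        # CAS miss: re-read and re-check the token against the new max

kv = VersionedKV()
assert fenced_put(kv, "r1", 3, "B's write") is True
assert fenced_put(kv, "r1", 2, "A's late write") is False
```

The key property: the token comparison is re-evaluated after every CAS miss, so a concurrent accept can never be bypassed by a stale retry.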
Pattern C) Object storage metadata gate
Object stores often support conditional writes (e.g., If-Match on ETag).
Use object metadata/state object carrying max_token, then update with conditional request.
Key point: token comparison must be part of authoritative update path, not client-side only.
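Pattern C has the same shape with ETags as the version. The ObjectStore class below simulates If-Match semantics in-process for illustration; real object stores expose this via HTTP preconditions, and the key/payload names are assumptions:

```python
# Pattern C sketch: a state object carrying max_token, updated only when
# the ETag still matches (If-Match semantics; mismatch ~ HTTP 412).
import uuid

class ObjectStore:
    def __init__(self):
        self._objects = {}   # key -> (etag, body)

    def get(self, key):
        return self._objects.get(key, (None, None))

    def put_if_match(self, key, etag, body) -> bool:
        current_etag, _ = self.get(key)
        if current_etag != etag:      # precondition failed
            return False
        self._objects[key] = (uuid.uuid4().hex, body)
        return True

def fenced_put(store, key, token, payload) -> bool:
    etag, body = store.get(key)
    max_token = body["max_token"] if body else 0
    if token <= max_token:
        return False                  # stale epoch rejected before the write
    return store.put_if_match(key, etag, {"max_token": token, "data": payload})

store = ObjectStore()
assert fenced_put(store, "cfg", 1, "initial") is True
assert fenced_put(store, "cfg", 2, "new") is True
assert fenced_put(store, "cfg", 1, "stale") is False
```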
Pattern D) External side-effects (APIs, queues)
When protected action is "call external system":
- include token/idempotency key in request,
- require receiver to reject lower/duplicate epochs,
- log (resource, token, decision) for audit.
If receiver cannot enforce monotonic token, classify as best-effort only.
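The receiver side of Pattern D can be sketched as follows, assuming the external system is willing to track the highest epoch per resource (resource names and the log shape here are illustrative):

```python
# Pattern D sketch, receiver side: reject lower/duplicate epochs and record
# (resource, token, decision) for audit.
audit_log = []
_max_epoch = {}   # resource -> highest epoch accepted so far

def handle_request(resource: str, token: int, action) -> bool:
    if token <= _max_epoch.get(resource, 0):
        audit_log.append((resource, token, "reject"))
        return False
    _max_epoch[resource] = token
    action()   # perform the side-effect only after the epoch check passes
    audit_log.append((resource, token, "accept"))
    return True

assert handle_request("invoice-42", 7, lambda: None) is True
assert handle_request("invoice-42", 7, lambda: None) is False  # duplicate epoch
assert audit_log == [("invoice-42", 7, "accept"), ("invoice-42", 7, "reject")]
```

If the receiver cannot run logic like this, the fallback named in the text applies: classify the integration as best-effort only.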
Where tokens come from
Good token sources are globally ordered and monotonic per lock domain:
- consensus-store revision/sequence numbers,
- lock-node sequence in ZooKeeper-style recipes,
- dedicated monotonic counter in strongly consistent store.
Avoid:
- wall-clock timestamps,
- random UUIDs (unique but not ordered),
- per-node counters without consensus.
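A token issuer from the "good" column can be sketched as a compare-and-swap counter, standing in for a sequence kept in a strongly consistent store. A real deployment would typically reuse the store's own revision number or atomic increment; the class here is an in-process illustration:

```python
# Sketch of a monotonic token issuer backed by compare-and-swap.
import threading

class CasCounter:
    """Single source of truth for the lock domain's token sequence."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

    def read(self) -> int:
        with self._lock:
            return self._value

def next_token(counter: CasCounter) -> int:
    while True:                       # retry until our increment wins
        current = counter.read()
        if counter.compare_and_swap(current, current + 1):
            return current + 1        # strictly increasing across all callers

c = CasCounter()
assert next_token(c) == 1
assert next_token(c) == 2   # later acquisitions always get larger tokens
```

Contrast with wall clocks: a clock that steps backwards can hand the newer owner the smaller "token", silently inverting the fencing comparison.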
Token scope design
Define scope explicitly:
- per-resource token stream (safer isolation, more metadata)
- per-lock-name token stream (simpler, may be coarse)
Rule of thumb: scope by actual contention boundary (what can conflict in one correctness domain).
TTL and heartbeat tuning (what actually matters)
TTL tuning does not create correctness by itself; it changes overlap probability.
Still useful:
- TTL >= p99.9 critical-section duration + jitter margin
- heartbeat period <= TTL/3
- treat missed heartbeats as a signal to stop risky writes proactively
But even perfect tuning cannot rule out long pauses/delays.
Fencing remains the correctness backstop.
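The "stop risky writes proactively" rule can be sketched as a guard the worker consults before each risky write, instead of racing the TTL. All names here are illustrative, and this is an optimization layered on top of fencing, not a replacement for it:

```python
# Sketch: bail out of risky writes once the lease can no longer be trusted.
import time

class LeaseGuard:
    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self.renewed_at = time.monotonic()

    def heartbeat_ok(self) -> None:
        self.renewed_at = time.monotonic()   # record a successful renewal

    def may_write(self, safety_margin_s: float = 0.0) -> bool:
        elapsed = time.monotonic() - self.renewed_at
        return elapsed + safety_margin_s < self.ttl_s

guard = LeaseGuard(ttl_s=0.05)
assert guard.may_write() is True
time.sleep(0.06)                  # simulate a pause that outlives the TTL
assert guard.may_write() is False # stop risky writes proactively
```

Note the limits: a pause can strike between the may_write check and the write itself, which is exactly why the downstream token check remains mandatory.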
Observability checklist
Track these counters/time series:
- lock_acquire_success_total
- lock_acquire_latency_ms
- fencing_reject_total (stale token rejects)
- token_gap (incoming token - current max)
- lease_expired_while_executing_total
- critical_write_without_token_total (should be zero)
Alert examples:
- sudden rise in fencing rejects + latency spike (pause/network event)
- any nonzero write-without-token in correctness path
- repeated lease-expiry-during-work in same service shard
Rollout plan (safe migration)
- Instrument first: add token fields and passive logging.
- Shadow enforce: compute accept/reject but do not block yet.
- Canary enforce: reject stale tokens for small subset.
- Full enforce: block globally; keep override only for emergency.
- Remove unsafe paths: forbid writes missing token.
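The shadow/canary/full switch in steps 2-4 can be sketched as a single decision function; the mode names and canary predicate are illustrative assumptions:

```python
# Sketch of the rollout switch: the token check always runs, but whether a
# stale token blocks the write depends on the enforcement mode.
def decide(token: int, max_token: int, mode: str, in_canary: bool) -> bool:
    """Return True if the write should proceed under the given rollout mode."""
    stale = token <= max_token           # always computed, so metrics see it in every mode
    if mode == "shadow":
        return True                      # step 2: compute accept/reject but do not block
    if mode == "canary":
        return not (stale and in_canary) # step 3: block only the canary subset
    return not stale                     # step 4: full enforcement

assert decide(3, 5, "shadow", in_canary=True) is True     # logged, not blocked
assert decide(3, 5, "canary", in_canary=False) is True
assert decide(3, 5, "canary", in_canary=True) is False
assert decide(3, 5, "full", in_canary=False) is False
```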
Success criteria:
- stale writes blocked in drills/chaos tests,
- no correctness incidents from overlapping lease holders,
- zero bypasses in normal operation.
Common footguns
- Lock acquired, token ignored downstream.
- Token checked in app code but not atomically with write.
- Using timestamp as token (clock skew/regression).
- Assuming Redlock/lease semantics alone guarantee exclusion.
- Calling external system that cannot reject stale epochs.
If any of these are true, document system as "best-effort lock", not strict exclusion.
Quick decision cheat sheet
- Need only duplicate-work suppression? -> simple lease lock is okay.
- Need correctness under pauses/partitions? -> lock + fencing + enforced reject path.
- Resource cannot enforce monotonic token? -> redesign boundary or downgrade guarantee claim.
The practical stance: leases coordinate intent; fencing protects truth.
References (researched)
- Martin Kleppmann, How to do distributed locking (fencing token rationale, pause/delay failure mode)
  https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
- Apache ZooKeeper, Recipes and Solutions (ephemeral + sequential lock patterns)
  https://zookeeper.apache.org/doc/r3.4.4/recipes.html
- etcd issue: Document that locks aren't really locks (naive lock caveat; fencing token guidance)
  https://github.com/etcd-io/etcd/issues/11457
- Jepsen analysis: etcd 3.4.3 (lock caveats under faults; KV correctness context)
  https://jepsen.io/analyses/etcd-3.4.3
- AWS S3 docs: Conditional writes (If-Match / If-None-Match for guarded updates)
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/conditional-writes.html
- Burrows, The Chubby lock service for loosely-coupled distributed systems (lock service design context)
  https://research.google/pubs/the-chubby-lock-service-for-loosely-coupled-distributed-systems/