Lease-Based Distributed Locks Without Illusions: Fencing Token Playbook
Date: 2026-02-24
Category: knowledge
Domain: distributed systems / correctness engineering
Why this matters
Many teams treat a "distributed lock" as if it gave the same mutual exclusion as a local mutex.
In production, that assumption fails under exactly the conditions you care about:
- process pauses (GC, CPU starvation, SIGSTOP),
- network delays/reordering,
- lease expiry races,
- delayed writes arriving after ownership changed.
A lock lease can expire while the old holder is still alive, and the old holder may still write stale state later.
If your system has correctness requirements (not just efficiency), a lock alone is not enough.
Core principle
For correctness-critical work:
- Use lock/lease only to reduce overlap probability.
- Attach a monotonic fencing token to each critical operation.
- Enforce token monotonicity at the resource being protected.
If the protected resource cannot reject stale tokens, your safety story is incomplete.
First decision: efficiency lock or correctness lock?
Borrow this practical split:
- Efficiency lock: duplicate work is acceptable (extra cost, occasional duplicate email/job).
- Correctness lock: overlap causes wrong state, data loss, or irreversible side-effects.
If this is efficiency-only, a simple lease is often enough.
If this is correctness-critical, move to fencing-token design immediately.
Failure mode in one timeline
- Client A acquires lease + lock.
- Client A pauses for longer than lease TTL.
- Client B acquires new lease + lock and performs update.
- Client A resumes and writes stale update.
Without downstream token checks, both writes may succeed in bad order.
What a fencing token is
A fencing token is a strictly increasing value issued on lock acquisition (or ownership epoch change).
- newer owner => larger token
- protected resource tracks max accepted token
- any request with token <= max is rejected
This converts "who thinks it holds the lock" into "which writes are admissible".
Safety invariant
For each protected resource R:
- Let Tmax(R) be the highest token already accepted.
- Accept operation (R, t) iff t > Tmax(R).
- On accept: set Tmax(R) = t atomically with the write.
Anything weaker (e.g., checking token outside write transaction) can still race.
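The invariant above can be sketched as a minimal in-memory gate. This is illustrative only (class and method names are made up here); a real deployment must enforce the same check inside the authoritative store, not in application memory:

```python
# Minimal sketch of the safety invariant: a per-resource max-token gate.
import threading

class FencedResource:
    """Accepts a write only if its fencing token exceeds the max seen so far."""
    def __init__(self):
        self._lock = threading.Lock()   # local atomicity for token check + write
        self._max_token = 0             # Tmax(R): highest token accepted so far
        self.state = None

    def write(self, token: int, value) -> bool:
        with self._lock:                # token check and state update are one step
            if token <= self._max_token:
                return False            # stale or duplicate token: reject
            self._max_token = token     # advance Tmax(R) atomically with the write
            self.state = value
            return True

r = FencedResource()
assert r.write(2, "from new owner") is True
assert r.write(1, "late write from paused old owner") is False  # fenced out
assert r.state == "from new owner"
```

Note that the rejection path is exactly the "Client A resumes" step in the timeline above: A's write carries the smaller token and bounces.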
Implementation patterns (practical)
Pattern A) SQL row/resource with token column
Add columns:
- lock_epoch BIGINT NOT NULL
- business fields...
Write with compare condition:
UPDATE account_positions
SET qty = ?, avg_px = ?, lock_epoch = ?
WHERE account_id = ?
AND lock_epoch < ?;
Interpretation:
- affected rows = 1 -> accepted
- affected rows = 0 -> stale token rejected (or missing row)
Use one transaction boundary for state + epoch.
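A minimal sketch of Pattern A, here using SQLite for self-containment. Table and column names follow the snippet above; the schema details and values are illustrative assumptions:

```python
# Pattern A sketch: fenced UPDATE where state and epoch move together
# in one statement, and rowcount tells us accept vs. reject.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account_positions (
        account_id TEXT PRIMARY KEY,
        qty        REAL NOT NULL,
        avg_px     REAL NOT NULL,
        lock_epoch INTEGER NOT NULL
    )""")
conn.execute("INSERT INTO account_positions VALUES ('acct-1', 10, 100.0, 5)")

def fenced_update(qty, avg_px, token, account_id):
    cur = conn.execute(
        "UPDATE account_positions "
        "SET qty = ?, avg_px = ?, lock_epoch = ? "
        "WHERE account_id = ? AND lock_epoch < ?",
        (qty, avg_px, token, account_id, token))
    conn.commit()
    return cur.rowcount == 1  # 1 -> accepted, 0 -> stale token (or missing row)

assert fenced_update(12, 101.0, 6, "acct-1") is True   # newer epoch accepted
assert fenced_update(11, 99.0, 6, "acct-1") is False   # same/older epoch rejected
```

The same shape applies to any SQL engine; only the placeholder syntax and transaction API change.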
Pattern B) KV store / document store CAS
Store {value, maxToken} together and update via CAS/precondition:
- read current version (or etag/revision)
- write only if both:
- version unchanged, and
- incoming token > maxToken
If DB supports conditional updates directly, encode both conditions server-side.
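Pattern B can be sketched as a CAS retry loop. VersionedKV below is a stand-in for any store that exposes read-with-version plus compare-and-swap; all names are illustrative:

```python
# Pattern B sketch: {value, max_token} stored together, updated via CAS.
class VersionedKV:
    def __init__(self):
        self._data = {}   # key -> (version, {"value": ..., "max_token": ...})

    def get(self, key):
        return self._data.get(key, (0, {"value": None, "max_token": 0}))

    def cas(self, key, expected_version, doc) -> bool:
        version, _ = self.get(key)
        if version != expected_version:       # someone else wrote in between
            return False
        self._data[key] = (version + 1, doc)  # bump revision on success
        return True

def fenced_put(kv, key, token, value) -> bool:
    while True:
        version, doc = kv.get(key)
        if token <= doc["max_token"]:
            return False                      # stale epoch: reject, do not retry
        if kv.cas(key, version, {"value": value, "max_token": token}):
            return True                       # value + max_token stored in one CAS
        # CAS miss: re-read and re-check the token against the new max

kv = VersionedKV()
assert fenced_put(kv, "r1", 3, "B's write") is True
assert fenced_put(kv, "r1", 2, "A's late write") is False
```

The key property: the token comparison is re-evaluated after every CAS miss, so a concurrent accept can never be bypassed by a stale retry.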
Pattern C) Object storage metadata gate
Object stores often support conditional writes (e.g., If-Match on ETag).
Use object metadata/state object carrying max_token, then update with conditional request.
Key point: token comparison must be part of authoritative update path, not client-side only.
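Pattern C has the same shape with ETags as the version. The ObjectStore class below simulates If-Match semantics in-process for illustration; real object stores expose this via HTTP preconditions, and the key/payload names are assumptions:

```python
# Pattern C sketch: a state object carrying max_token, updated only when
# the ETag still matches (If-Match semantics; mismatch ~ HTTP 412).
import uuid

class ObjectStore:
    def __init__(self):
        self._objects = {}   # key -> (etag, body)

    def get(self, key):
        return self._objects.get(key, (None, None))

    def put_if_match(self, key, etag, body) -> bool:
        current_etag, _ = self.get(key)
        if current_etag != etag:      # precondition failed
            return False
        self._objects[key] = (uuid.uuid4().hex, body)
        return True

def fenced_put(store, key, token, payload) -> bool:
    etag, body = store.get(key)
    max_token = body["max_token"] if body else 0
    if token <= max_token:
        return False                  # stale epoch rejected before the write
    return store.put_if_match(key, etag, {"max_token": token, "data": payload})

store = ObjectStore()
assert fenced_put(store, "cfg", 1, "initial") is True
assert fenced_put(store, "cfg", 2, "new") is True
assert fenced_put(store, "cfg", 1, "stale") is False
```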
Pattern D) External side-effects (APIs, queues)
When protected action is "call external system":
- include token/idempotency key in request,
- require receiver to reject lower/duplicate epochs,
- log (resource, token, decision) for audit.
If receiver cannot enforce monotonic token, classify as best-effort only.
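The receiver side of Pattern D can be sketched as follows, assuming the external system is willing to track the highest epoch per resource (resource names and the log shape here are illustrative):

```python
# Pattern D sketch, receiver side: reject lower/duplicate epochs and record
# (resource, token, decision) for audit.
audit_log = []
_max_epoch = {}   # resource -> highest epoch accepted so far

def handle_request(resource: str, token: int, action) -> bool:
    if token <= _max_epoch.get(resource, 0):
        audit_log.append((resource, token, "reject"))
        return False
    _max_epoch[resource] = token
    action()   # perform the side-effect only after the epoch check passes
    audit_log.append((resource, token, "accept"))
    return True

assert handle_request("invoice-42", 7, lambda: None) is True
assert handle_request("invoice-42", 7, lambda: None) is False  # duplicate epoch
assert audit_log == [("invoice-42", 7, "accept"), ("invoice-42", 7, "reject")]
```

If the receiver cannot run logic like this, the fallback named in the text applies: classify the integration as best-effort only.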
Where tokens come from
Good token sources are globally ordered and monotonic per lock domain:
- consensus-store revision/sequence numbers,
- lock-node sequence in ZooKeeper-style recipes,
- dedicated monotonic counter in strongly consistent store.
Avoid:
- wall-clock timestamps,
- random UUIDs (unique but not ordered),
- per-node counters without consensus.
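A token issuer from the "good" column can be sketched as a compare-and-swap counter, standing in for a sequence kept in a strongly consistent store. A real deployment would typically reuse the store's own revision number or atomic increment; the class here is an in-process illustration:

```python
# Sketch of a monotonic token issuer backed by compare-and-swap.
import threading

class CasCounter:
    """Single source of truth for the lock domain's token sequence."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True

    def read(self) -> int:
        with self._lock:
            return self._value

def next_token(counter: CasCounter) -> int:
    while True:                       # retry until our increment wins
        current = counter.read()
        if counter.compare_and_swap(current, current + 1):
            return current + 1        # strictly increasing across all callers

c = CasCounter()
assert next_token(c) == 1
assert next_token(c) == 2   # later acquisitions always get larger tokens
```

Contrast with wall clocks: a clock that steps backwards can hand the newer owner the smaller "token", silently inverting the fencing comparison.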
Token scope design
Define scope explicitly:
- per-resource token stream (safer isolation, more metadata)
- per-lock-name token stream (simpler, may be coarse)
Rule of thumb: scope by actual contention boundary (what can conflict in one correctness domain).
TTL and heartbeat tuning (what actually matters)
TTL tuning does not create correctness by itself; it changes overlap probability.
Still useful:
- TTL >= p99.9 critical-section duration + jitter margin
- heartbeat period <= TTL/3
- treat missed heartbeats as a signal to stop risky writes proactively
But even perfect tuning cannot rule out long pauses/delays.
Fencing remains the correctness backstop.
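The "stop risky writes proactively" rule can be sketched as a guard the worker consults before each risky write, instead of racing the TTL. All names here are illustrative, and this is an optimization layered on top of fencing, not a replacement for it:

```python
# Sketch: bail out of risky writes once the lease can no longer be trusted.
import time

class LeaseGuard:
    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self.renewed_at = time.monotonic()

    def heartbeat_ok(self) -> None:
        self.renewed_at = time.monotonic()   # record a successful renewal

    def may_write(self, safety_margin_s: float = 0.0) -> bool:
        elapsed = time.monotonic() - self.renewed_at
        return elapsed + safety_margin_s < self.ttl_s

guard = LeaseGuard(ttl_s=0.05)
assert guard.may_write() is True
time.sleep(0.06)                  # simulate a pause that outlives the TTL
assert guard.may_write() is False # stop risky writes proactively
```

Note the limits: a pause can strike between the may_write check and the write itself, which is exactly why the downstream token check remains mandatory.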
Observability checklist
Track these counters/time series:
- lock_acquire_success_total
- lock_acquire_latency_ms
- fencing_reject_total (stale token rejects)
- token_gap (incoming token - current max)
- lease_expired_while_executing_total
- critical_write_without_token_total (should be zero)
Alert examples:
- sudden rise in fencing rejects + latency spike (pause/network event)
- any nonzero write-without-token in correctness path
- repeated lease-expiry-during-work in same service shard
Rollout plan (safe migration)
- Instrument first: add token fields and passive logging.
- Shadow enforce: compute accept/reject but do not block yet.
- Canary enforce: reject stale tokens for small subset.
- Full enforce: block globally; keep override only for emergency.
- Remove unsafe paths: forbid writes missing token.
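The shadow/canary/full switch in steps 2-4 can be sketched as a single decision function; the mode names and canary predicate are illustrative assumptions:

```python
# Sketch of the rollout switch: the token check always runs, but whether a
# stale token blocks the write depends on the enforcement mode.
def decide(token: int, max_token: int, mode: str, in_canary: bool) -> bool:
    """Return True if the write should proceed under the given rollout mode."""
    stale = token <= max_token           # always computed, so metrics see it in every mode
    if mode == "shadow":
        return True                      # step 2: compute accept/reject but do not block
    if mode == "canary":
        return not (stale and in_canary) # step 3: block only the canary subset
    return not stale                     # step 4: full enforcement

assert decide(3, 5, "shadow", in_canary=True) is True     # logged, not blocked
assert decide(3, 5, "canary", in_canary=False) is True
assert decide(3, 5, "canary", in_canary=True) is False
assert decide(3, 5, "full", in_canary=False) is False
```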
Success criteria:
- stale writes blocked in drills/chaos tests,
- no correctness incidents from overlapping lease holders,
- zero bypasses in normal operation.
Common footguns
- Lock acquired, token ignored downstream.
- Token checked in app code but not atomically with write.
- Using timestamp as token (clock skew/regression).
- Assuming Redlock/lease semantics alone guarantee exclusion.
- Calling external system that cannot reject stale epochs.
If any of these are true, document system as "best-effort lock", not strict exclusion.
Quick decision cheat sheet
- Need only duplicate-work suppression? -> simple lease lock is okay.
- Need correctness under pauses/partitions? -> lock + fencing + enforced reject path.
- Resource cannot enforce monotonic token? -> redesign boundary or downgrade guarantee claim.
The practical stance: leases coordinate intent; fencing protects truth.
References (researched)
- Martin Kleppmann, How to do distributed locking (fencing token rationale, pause/delay failure mode)
  https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
- Apache ZooKeeper, Recipes and Solutions (ephemeral + sequential lock patterns)
  https://zookeeper.apache.org/doc/r3.4.4/recipes.html
- etcd issue: Document that locks aren't really locks (naive lock caveat; fencing token guidance)
  https://github.com/etcd-io/etcd/issues/11457
- Jepsen analysis: etcd 3.4.3 (lock caveats under faults; KV correctness context)
  https://jepsen.io/analyses/etcd-3.4.3
- AWS S3 docs: Conditional writes (If-Match / If-None-Match for guarded updates)
  https://docs.aws.amazon.com/AmazonS3/latest/userguide/conditional-writes.html
- Burrows, The Chubby lock service for loosely-coupled distributed systems (lock service design context)
  https://research.google/pubs/the-chubby-lock-service-for-loosely-coupled-distributed-systems/