CRDT Tombstone Garbage Collection & Compaction: Production Playbook
CRDTs converge beautifully, but production systems pay for that convergence in history growth.
Deletes become tombstones, old ops stay around for causality, and sync logs keep expanding unless you run a disciplined GC + compaction loop.
This playbook is a practical guide for keeping CRDT replicas correct and bounded.
One-Line Intuition
You may physically delete tombstoned data only after proving every replica has causally passed the delete point.
1) The Core Problem
Most CRDT incidents around storage cost have the same root:
- Logical delete != physical delete
- "Eventually synced" is not a GC condition
- Offline clients make naive cleanup unsafe
If you remove tombstones too early, a late replica can resurrect deleted state or fail to apply downstream ops that reference removed nodes.
2) Safety Invariant (What Must Be True Before GC)
For an element removed at causal dot (actor=a, counter=c), physical GC is safe only when:
\[ \text{stability}[a] \ge c \]
where stability is a global lower-bound frontier (often min-version-vector across active replicas/sessions for that document).
Practical meaning:
- every relevant replica has observed at least counter `c` from actor `a`
- no future merge will require that tombstone to suppress stale adds/references
If you cannot compute this frontier reliably, do not hard-delete.
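The invariant can be expressed directly as a purge guard. A minimal sketch, assuming `stability` is a plain dict mapping actor id to the globally stable counter:

```python
def safe_to_purge(stability: dict, actor: str, counter: int) -> bool:
    """True only if every tracked replica has causally passed the delete dot.

    A missing actor component is treated as 0, i.e. conservatively:
    an unknown frontier never authorizes a purge.
    """
    return stability.get(actor, 0) >= counter
```

For a tombstone removed at dot `(a, 5)`, the guard stays false until every tracked replica's frontier for actor `a` has reached 5.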
3) Architecture Pattern That Works in Production
A. Keep two clocks/planes
- Causal plane: version vectors / dots / heads for correctness
- Storage plane: incremental ops + periodic snapshots for size
B. Keep two delete phases
- Logical delete (tombstone now)
- Physical purge (after stability proof)
C. Keep two retention policies
- Convergence retention (minimum needed for merge correctness)
- Product retention (undo/history/legal/audit)
Do not mix them. Product history should be explicit snapshots/events, not accidental CRDT garbage.
4) Data Structures (Minimal)
For each document:
- `vv_client[i]`: latest version vector per active client/session
- `minVV`: component-wise minimum over tracked vectors
- `tombstones`: entries with `removedAt = (actor, counter)`
- `oplog`: incremental changes
- `snapshots`: compacted states identified by heads/hash
Component-wise minimum:
\[ \text{minVV}[k] = \min_i \text{vv}_i[k] \]
Missing actor component should be treated conservatively (usually as 0 unless membership/lease logic says replica is retired).
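A component-wise minimum with that conservative default can be sketched as follows, assuming version vectors are plain dicts from actor id to counter:

```python
def componentwise_min(vectors):
    """Component-wise minimum over version vectors (dicts: actor -> counter).

    An actor absent from any tracked vector contributes 0 to that component,
    which floors the frontier and blocks purges for that actor's dots.
    """
    vectors = list(vectors)
    if not vectors:
        return {}
    actors = set().union(*(v.keys() for v in vectors))
    return {a: min(v.get(a, 0) for v in vectors) for a in actors}
```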
5) GC Algorithm (Reference)
onSync(clientId, clientVV):
    vv_client[clientId] = clientVV
    minVV = componentwise_min(vv_client over active clients)
    for each tombstone t in doc:
        a = t.removedAt.actor
        c = t.removedAt.counter
        if minVV[a] >= c:
            purge_physically(t)
Key production detail: "active clients" must be defined by membership/lease policy, not wishful thinking.
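A runnable version of the reference algorithm above, as a minimal sketch; `Tombstone`, `Doc`, and the `active_ids` parameter are illustrative shapes, not a real library API:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Tombstone:
    actor: str
    counter: int

@dataclass
class Doc:
    vv_client: dict = field(default_factory=dict)  # client id -> version vector
    tombstones: set = field(default_factory=set)

def componentwise_min(vectors):
    vectors = list(vectors)
    if not vectors:
        return {}
    actors = set().union(*(v.keys() for v in vectors))
    return {a: min(v.get(a, 0) for v in vectors) for a in actors}

def on_sync(doc: Doc, client_id: str, client_vv: dict, active_ids: set) -> set:
    """Record the client's vector, recompute the frontier over *active*
    clients only, purge tombstones below it, and return what was purged."""
    doc.vv_client[client_id] = dict(client_vv)
    tracked = [vv for cid, vv in doc.vv_client.items() if cid in active_ids]
    min_vv = componentwise_min(tracked)
    purgeable = {t for t in doc.tombstones
                 if min_vv.get(t.actor, 0) >= t.counter}
    doc.tombstones -= purgeable  # physical purge
    return purgeable
```

Note that `active_ids` is exactly where the membership/lease policy plugs in: a client excluded from the active set no longer holds the frontier back, which is safe only if retirement is explicit.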
6) Membership, Offline Clients, and the Real Trap
minVV can stall forever if zombie sessions are never retired.
Use explicit policy:
- session lease/heartbeat TTL
- device retirement flow
- document participant compaction (who still counts for safety)
- server-side persisted frontier for reconnect continuity
A safe compromise:
- hot frontier for online participants
- cold retention window for long-offline devices
- after window expiry, require re-bootstrap from snapshot instead of indefinite tombstone retention
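The hot/cold/retired split above can be sketched as a lease filter over session heartbeats. The TTL values and the `sessions` record shape (session id mapped to last-heartbeat timestamp) are assumptions:

```python
import time

HOT_TTL_S = 5 * 60           # heartbeat lease for online participants
COLD_WINDOW_S = 30 * 86400   # retention window for long-offline devices

def classify_sessions(sessions, now=None):
    """Split sessions into hot (count toward the frontier), cold (still
    block purge until the window expires), and retired (excluded from
    minVV; must re-bootstrap from a snapshot on return)."""
    now = time.time() if now is None else now
    hot, cold, retired = set(), set(), set()
    for sid, last_seen in sessions.items():
        age = now - last_seen
        if age <= HOT_TTL_S:
            hot.add(sid)
        elif age <= COLD_WINDOW_S:
            cold.add(sid)
        else:
            retired.add(sid)
    return hot, cold, retired
```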
7) Sequence/Text CRDT Notes
Text CRDTs are where GC pain becomes visible first.
Typical practical choices:
- keep tombstones while references/causal links may still target them
- compact internal structs into snapshot forms periodically
- separate "collab correctness" from "version snapshot feature" (if product needs full historical restore, expect larger storage)
Operationally, teams often run GC enabled by default and route "full history" documents to a separate policy tier.
8) Compaction Strategy (Incremental + Snapshot)
A robust loop:
- append incremental ops continuously
- when `incremental_bytes > k * snapshot_bytes` or `op_count > threshold`, build snapshot
- persist snapshot keyed by content identity (e.g., doc id + heads/hash)
- delete only incremental chunks proven included in snapshot
This avoids write-write races across concurrent compactors and works with simple key-value backends.
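The trigger condition in the loop above fits in one function; the default `k` and op threshold are tuning assumptions, not recommendations:

```python
def should_snapshot(incremental_bytes: int, snapshot_bytes: int,
                    op_count: int, k: float = 2.0,
                    op_threshold: int = 10_000) -> bool:
    """Build a new snapshot once incremental history outweighs the last
    snapshot by factor k, or once enough ops accumulate regardless of size."""
    return incremental_bytes > k * snapshot_bytes or op_count > op_threshold
```

Keying the resulting snapshot by content identity (doc id + heads/hash) is what makes the subsequent delete step race-safe: two concurrent compactors producing the same heads write the same key, and incremental chunks are deleted only against a snapshot proven to contain them.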
9) Observability: Metrics That Predict Pain Early
Track at document and fleet level:
- `tombstone_count`, `tombstone_bytes`
- `gc_eligible_count`, `gc_purged_count`
- `frontier_lag = now_dot - minVV`
- `oldest_unstable_tombstone_age`
- `snapshot_interval_sec`, `snapshot_bytes`, `incremental_bytes`
- `gc_pause_ms` / compaction latency impact
Alert patterns:
- frontier lag monotonic growth
- high eligible-but-not-purged gap
- compaction frequency spike with no net size reduction
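Two of the leading indicators above can be computed directly from the frontier. A minimal sketch; the tombstone record shape (`actor`, `counter`, wall-clock `removed_wall_s`) is an assumption:

```python
def frontier_lag(now_vv: dict, min_vv: dict) -> dict:
    """Per-actor gap between the newest known dot and the stable frontier.

    Monotonic growth in any component means some replica has stopped
    advancing and GC is stalling.
    """
    return {a: now_vv[a] - min_vv.get(a, 0) for a in now_vv}

def oldest_unstable_tombstone_age(tombstones, min_vv, now_s):
    """Age in seconds of the oldest tombstone not yet covered by minVV."""
    ages = [now_s - t["removed_wall_s"] for t in tombstones
            if min_vv.get(t["actor"], 0) < t["counter"]]
    return max(ages, default=0)
```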
10) Failure Modes
- Lamport-only cleanup without causal frontier -> premature purge risk
- Never-expiring client set -> GC never triggers
- Undo feature piggybacking on tombstones -> unbounded growth by design
- Compaction race deleting unseen increments -> silent divergence bugs
- No cold-start plan for long-offline devices -> permanent retention tax
11) Rollout Ladder
Stage 0 — Observe only
- compute candidate `minVV`
- mark GC-eligible tombstones but do not purge
- compare simulated reclaim vs expected safety checks
Stage 1 — Conservative purge
- purge only old tombstones beyond long age threshold
- exclude high-value docs first
Stage 2 — Full policy with canaries
- enable per-tenant/doc-class
- add auto-disable on anomaly (merge failures, replay inconsistencies)
Stage 3 — Cost-optimized steady state
- adaptive compaction threshold by document churn profile
- archival tier for product-history-heavy docs
12) Practical Decision Matrix
- Need rich collaborative editing + moderate history -> GC on, periodic snapshots
- Need legal/audit immutable history -> append-only audit log separate from CRDT merge state
- Need "time travel" UX for a subset -> per-doc policy tier, not global GC-off
Minimal Implementation Checklist
- Define causal metadata contract (vv/dot/heads)
- Implement stable frontier computation with membership TTL
- Add two-phase delete (logical first, physical later)
- Add safe compaction (incremental -> snapshot)
- Ship metrics + drift alerts
- Canary enablement and rollback switch
One-Sentence Summary
CRDT storage stays healthy when tombstone purge is gated by a causal stability frontier and paired with race-safe snapshot compaction—anything weaker eventually pays either correctness risk or runaway storage cost.
References
- Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M. (2011). A comprehensive study of Convergent and Commutative Replicated Data Types. https://inria.hal.science/inria-00555588
- Kleppmann, M., Wiggins, A., van Hardenberg, P., McGranaghan, M. (2019). Local-first software: You own your data, in spite of the cloud. https://www.inkandswitch.com/essay/local-first/
- Yjs Docs — `Y.Doc` (`doc.gc` behavior). https://docs.yjs.dev/api/y.doc
- Yorkie design notes — version-vector-based garbage collection. https://github.com/yorkie-team/yorkie/blob/main/design/garbage-collection.md
- Automerge storage internals — incremental changes + snapshot compaction model. https://automerge.org/docs/reference/under-the-hood/storage/