CRDT Tombstone Garbage Collection & Compaction: Production Playbook
CRDTs converge beautifully, but production systems pay for that convergence in history growth.
Deletes become tombstones, old ops stay around for causality, and sync logs keep expanding unless you run a disciplined GC + compaction loop.
This playbook is a practical guide for keeping CRDT replicas correct and bounded.
One-Line Intuition
You may physically delete tombstoned data only after proving every replica has causally passed the delete point.
1) The Core Problem
Most CRDT incidents around storage cost have the same root:
- Logical delete != physical delete
- "Eventually synced" is not a GC condition
- Offline clients make naive cleanup unsafe
If you remove tombstones too early, a late replica can resurrect deleted state or fail to apply downstream ops that reference removed nodes.
2) Safety Invariant (What Must Be True Before GC)
For an element removed at causal dot (actor=a, counter=c), physical GC is safe only when:
\[ \text{stability}[a] \ge c \]
where stability is a global lower-bound frontier (often min-version-vector across active replicas/sessions for that document).
Practical meaning:
- every relevant replica has observed at least counter `c` from actor `a`
- no future merge will require that tombstone to suppress stale adds/references
If you cannot compute this frontier reliably, do not hard-delete.
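The invariant can be expressed directly as a purge guard. A minimal sketch, assuming `stability` is a plain dict mapping actor id to the globally stable counter:

```python
def safe_to_purge(stability: dict, actor: str, counter: int) -> bool:
    """True only if every tracked replica has causally passed the delete dot.

    A missing actor component is treated as 0, i.e. conservatively:
    an unknown frontier never authorizes a purge.
    """
    return stability.get(actor, 0) >= counter
```

For a tombstone removed at dot `(a, 5)`, the guard stays false until every tracked replica's frontier for actor `a` has reached 5.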
3) Architecture Pattern That Works in Production
A. Keep two clocks/planes
- Causal plane: version vectors / dots / heads for correctness
- Storage plane: incremental ops + periodic snapshots for size
B. Keep two delete phases
- Logical delete (tombstone now)
- Physical purge (after stability proof)
C. Keep two retention policies
- Convergence retention (minimum needed for merge correctness)
- Product retention (undo/history/legal/audit)
Do not mix them. Product history should be explicit snapshots/events, not accidental CRDT garbage.
4) Data Structures (Minimal)
For each document:
- `vv_client[i]`: latest version vector per active client/session
- `minVV`: component-wise minimum over tracked vectors
- `tombstones`: entries with `removedAt = (actor, counter)`
- `oplog`: incremental changes
- `snapshots`: compacted states identified by heads/hash
Component-wise minimum:
\[ \text{minVV}[k] = \min_i \text{vv}_i[k] \]
Missing actor component should be treated conservatively (usually as 0 unless membership/lease logic says replica is retired).
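A component-wise minimum with that conservative default can be sketched as follows, assuming version vectors are plain dicts from actor id to counter:

```python
def componentwise_min(vectors):
    """Component-wise minimum over version vectors (dicts: actor -> counter).

    An actor absent from any tracked vector contributes 0 to that component,
    which floors the frontier and blocks purges for that actor's dots.
    """
    vectors = list(vectors)
    if not vectors:
        return {}
    actors = set().union(*(v.keys() for v in vectors))
    return {a: min(v.get(a, 0) for v in vectors) for a in actors}
```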
5) GC Algorithm (Reference)
onSync(clientId, clientVV):
    vv_client[clientId] = clientVV
    minVV = componentwise_min(vv_client over active clients)
    for each tombstone t in doc:
        a = t.removedAt.actor
        c = t.removedAt.counter
        if minVV[a] >= c:
            purge_physically(t)
Key production detail: "active clients" must be defined by membership/lease policy, not wishful thinking.
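A runnable version of the reference algorithm above, as a minimal sketch; `Tombstone`, `Doc`, and the `active_ids` parameter are illustrative shapes, not a real library API:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Tombstone:
    actor: str
    counter: int

@dataclass
class Doc:
    vv_client: dict = field(default_factory=dict)  # client id -> version vector
    tombstones: set = field(default_factory=set)

def componentwise_min(vectors):
    vectors = list(vectors)
    if not vectors:
        return {}
    actors = set().union(*(v.keys() for v in vectors))
    return {a: min(v.get(a, 0) for v in vectors) for a in actors}

def on_sync(doc: Doc, client_id: str, client_vv: dict, active_ids: set) -> set:
    """Record the client's vector, recompute the frontier over *active*
    clients only, purge tombstones below it, and return what was purged."""
    doc.vv_client[client_id] = dict(client_vv)
    tracked = [vv for cid, vv in doc.vv_client.items() if cid in active_ids]
    min_vv = componentwise_min(tracked)
    purgeable = {t for t in doc.tombstones
                 if min_vv.get(t.actor, 0) >= t.counter}
    doc.tombstones -= purgeable  # physical purge
    return purgeable
```

Note that `active_ids` is exactly where the membership/lease policy plugs in: a client excluded from the active set no longer holds the frontier back, which is safe only if retirement is explicit.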
6) Membership, Offline Clients, and the Real Trap
minVV can stall forever if zombie sessions are never retired.
Use explicit policy:
- session lease/heartbeat TTL
- device retirement flow
- document participant compaction (who still counts for safety)
- server-side persisted frontier for reconnect continuity
A safe compromise:
- hot frontier for online participants
- cold retention window for long-offline devices
- after window expiry, require re-bootstrap from snapshot instead of indefinite tombstone retention
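The hot/cold/retired split above can be sketched as a lease filter over session heartbeats. The TTL values and the `sessions` record shape (session id mapped to last-heartbeat timestamp) are assumptions:

```python
import time

HOT_TTL_S = 5 * 60           # heartbeat lease for online participants
COLD_WINDOW_S = 30 * 86400   # retention window for long-offline devices

def classify_sessions(sessions, now=None):
    """Split sessions into hot (count toward the frontier), cold (still
    block purge until the window expires), and retired (excluded from
    minVV; must re-bootstrap from a snapshot on return)."""
    now = time.time() if now is None else now
    hot, cold, retired = set(), set(), set()
    for sid, last_seen in sessions.items():
        age = now - last_seen
        if age <= HOT_TTL_S:
            hot.add(sid)
        elif age <= COLD_WINDOW_S:
            cold.add(sid)
        else:
            retired.add(sid)
    return hot, cold, retired
```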
7) Sequence/Text CRDT Notes
Text CRDTs are where GC pain becomes visible first.
Typical practical choices:
- keep tombstones while references/causal links may still target them
- compact internal structs into snapshot forms periodically
- separate "collab correctness" from "version snapshot feature" (if product needs full historical restore, expect larger storage)
Operationally, teams often run GC enabled by default and route "full history" documents to a separate policy tier.
8) Compaction Strategy (Incremental + Snapshot)
A robust loop:
- append incremental ops continuously
- when `incremental_bytes > k * snapshot_bytes` or `op_count > threshold`, build snapshot
- persist snapshot keyed by content identity (e.g., doc id + heads/hash)
- delete only incremental chunks proven included in snapshot
This avoids write-write races across concurrent compactors and works with simple key-value backends.
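The trigger condition in the loop above fits in one function; the default `k` and op threshold are tuning assumptions, not recommendations:

```python
def should_snapshot(incremental_bytes: int, snapshot_bytes: int,
                    op_count: int, k: float = 2.0,
                    op_threshold: int = 10_000) -> bool:
    """Build a new snapshot once incremental history outweighs the last
    snapshot by factor k, or once enough ops accumulate regardless of size."""
    return incremental_bytes > k * snapshot_bytes or op_count > op_threshold
```

Keying the resulting snapshot by content identity (doc id + heads/hash) is what makes the subsequent delete step race-safe: two concurrent compactors producing the same heads write the same key, and incremental chunks are deleted only against a snapshot proven to contain them.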
9) Observability: Metrics That Predict Pain Early
Track at document and fleet level:
- `tombstone_count`, `tombstone_bytes`
- `gc_eligible_count`, `gc_purged_count`
- `frontier_lag = now_dot - minVV`
- `oldest_unstable_tombstone_age`
- `snapshot_interval_sec`, `snapshot_bytes`, `incremental_bytes`
- `gc_pause_ms` / compaction latency impact
Alert patterns:
- frontier lag monotonic growth
- high eligible-but-not-purged gap
- compaction frequency spike with no net size reduction
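Two of the leading indicators above can be computed directly from the frontier. A minimal sketch; the tombstone record shape (`actor`, `counter`, wall-clock `removed_wall_s`) is an assumption:

```python
def frontier_lag(now_vv: dict, min_vv: dict) -> dict:
    """Per-actor gap between the newest known dot and the stable frontier.

    Monotonic growth in any component means some replica has stopped
    advancing and GC is stalling.
    """
    return {a: now_vv[a] - min_vv.get(a, 0) for a in now_vv}

def oldest_unstable_tombstone_age(tombstones, min_vv, now_s):
    """Age in seconds of the oldest tombstone not yet covered by minVV."""
    ages = [now_s - t["removed_wall_s"] for t in tombstones
            if min_vv.get(t["actor"], 0) < t["counter"]]
    return max(ages, default=0)
```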
10) Failure Modes
- Lamport-only cleanup without causal frontier -> premature purge risk
- Never-expiring client set -> GC never triggers
- Undo feature piggybacking on tombstones -> unbounded growth by design
- Compaction race deleting unseen increments -> silent divergence bugs
- No cold-start plan for long-offline devices -> permanent retention tax
11) Rollout Ladder
Stage 0 — Observe only
- compute candidate `minVV`
- mark GC-eligible tombstones but do not purge
- compare simulated reclaim vs expected safety checks
Stage 1 — Conservative purge
- purge only old tombstones beyond long age threshold
- exclude high-value docs first
Stage 2 — Full policy with canaries
- enable per-tenant/doc-class
- add auto-disable on anomaly (merge failures, replay inconsistencies)
Stage 3 — Cost-optimized steady state
- adaptive compaction threshold by document churn profile
- archival tier for product-history-heavy docs
12) Practical Decision Matrix
- Need rich collaborative editing + moderate history -> GC on, periodic snapshots
- Need legal/audit immutable history -> append-only audit log separate from CRDT merge state
- Need "time travel" UX for a subset -> per-doc policy tier, not global GC-off
Minimal Implementation Checklist
- Define causal metadata contract (vv/dot/heads)
- Implement stable frontier computation with membership TTL
- Add two-phase delete (logical first, physical later)
- Add safe compaction (incremental -> snapshot)
- Ship metrics + drift alerts
- Canary enablement and rollback switch
One-Sentence Summary
CRDT storage stays healthy when tombstone purge is gated by a causal stability frontier and paired with race-safe snapshot compaction—anything weaker eventually pays either correctness risk or runaway storage cost.
References
- Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M. (2011). A comprehensive study of Convergent and Commutative Replicated Data Types. https://inria.hal.science/inria-00555588
- Kleppmann, M., Wiggins, A., van Hardenberg, P., McGranaghan, M. (2019). Local-first software: You own your data, in spite of the cloud. https://www.inkandswitch.com/essay/local-first/
- Yjs Docs — `Y.Doc` (`doc.gc` behavior). https://docs.yjs.dev/api/y.doc
- Yorkie design notes — version-vector-based garbage collection. https://github.com/yorkie-team/yorkie/blob/main/design/garbage-collection.md
- Automerge storage internals — incremental changes + snapshot compaction model. https://automerge.org/docs/reference/under-the-hood/storage/