CRDT vs OT for Realtime Collaboration (Practical Playbook)

2026-02-26 · software

Category: knowledge
Domain: distributed systems / collaborative editing

Why this matters

If you want Google-Docs-like collaboration, the hard part is not WebSocket plumbing.
The hard part is convergence under concurrent edits while keeping latency low.

Most teams eventually choose one of two families:

  • Operational Transformation (OT)
  • Conflict-free Replicated Data Types (CRDTs)

Both can work in production. Most failures come from choosing a model that does not match the shape of your product.


Executive summary (fast choice)

Choose OT when:

Choose CRDT when:

If undecided, start with centralized (client-server) CRDT sync and avoid peer-to-peer (P2P) topologies until a real product need appears.


Mental model

OT in one sentence

OT rewrites incoming operations against already-applied concurrent operations so every replica applies a transformed op sequence and converges.
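A minimal sketch of that rewrite, limited to concurrent inserts on a plain string, may help. The `Insert` type, the `site` tie-breaker, and the function names are illustrative assumptions rather than any library's API; a real OT system must also transform deletes and every other op pair, which is where most correctness bugs live.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index where `text` is inserted
    text: str
    site: int  # hypothetical replica id, used only to break same-index ties


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has already been applied."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        # `against` now sits before op's position, so shift op right by its length
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


base = "ac"
a = Insert(1, "X", site=1)  # generated concurrently on replica 1
b = Insert(1, "Y", site=2)  # generated concurrently on replica 2

# Each replica applies its local op first, then the remote op transformed against it.
replica_a = apply(apply(base, a), transform(b, a))
replica_b = apply(apply(base, b), transform(a, b))
print(replica_a, replica_b)  # aXYc aXYc
```

Breaking same-index ties by replica id is one common choice; any deterministic rule works, as long as every replica applies the same one.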

CRDT in one sentence

A CRDT defines its data type and operations so that merges converge mathematically (for example, concurrent updates commute), without needing a global transformation pipeline.
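For contrast, a grow-only counter (G-Counter) is one of the simplest state-based CRDTs: each replica counts only its own increments, and merge takes the per-replica maximum. This is a textbook illustration with assumed names, not how text CRDTs are built, but it shows why merge order stops mattering.

```python
from typing import Dict

GCounter = Dict[str, int]  # replica id -> total added at that replica


def increment(state: GCounter, replica: str, n: int = 1) -> GCounter:
    updated = dict(state)  # states are treated as immutable values
    updated[replica] = updated.get(replica, 0) + n
    return updated


def merge(a: GCounter, b: GCounter) -> GCounter:
    # Per-replica max is commutative, associative, and idempotent.
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(state: GCounter) -> int:
    return sum(state.values())


x = increment({}, "r1", 2)  # replica r1 adds 2
y = increment({}, "r2", 3)  # replica r2 adds 3
assert merge(x, y) == merge(y, x)            # order does not matter
assert merge(x, merge(x, y)) == merge(x, y)  # re-merging changes nothing
print(value(merge(x, y)))  # 5
```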


Tradeoff table

Dimension | OT | CRDT
Core mechanism | Transform ops by context/version | Merge commutative/idempotent ops or states
Offline support | Possible, but server/version coupling can be tricky | Natural fit, especially op-based CRDTs
Central server dependency | Usually strong | Optional (still common in practice)
Text performance | Very good with mature implementations | Good, but metadata/GC design is crucial
Non-text shared data | More custom logic needed | Strong with map/list/set CRDT families
Implementation complexity | High in transform correctness | High in metadata, tombstones, compaction
Debuggability | Version + transform bugs are painful | Metadata growth + causal-ordering bugs
Typical wire size | Often smaller | Can grow unless aggressively compacted

Architecture patterns that actually work

1) Central relay (recommended default)

This works for both OT and CRDT and is enough for most products.
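A hypothetical sketch of the relay's core job, assuming opaque payloads: it stamps each submitted op with a server sequence number, appends it to a log, fans it out, and serves catch-up reads. In a CRDT setup the relay can stay this simple; in an OT setup, `submit` is where the server would transform the op against everything the client had not yet seen before sequencing it. The `Relay` class and its methods are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Entry = Tuple[int, str, bytes]  # (server seq, client id, opaque op payload)


@dataclass
class Relay:
    """Hypothetical central relay: sequences ops, logs them, and fans them out."""
    log: List[Entry] = field(default_factory=list)
    subscribers: Dict[str, Callable[[int, str, bytes], None]] = field(default_factory=dict)

    def submit(self, client_id: str, payload: bytes) -> int:
        seq = len(self.log) + 1  # server-assigned, strictly increasing
        self.log.append((seq, client_id, payload))
        for deliver in self.subscribers.values():
            deliver(seq, client_id, payload)  # a subscribed sender treats its echo as an ack
        return seq

    def catch_up(self, after_seq: int) -> List[Entry]:
        # seq n is stored at index n - 1, so this returns every entry with seq > after_seq
        return self.log[after_seq:]


received: List[Entry] = []
relay = Relay()
relay.subscribers["alice"] = lambda seq, who, p: received.append((seq, who, p))
relay.submit("alice", b"op-1")
relay.submit("bob", b"op-2")
print([seq for seq, _, _ in relay.catch_up(1)])  # [2]
```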

2) Offline-first with reconnect reconciliation

CRDT usually simplifies this path.
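One way to sketch reconnect reconciliation is with version vectors. The assumptions: each replica numbers its own ops 1, 2, 3, ... with no gaps, and the log preserves per-origin order, so "highest counter seen" implies everything below it was seen too. The function names are hypothetical. The client's offline ops flow the other way with the same comparison, using the server's vector.

```python
from typing import Dict, List, Tuple

VersionVector = Dict[str, int]  # origin replica -> highest op counter already seen
Op = Tuple[str, int, str]       # (origin replica, per-origin counter, payload)


def missing_ops(log: List[Op], peer_vv: VersionVector) -> List[Op]:
    """Ops in `log` that a peer with version vector `peer_vv` has not seen yet."""
    return [op for op in log if op[1] > peer_vv.get(op[0], 0)]


def advance(vv: VersionVector, ops: List[Op]) -> VersionVector:
    """Version vector after applying `ops`."""
    updated = dict(vv)
    for origin, counter, _ in ops:
        updated[origin] = max(updated.get(origin, 0), counter)
    return updated


server_log = [("a", 1, "x"), ("b", 1, "y"), ("a", 2, "z")]
client_vv = {"a": 1}  # the client went offline after seeing a's first op

missing = missing_ops(server_log, client_vv)
print(missing)                      # [('b', 1, 'y'), ('a', 2, 'z')]
print(advance(client_vv, missing))  # {'a': 2, 'b': 1}
```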

3) Multi-region collaboration

CRDT is often easier to reason about at region boundaries; OT can still work with disciplined sequencer design.
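At region boundaries, a state-based merge lets each region accept writes locally and exchange state later. A last-writer-wins (LWW) map is the simplest sketch, assuming each write carries a (timestamp, region) pair that is unique per write; a hybrid logical clock is a common choice for that timestamp because wall clocks drift between regions. LWW silently discards one of two concurrent writes, so it suits fields like titles or settings rather than shared text. Names are illustrative.

```python
from typing import Dict, Tuple

# key -> (timestamp, region id, value); (timestamp, region id) is assumed unique per write
LWWMap = Dict[str, Tuple[int, str, str]]


def merge(a: LWWMap, b: LWWMap) -> LWWMap:
    """Keep, per key, the write with the larger (timestamp, region id) pair."""
    out = dict(a)
    for key, entry in b.items():
        if key not in out or entry[:2] > out[key][:2]:
            out[key] = entry
    return out


eu = {"title": (10, "eu", "Draft"), "owner": (5, "eu", "ana")}
us = {"title": (10, "us", "Final")}  # concurrent write, same timestamp

assert merge(eu, us) == merge(us, eu)  # both regions converge
print(merge(eu, us)["title"])  # (10, 'us', 'Final'): region id breaks the tie
```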


Data model choices (most teams underestimate this)

Text only editor

Rich document (text + comments + blocks + embeds)

Whiteboard/graph app


Performance guardrails

  1. Snapshot + incremental log

    • Periodically persist snapshots.
    • Replay only tail ops at load.
  2. Compaction policy

    • Define compaction trigger by op count or byte size.
    • Keep compaction deterministic and versioned.
  3. Tombstone/metadata control (CRDT)

    • Plan garbage collection from day one.
    • Never assume tombstones stay “small enough.”
  4. Transform test corpus (OT)

    • Maintain randomized concurrent-edit fuzz suite.
    • Regression-test classic edge cases: insert/insert same index, delete-overlap, replace chains.
  5. Bounded payloads

    • Limit max ops per message and max catch-up batch.
    • Throttle reconnect storms with server tokens.
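Guardrails 1 and 2 can be sketched together: keep the last snapshot plus the ops that arrived after it, replay only that tail on load, and fold the tail into a new snapshot once it reaches an op-count threshold. `DocStore`, `max_tail_ops`, and the in-memory storage are hypothetical; a real store would persist both pieces atomically and also record the snapshot format version, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, List, TypeVar

S = TypeVar("S")    # document state type
Op = TypeVar("Op")  # operation type


@dataclass
class DocStore(Generic[S, Op]):
    apply: Callable[[S, Op], S]  # must be deterministic so every replay agrees
    snapshot: S                  # last compacted state
    tail: List[Op] = field(default_factory=list)  # ops received after the snapshot
    max_tail_ops: int = 1000     # hypothetical compaction trigger (op count)

    def append(self, op: Op) -> None:
        self.tail.append(op)
        if len(self.tail) >= self.max_tail_ops:
            self.compact()

    def compact(self) -> None:
        # Fold the tail into a fresh snapshot; a real store would persist this atomically.
        self.snapshot = self.load()
        self.tail = []

    def load(self) -> S:
        # Start from the snapshot and replay only the tail.
        state = self.snapshot
        for op in self.tail:
            state = self.apply(state, op)
        return state


store = DocStore(apply=lambda s, op: s + op, snapshot="", max_tail_ops=3)
for ch in "abcde":
    store.append(ch)
print(store.snapshot, store.tail, store.load())  # abc ['d', 'e'] abcde
```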

Consistency model UX checklist

Users forgive tiny delays. They do not forgive data loss.


Testing strategy (minimum viable seriousness)

Deterministic simulation

Jepsen-lite chaos for collaboration backend

Property tests
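A deterministic simulation with a property check could start like the sketch below. It uses a two-phase set (2P-Set), whose removed set is a tombstone store, as the replicated type; the property is that every replica converges no matter how ops are reordered or duplicated, and the seed makes any failing interleaving replayable. The type choice, `simulate`, and its parameters are illustrative assumptions.

```python
import random
from typing import List, Set, Tuple

Op = Tuple[str, str]  # ("add" or "remove", element)


class TwoPhaseSet:
    """2P-Set: an element is visible if it was added and never removed.

    The removed set acts as a tombstone store, and a removed element cannot be re-added.
    """

    def __init__(self) -> None:
        self.added: Set[str] = set()
        self.removed: Set[str] = set()

    def apply(self, op: Op) -> None:
        kind, elem = op
        (self.added if kind == "add" else self.removed).add(elem)

    def value(self) -> Set[str]:
        return self.added - self.removed


def simulate(ops: List[Op], replicas: int, seed: int) -> List[Set[str]]:
    """Deliver `ops` to each replica in a seeded random order, with duplicates."""
    rng = random.Random(seed)  # fixed seed: any failing interleaving can be replayed
    states = []
    for _ in range(replicas):
        delivery = ops + rng.sample(ops, k=len(ops) // 2)  # resend some ops
        rng.shuffle(delivery)                               # arbitrary reordering
        replica = TwoPhaseSet()
        for op in delivery:
            replica.apply(op)
        states.append(replica.value())
    return states


ops = [("add", "a"), ("add", "b"), ("remove", "a"), ("add", "c")]
for seed in range(100):
    # Property: every replica converges regardless of order or duplication.
    assert all(state == {"b", "c"} for state in simulate(ops, replicas=3, seed=seed))
print("converged for 100 seeds")
```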


Observability that prevents 3 a.m. incidents

Track these as first-class metrics:

Alert on trends, not just static thresholds.
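As one hedged illustration of trend alerting: fit a least-squares slope over recent, evenly spaced samples and fire when either the latest value breaches a static limit or the slope is too steep. The metric (for example, a per-document tombstone count sampled hourly), the function names, and the thresholds are all assumptions for illustration.

```python
from typing import Sequence


def slope(samples: Sequence[float]) -> float:
    """Least-squares slope per sampling interval; needs >= 2 evenly spaced samples."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def should_alert(samples: Sequence[float], static_limit: float, max_slope: float) -> bool:
    # Fire on either a static breach of the latest sample or a too-steep trend.
    return samples[-1] > static_limit or slope(samples) > max_slope


tombstone_count_hourly = [100, 120, 140, 160]  # hypothetical metric samples
print(slope(tombstone_count_hourly))  # 20.0
print(should_alert(tombstone_count_hourly, static_limit=1000, max_slope=10))  # True
```

Here the static limit alone would stay silent; the slope catches the growth early.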


Migration patterns

OT → CRDT

CRDT → OT (rare but possible)


Common failure modes

  1. Treating presence events as durable document events.
  2. Shipping without compaction/GC strategy.
  3. Assuming “eventual consistency” means “no UX design needed.”
  4. Ignoring versioned schemas for operation payloads.
  5. No anti-entropy protocol for missed ops.

Practical stack suggestions

Regardless of stack:


30-minute decision rubric

Score each axis from 1 to 5:

Interpretation:

Pick one, then invest in testing + observability before adding fancy features.

That discipline matters more than framework choice.