RocksDB Compaction Style & Write-Stall Control Playbook
Date: 2026-03-30
Category: knowledge
Audience: Operators running write-heavy or mixed read/write RocksDB workloads in production.
Why this matters
RocksDB performance incidents are rarely “random.” They usually come from one of three amplification failures:
- Write amplification spikes → compaction can’t keep up.
- Read amplification rises → too many files/runs touched per read.
- Space amplification grows → tombstones/overlap/backlog consume disk.
When flush and compaction fall behind ingest, stalls begin: first a slowdown, then a hard stop.
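The three amplification factors can be sanity-checked with back-of-envelope math. This is an illustrative sketch with hypothetical counter names; in production you would derive the inputs from rocksdb.stats or the statistics object, not from these parameters.

```python
def amplification(user_bytes_written, disk_bytes_written,
                  user_bytes_read, disk_bytes_read,
                  logical_data_size, disk_space_used):
    """Back-of-envelope amplification factors for an LSM store.

    All inputs are cumulative byte counts over the same window;
    the names are illustrative, not RocksDB ticker names.
    """
    write_amp = disk_bytes_written / user_bytes_written   # flush+compaction I/O per user byte
    read_amp = disk_bytes_read / user_bytes_read          # disk bytes touched per user byte read
    space_amp = disk_space_used / logical_data_size       # on-disk footprint vs live data
    return write_amp, read_amp, space_amp

# Example: 10 GB of user writes drove 120 GB of flush+compaction I/O.
wa, ra, sa = amplification(10e9, 120e9, 5e9, 40e9, 50e9, 90e9)
print(wa, ra, sa)  # 12.0 8.0 1.8
```

Tracking these three ratios over time is usually enough to tell which failure mode below you are in.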
1) Pick compaction style by failure mode (not by habit)
A. Leveled (the RocksDB default; internally L0 behaves tiered, deeper levels are leveled)
Use when:
- You need stable read latency and predictable point lookups.
- Your workload is read-sensitive and can tolerate somewhat higher write amp.
Tradeoff:
- Lower read amp / lower space amp, but usually higher write amp.
Operational note:
- Can still be very good for skewed updates and key-order inserts.
B. Universal (tiered family)
Use when:
- You are write-throughput bound and leveled compaction can’t keep up.
- You can tolerate more read amp + more space amp volatility.
Tradeoff:
- Lower write amp, but read/space amplification and performance variance usually increase.
Operational note:
- Plan extra free disk headroom: full compactions can need near-double temporary space.
C. FIFO
Use when:
- Data is TTL-like and old data can be dropped by age/size.
- You optimize for bounded storage and simple retention over query flexibility.
Tradeoff:
- Not a general-purpose replacement for leveled/universal for arbitrary queries.
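The style choice is a per-column-family setting, which RocksDB can load from an OPTIONS file. The fragment below is a sketch only: the section format and kCompactionStyle* values are real, but the CF names are hypothetical and every choice must be validated against your workload and RocksDB version.

```ini
# Hypothetical column families, one per failure mode above.
[CFOptions "read_serving"]
  compaction_style=kCompactionStyleLevel

[CFOptions "hot_writes"]
  compaction_style=kCompactionStyleUniversal
  # Remember the headroom note: full compactions can transiently
  # need close to double this CF's data size on disk.

[CFOptions "ttl_events"]
  compaction_style=kCompactionStyleFIFO
```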
2) Write stall triggers you must monitor
Stall triggers are evaluated per column family, but a triggered stall throttles writes for the whole DB.
Primary triggers:
- Too many immutable memtables (max_write_buffer_number pressure)
- Too many L0 files (level0_slowdown_writes_trigger, level0_stop_writes_trigger)
- Too many pending compaction bytes (soft_pending_compaction_bytes_limit, hard_pending_compaction_bytes_limit)
When triggered, RocksDB delays writes (delayed_write_rate), then can hard-stop writers.
If some writes must never block, use WriteOptions.no_slowdown=true and handle Status::Incomplete() in application logic.
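The trigger logic above can be approximated as a monitoring-side classifier. The threshold parameter names mirror the RocksDB options, but the function itself is an illustrative sketch for dashboards/alerting, not RocksDB's internal stall code.

```python
def stall_state(imm_memtables, max_write_buffer_number,
                l0_files, l0_slowdown, l0_stop,
                pending_bytes, soft_limit, hard_limit):
    """Classify write-stall pressure for one column family.

    Covers the three trigger families: immutable memtable count,
    L0 file count, and pending compaction bytes. Illustrative only;
    the real decision lives inside RocksDB's write controller.
    """
    if (imm_memtables >= max_write_buffer_number
            or l0_files >= l0_stop
            or (hard_limit and pending_bytes >= hard_limit)):
        return "stop"      # writers block (or get Incomplete with no_slowdown)
    if (l0_files >= l0_slowdown
            or (soft_limit and pending_bytes >= soft_limit)):
        return "slowdown"  # writes delayed at delayed_write_rate
    return "ok"

# 25 L0 files with a slowdown trigger of 20 -> delayed writes.
print(stall_state(1, 4, 25, 20, 36, 0, 64 << 30, 256 << 30))  # slowdown
```

Wiring this to exported RocksDB properties gives early warning before the hard-stop threshold is reached.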
3) Tuning ladder (safe order)
Measure first
- rocksdb.stats, compaction stats, perf context, host disk/CPU metrics.
- Confirm whether the bottleneck is write BW, read IOPS, CPU, or cache pressure.
Increase compaction/flush concurrency
- Size max_background_jobs from actual host capacity.
- Keep flush and compaction pools balanced; avoid compaction starvation.
Reduce avoidable write amp
- Revisit write_buffer_size, min_write_buffer_number_to_merge, and level sizing.
- Only then consider switching compaction style.
Control metadata-cache thrash (large DB:RAM ratios)
- Enable partitioned indexes/filters (kTwoLevelIndexSearch, partition_filters).
- Use cache policies that prioritize/pin top-level metadata where supported.
Only then move stall thresholds
- Raising stop/slowdown limits without fixing compaction debt just delays failure.
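"Size max_background_jobs from actual host capacity" can be made concrete with a starting-point heuristic. This is a hypothetical formula, not an official sizing rule: the per-job compaction throughput is a placeholder you must measure on your own hardware, and the result is only a first guess to replay-test.

```python
import os

def suggest_background_jobs(cpu_cores=None, disk_mb_per_s=500,
                            compaction_mb_per_s_per_job=100):
    """Hypothetical starting point for max_background_jobs.

    Caps parallelism by both CPU (leave cores for foreground reads)
    and by sequential disk bandwidth divided by measured per-job
    compaction throughput. All defaults are placeholders.
    """
    cpu_cores = cpu_cores or os.cpu_count() or 4
    by_cpu = max(2, cpu_cores // 2)
    by_disk = max(2, disk_mb_per_s // compaction_mb_per_s_per_job)
    return min(by_cpu, by_disk)

# 16 cores, ~800 MB/s disk, ~100 MB/s observed per compaction job.
print(suggest_background_jobs(cpu_cores=16, disk_mb_per_s=800))  # 8
```

Whatever value this yields, verify with compaction stats that flushes are never starved before raising it further.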
4) Practical baseline profiles
Read-latency-first profile
- Leveled compaction
- Conservative L0 triggers
- Strong block cache discipline
- Partitioned index/filter for large DB:RAM
Write-throughput-first profile
- Universal compaction candidate
- More compaction parallelism
- Explicit free-space guardrails for compaction bursts
- SLO accepts wider read-latency variance
Bulk-ingest profile (temporary)
- Relax stall sensitivity only during controlled ingest windows
- Separate ingest CF/DB when possible
- Post-ingest compaction + validation checkpoint before normal traffic
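The two steady-state profiles can be sketched as OPTIONS-file fragments. Section syntax and option names follow RocksDB conventions, but every number is a placeholder to replay-test, and the CF names are hypothetical.

```ini
# Read-latency-first: leveled, conservative L0 triggers, partitioned metadata.
[CFOptions "read_first"]
  compaction_style=kCompactionStyleLevel
  level0_slowdown_writes_trigger=20
  level0_stop_writes_trigger=36
[TableOptions/BlockBasedTable "read_first"]
  index_type=kTwoLevelIndexSearch
  partition_filters=true
  cache_index_and_filter_blocks=true

# Write-throughput-first: universal candidate, more background parallelism.
[DBOptions]
  max_background_jobs=8
[CFOptions "write_first"]
  compaction_style=kCompactionStyleUniversal
```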
5) Incident runbook (fast triage)
- Symptom: write p99/p999 explodes or writers block.
- Check LOG for: memtable stalls, L0 file stalls, pending compaction byte stalls.
- Check compaction backlog + disk write BW saturation.
- If L0 runaway: prioritize L0→L1 catch-up, reduce incoming write burst.
- If pending bytes runaway: increase compaction capacity or reduce write amp path.
- If cache-thrash driven read collapse: verify index/filter memory policy + partitions.
- Capture before/after stats snapshot; never “tune blind” during incident.
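Checking LOG for stall causes can be scripted. The regexes below approximate RocksDB's "Stalling/Stopping writes" messages; exact wording varies across versions, so verify the patterns against your build's LOG before putting this on an alerting path.

```python
import re

# Approximations of RocksDB LOG stall messages; verify per version.
STALL_PATTERNS = {
    "l0_files": re.compile(r"(Stalling|Stopping) writes because we have \d+ level-0 files"),
    "memtables": re.compile(r"(Stalling|Stopping) writes because we have \d+ immutable memtables"),
    "pending_bytes": re.compile(r"(Stalling|Stopping) writes because of estimated pending compaction bytes"),
}

def classify_stall_lines(log_lines):
    """Count stall events in LOG lines by trigger family."""
    counts = {k: 0 for k in STALL_PATTERNS}
    for line in log_lines:
        for reason, pat in STALL_PATTERNS.items():
            if pat.search(line):
                counts[reason] += 1
    return counts

sample = [
    "2026/03/30-12:00:01 ... Stalling writes because we have 21 level-0 files",
    "2026/03/30-12:00:05 ... Stopping writes because we have 36 level-0 files",
]
print(classify_stall_lines(sample))  # {'l0_files': 2, 'memtables': 0, 'pending_bytes': 0}
```

The dominant counter points you to the matching branch of the triage list above (L0 catch-up, compaction capacity, or flush/memtable pressure).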
6) Common mistakes
- “Fixing” stalls only by increasing trigger thresholds.
- Compaction style flips without workload replay validation.
- Ignoring DB:RAM ratio and metadata-cache behavior.
- Treating one benchmark result as universal truth.
- Forgetting per-CF triggers can stall the whole DB.
7) Operator checklist
- Compaction style chosen by workload/SLO tradeoff.
- Stall counters and log patterns on alerting dashboard.
- Background job sizing validated against host BW/CPU.
- Partitioned index/filter evaluated for large datasets.
- Runbook tested with synthetic write bursts.
- Version-specific option names/behavior verified before rollout.
References
- RocksDB Compaction Wiki: https://github.com/facebook/rocksdb/wiki/Compaction
- RocksDB Tuning Guide: https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
- RocksDB Universal Compaction: https://github.com/facebook/rocksdb/wiki/universal-compaction
- RocksDB Write Stalls: https://github.com/facebook/rocksdb/wiki/Write-Stalls
- RocksDB Partitioned Index/Filters: https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters