PostgreSQL MVCC + Autovacuum Bloat Control Playbook
Why this matters
PostgreSQL gets its concurrency from MVCC: updates and deletes leave dead tuples behind, and readers proceed without blocking writers (and vice versa).
That convenience becomes a hidden tax if vacuum discipline is weak:
- table/index bloat grows quietly,
- cache hit quality degrades,
- WAL/checkpoint pressure rises,
- VACUUM debt eventually arrives as latency cliffs.
This playbook treats bloat control as an operations loop, not occasional cleanup.
Core model (what to optimize)
MVCC health is a 3-way balance:
- Dead tuple creation rate (writes/updates/deletes)
- Cleanup throughput (autovacuum + vacuum cost budget)
- Tuple visibility constraints (long transactions, replicas, old snapshots)
If creation > cleanup for long enough, you enter debt mode.
Vacuum trigger math (per table)
Autovacuum vacuum trigger is roughly:
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
The analyze trigger follows the same formula, using autovacuum_analyze_threshold and autovacuum_analyze_scale_factor.
Global defaults are often too loose for very large or very hot tables.
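As a sketch, the effective per-table vacuum trigger can be computed from the global settings (per-table reloptions override these, and reltuples is only an estimate; on PostgreSQL 14+ it can be -1 for never-analyzed tables):

```sql
-- Approximate dead-tuple count at which autovacuum will trigger,
-- based on global settings (per-table reloptions take precedence).
SELECT
  c.relname,
  c.reltuples::bigint AS est_live_tuples,
  current_setting('autovacuum_vacuum_threshold')::int
    + current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples
    AS vacuum_trigger_dead_tuples
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY vacuum_trigger_dead_tuples DESC
LIMIT 20;
```

With the default 0.2 scale factor, a 100M-row table waits for roughly 20 million dead tuples before cleanup starts.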
Failure modes to watch
Scale-factor trap on huge tables
A high scale factor means millions of dead tuples before cleanup starts.
Long-running transactions pin the visibility horizon
Vacuum runs, but cannot reclaim enough dead tuples.
Index bloat dominates table bloat
Heap looks okay, but indexes become fat and random-I/O heavy.
Freeze debt accumulation
Waiting too long near XID age limits forces aggressive work at bad times.
Autovacuum worker starvation
Too few workers and too little I/O budget for the write volume.
Observability: minimum dashboard
Track these as first-class signals:
- n_dead_tup, n_live_tup by table (pg_stat_all_tables)
- last_autovacuum, last_autoanalyze
- transaction age and oldest active transaction
- autovacuum throughput and duration
- pg_stat_progress_vacuum for live activity
- table/index size growth slope (not only absolute size)
- query p95/p99 for hot relations
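A minimal sketch of the live-activity signal, joining pg_stat_progress_vacuum to relation names so in-flight vacuums are visible at a glance:

```sql
-- Vacuums currently running, with progress through the heap.
SELECT
  p.pid,
  c.relname,
  p.phase,
  p.heap_blks_scanned,
  p.heap_blks_total
FROM pg_stat_progress_vacuum p
JOIN pg_class c ON c.oid = p.relid;
```

An empty result during sustained write load is itself a signal: either nothing has crossed its trigger yet, or workers are starved.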
Practical derived metrics
- Dead Tuple Ratio (DTR) = n_dead_tup / (n_live_tup + 1)
- Vacuum Lag Age (VLA) = now - last_autovacuum
- Bloat Velocity (BV) = d(size)/dt under stable data cardinality
You do not need perfect bloat estimation to detect danger early; trend and slope are enough.
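The first two derived metrics can be computed directly from pg_stat_user_tables; a sketch:

```sql
-- Dead Tuple Ratio and Vacuum Lag Age per table (statistics are approximate).
SELECT
  relname,
  round(n_dead_tup::numeric / (n_live_tup + 1), 3) AS dead_tuple_ratio,
  now() - last_autovacuum AS vacuum_lag_age
FROM pg_stat_user_tables
ORDER BY dead_tuple_ratio DESC
LIMIT 20;
```

Bloat Velocity needs two size samples over time (e.g. periodic pg_relation_size snapshots), so it is best computed in whatever system stores your metrics.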
State machine for operations
Use explicit states so actions are predictable:
GREEN
- DTR low/stable
- vacuum cadence steady
- latency stable
Action:
- Keep baseline settings.
- Weekly review only.
AMBER
- DTR rising for hot tables
- VLA widening
- index/table size slope accelerating
Action:
- Lower per-table vacuum scale factor on offenders.
- Increase autovacuum workers/cost budget.
- Check long transactions.
RED
- sustained DTR elevation + latency degradation
- autovacuum cannot catch up
- freeze age trending dangerously high
Action:
- Targeted manual VACUUM (ANALYZE) windows.
- Throttle noisy writers if possible.
- Kill or fix runaway long transactions.
- Prioritize hottest tables first.
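A sketch of the RED-state actions in SQL; "orders" and the pid 12345 are placeholders for your hottest offender and the runaway backend you identified:

```sql
-- Targeted cleanup on the hottest table first.
VACUUM (ANALYZE, VERBOSE) orders;

-- Deal with a runaway transaction by pid:
SELECT pg_cancel_backend(12345);      -- cancel the current query first
SELECT pg_terminate_backend(12345);   -- terminate the session if cancel is not enough
```

Cancel before terminate: terminating drops the whole session and may surprise connection pools.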
SAFE
- emergency containment mode
Action:
- Protect availability first.
- Defer non-critical write workloads.
- Execute controlled remediation (including REINDEX CONCURRENTLY when index bloat dominates).
Tuning playbook (high leverage)
1) Per-table overrides before global aggression
For high-churn large tables, set tighter table-local knobs:
- lower autovacuum_vacuum_scale_factor
- lower autovacuum_analyze_scale_factor
- optionally raise the per-table autovacuum cost limit
This avoids over-vacuuming cold tables while protecting hot ones.
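A sketch of a table-local override; "orders" is a placeholder and the values are starting points for a hot, large table, not prescriptions:

```sql
-- Tighter autovacuum knobs for one high-churn table only.
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor  = 0.01,   -- trigger at ~1% dead tuples
  autovacuum_analyze_scale_factor = 0.005,  -- keep planner stats fresh
  autovacuum_vacuum_cost_limit    = 2000    -- larger I/O budget for this table
);
```

Current overrides are visible in pg_class.reloptions, so they can be audited alongside the global settings.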
2) Keep long transactions on a leash
Operational policy:
- alert on transaction age above threshold,
- enforce timeout policies for app sessions,
- treat orphaned idle-in-transaction sessions as incidents.
MVCC hygiene is impossible if horizons remain pinned.
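Both policies can be sketched in SQL; the 15-minute and 10-minute thresholds are illustrative, not recommendations:

```sql
-- Transactions open longer than the alert threshold.
SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '15 minutes'
ORDER BY xact_start;

-- Evict sessions that sit idle inside an open transaction.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();
```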
3) Budget enough parallel cleanup capacity
Review:
- autovacuum_max_workers
- autovacuum_naptime
- vacuum cost settings (vacuum_cost_limit, vacuum_cost_delay)
If write traffic rose but vacuum budget stayed static, debt is expected.
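A sketch of raising the cleanup budget cluster-wide; the values are illustrative and should track your measured write volume:

```sql
-- More parallel cleanup capacity (requires a server restart to take effect).
ALTER SYSTEM SET autovacuum_max_workers = 6;

-- Larger I/O budget per vacuum cycle (picked up on reload).
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
SELECT pg_reload_conf();
```

Note the asymmetry: worker count changes need a restart, while the cost settings apply on reload.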
4) Separate index remediation from heap remediation
If query latency is index-random-I/O dominated:
- use REINDEX CONCURRENTLY in controlled batches,
- keep fallback plans (cancel windows, rollback sequencing),
- verify post-remediation with both size and p95 latency deltas.
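One batch step might look like the following sketch (PostgreSQL 12+; the index name is a placeholder):

```sql
-- Rebuild one bloated index without taking a long write lock.
REINDEX INDEX CONCURRENTLY orders_created_at_idx;

-- Verify the size delta afterward (pair this with the p95 latency check).
SELECT pg_size_pretty(pg_relation_size('orders_created_at_idx'));
```

If a concurrent reindex is cancelled, it can leave an invalid index behind (suffixed _ccnew); part of the fallback plan is detecting and dropping those.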
5) Freeze-risk governance
Don’t wait for anti-wraparound panic mode.
- monitor relation age continuously,
- schedule freeze-friendly maintenance windows,
- treat rising freeze debt as capacity risk, not housekeeping.
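Relation freeze age can be monitored continuously with a query along these lines; the ~2 billion figure is the XID wraparound ceiling, and autovacuum turns aggressive well before it (at autovacuum_freeze_max_age, 200 million by default):

```sql
-- Oldest unfrozen XIDs per table, as distance toward wraparound.
SELECT
  c.relname,
  age(c.relfrozenxid) AS xid_age,
  round(age(c.relfrozenxid) / 2000000000.0 * 100, 1) AS pct_toward_wraparound
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY xid_age DESC
LIMIT 20;
```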
Incident runbook (when latency is already burning)
Triage top offenders
- rank by hot query impact × relation size growth × DTR.
Remove blockers
- identify long transactions / pinned snapshots.
Stabilize write pressure
- rate-limit nonessential writers where possible.
Apply targeted cleanup
- manual vacuum/analyze on top offenders first.
Handle index bloat specifically
- schedule REINDEX CONCURRENTLY for the worst index offenders.
Post-incident policy update
- move successful hotfix knobs into persistent per-table settings.
30-day rollout template
Week 1: Baseline
- classify top 20 write-heavy tables,
- capture DTR/VLA/BV baseline,
- map long-transaction sources by service.
Week 2: Targeted tuning
- apply per-table autovacuum overrides to top offenders,
- raise cleanup capacity only as needed,
- add transaction-age alerts.
Week 3: Chaos-proofing
- run stress rehearsal with elevated write rate,
- verify vacuum catch-up behavior,
- test incident runbook timing.
Week 4: Governance
- lock SLOs for MVCC health,
- codify thresholds/states/actions,
- schedule periodic bloat and freeze debt reviews.
Anti-patterns
- one-size-fits-all global scale factor
- waiting for visible disk pain before acting
- treating long transactions as app-team-only issue
- running heavy blocking maintenance without rollback plan
- celebrating one-off cleanup without feedback-loop tuning
Bottom line
Autovacuum is not a background detail. It is a production control loop.
When tuple-debt velocity, cleanup capacity, and visibility blockers are managed together, PostgreSQL stays boring. When they are not, bloat converts quietly into p95/p99 latency tax and surprise maintenance risk.