PostgreSQL 17 Logical Replication Failover Slots HA Playbook
Date: 2026-03-23
Category: knowledge
Scope: How to make PostgreSQL logical subscribers survive primary failover without re-snapshotting.
1) Why this matters
Before PostgreSQL 17 failover-slot workflows, a primary promotion often meant painful subscriber surgery:
- re-create logical slots,
- risk data gaps or duplicates,
- sometimes re-bootstrap large tables.
PostgreSQL 17 formalizes logical replication failover so subscriber continuity can survive primary failover if you wire the slot-sync pipeline correctly.
2) Core mental model
A subscription can continue after failover only if:
- Its logical slot is marked failover = true on the publisher side.
- Slot state was synchronized to the standby in time.
- The standby has a usable synced slot at promotion time.
- Subscriber conninfo is switched to the new primary.
Think of it as two lanes:
- Data lane: WAL shipping (physical replication)
- Control lane: logical slot state sync
Both must be healthy.
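As a rough illustration of the two lanes, the following sketch checks each one from the standby. It assumes a PostgreSQL 17 standby that is streaming from the primary; nothing here is specific to a particular cluster.

```sql
-- Run on the standby.

-- Data lane: the WAL receiver should report status = 'streaming'
-- and a flushed_lsn that keeps advancing.
SELECT status, flushed_lsn, latest_end_time, slot_name
FROM pg_stat_wal_receiver;

-- Control lane: failover-enabled logical slots copied from the primary.
-- A slot that exists on the primary but is absent here has not been synced yet.
SELECT slot_name, synced, temporary, confirmed_flush_lsn
FROM pg_replication_slots
WHERE failover;
```

An empty result from the first query means the data lane is down. An empty result from the second means the control lane has produced nothing to promote with.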
3) Required wiring (minimum viable HA)
3.1 Publisher / primary
- Create the subscription with failover = true (the command runs on the subscriber, but the slot it creates lives on the primary), or create the logical slot directly with failover enabled.
- Configure synchronized_standby_slots to include the physical slot(s) of the candidate failover standby(s) so logical decoding does not outrun standby durability.
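A minimal sketch of the primary-side wiring, assuming a single candidate standby whose physical slot is named standby1_slot (a hypothetical name) and a superuser session on the primary:

```sql
-- Run on the primary.

-- Physical slot the candidate standby will stream from.
-- Skip this call if the slot already exists; the name is a placeholder.
SELECT pg_create_physical_replication_slot('standby1_slot');

-- Make logical walsenders for failover-enabled slots wait until this
-- standby has confirmed receipt of the WAL before sending changes.
ALTER SYSTEM SET synchronized_standby_slots = 'standby1_slot';
SELECT pg_reload_conf();

-- Confirm the setting is active.
SHOW synchronized_standby_slots;
```

Note the trade-off: if the listed standby goes offline, logical decoding on failover-enabled slots waits for it, so the list should contain only standbys that are real promotion candidates.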
3.2 Physical standby (future primary)
On standby, configure:
- sync_replication_slots = true
- hot_standby_feedback = on
- primary_slot_name (physical slot to upstream primary)
- primary_conninfo with a valid dbname (required for slot sync worker path)
Without all four settings in place, failover slots won't synchronize reliably.
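One way to apply the standby settings is through ALTER SYSTEM, sketched below. The slot name, host, database, and user are placeholders, and the sketch assumes a superuser session on the standby. All four parameters are expected to take effect on reload in PostgreSQL 17; the pending_restart column in the final query is there to catch the case where that assumption does not hold.

```sql
-- Run on the standby. ALTER SYSTEM writes postgresql.auto.conf and
-- cannot be executed inside a transaction block.
ALTER SYSTEM SET sync_replication_slots = on;
ALTER SYSTEM SET hot_standby_feedback = on;

-- Hypothetical physical slot that must already exist on the primary.
ALTER SYSTEM SET primary_slot_name = 'standby1_slot';

-- Placeholder connection values; dbname is what the slot sync worker uses.
ALTER SYSTEM SET primary_conninfo = 'host=primary-db dbname=app user=repl';

SELECT pg_reload_conf();

-- Verify what the server is actually running with.
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name IN ('sync_replication_slots',
               'hot_standby_feedback',
               'primary_slot_name',
               'primary_conninfo')
ORDER BY name;
```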
4) Subscription creation patterns
4.1 Preferred: explicit failover-enabled subscription
CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=primary-db dbname=app user=repl password=***'
    PUBLICATION pub_orders
    WITH (
        create_slot = true,
        slot_name = 'sub_orders',
        copy_data = false,
        failover = true
    );
4.2 Deferred/manual slot mode (advanced)
If create_slot = false, ensure slot-side failover property matches subscription-side failover semantics. Mismatches create confusing behavior (subscription says failover-enabled, slot doesn’t — or vice versa).
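A sketch of the manual path, reusing the sub_orders and pub_orders names from the preferred pattern. It assumes the pgoutput plugin (the one built-in logical replication uses) and that the slot does not exist yet.

```sql
-- On the publisher: create the slot by hand with the failover property set.
SELECT pg_create_logical_replication_slot('sub_orders', 'pgoutput', failover => true);

-- On the subscriber: attach to the existing slot instead of creating one.
CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=primary-db dbname=app user=repl password=***'
    PUBLICATION pub_orders
    WITH (create_slot = false, slot_name = 'sub_orders',
          copy_data = false, failover = true);

-- On the publisher: slot-side view of the property.
SELECT slot_name, failover
FROM pg_replication_slots
WHERE slot_name = 'sub_orders';

-- On the subscriber: subscription-side view of the property.
SELECT subname, subfailover
FROM pg_subscription
WHERE subname = 'sub_orders';
```

The last two queries should agree. If they do not, fix the mismatch before relying on the subscription in a failover plan.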
5) Pre-failover readiness checklist (must-pass)
5.1 On subscriber: list main slots tied to failover-enabled subscriptions
SELECT array_agg(quote_literal(s.subslotname)) AS slots
FROM pg_subscription s
WHERE s.subfailover
  AND s.subslotname IS NOT NULL;
5.2 On subscriber: list relevant table-sync slots (finished copy only)
SELECT array_agg(quote_literal(slot_name)) AS slots
FROM (
    SELECT CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slot_name
    FROM pg_control_system() ctl,
         pg_subscription_rel r,
         pg_subscription s
    WHERE r.srsubstate = 'f'
      AND s.oid = r.srsubid
      AND s.subfailover
) t;
5.3 On target standby: confirm slots are failover-ready
SELECT slot_name,
       (synced AND NOT temporary AND invalidation_reason IS NULL) AS failover_ready
FROM pg_replication_slots
WHERE slot_name IN ('sub1','sub2','sub3');
Only promote when all critical slots show failover_ready = true.
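An IN-list check returns no row at all for a slot that never reached the standby, so a missing slot can go unnoticed. The sketch below is one possible gate query that reports missing slots too. The slot names are placeholders standing in for the arrays produced by the subscriber-side queries.

```sql
-- Run on the target standby. An empty result means the gate is green.
WITH expected(slot_name) AS (
    VALUES ('sub1'), ('sub2'), ('sub3')   -- placeholder slot names
)
SELECT e.slot_name,
       CASE
           WHEN s.slot_name IS NULL THEN 'missing on standby'
           WHEN NOT s.synced THEN 'not synced'
           WHEN s.temporary THEN 'still temporary'
           ELSE 'invalidated: ' || s.invalidation_reason
       END AS problem
FROM expected e
LEFT JOIN pg_replication_slots s ON s.slot_name = e.slot_name
WHERE s.slot_name IS NULL
   OR NOT s.synced
   OR s.temporary
   OR s.invalidation_reason IS NOT NULL;
```

Returning only the failing slots keeps the promotion gate simple: any row blocks the promotion and names the reason.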
6) Failover runbook (planned event)
- Freeze subscriber apply by disabling subscriptions on the subscribers (recommended before promotion):
ALTER SUBSCRIPTION ... DISABLE;
- Promote standby to primary.
- Update subscriber connection strings:
ALTER SUBSCRIPTION ... CONNECTION 'host=new-primary ...';
- Re-enable subscriptions:
ALTER SUBSCRIPTION ... ENABLE;
- Validate that confirmed_flush_lsn shows no gap or regression, and run app-level monotonic checks.
Why disable first? If the old primary is still reachable, subscribers may keep consuming from it until conninfo flips, risking divergence.
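The runbook steps can be sketched end to end as follows, assuming a single subscription named sub_orders and placeholder connection values. Each statement notes which server it runs on. pg_promote() is used here for the promotion step; a cluster manager or pg_ctl promote would serve the same purpose.

```sql
-- 1. On each subscriber: stop apply before touching the publisher side.
ALTER SUBSCRIPTION sub_orders DISABLE;

-- 2. On the standby being promoted.
SELECT pg_promote();

-- 3. On each subscriber: point at the new primary (placeholder values).
ALTER SUBSCRIPTION sub_orders
    CONNECTION 'host=new-primary dbname=app user=repl password=***';

-- 4. On each subscriber: resume apply.
ALTER SUBSCRIPTION sub_orders ENABLE;

-- 5a. On the new primary: the slot should be active and not invalidated,
--     and confirmed_flush_lsn should only move forward from here.
SELECT slot_name, active, confirmed_flush_lsn, invalidation_reason
FROM pg_replication_slots
WHERE slot_name = 'sub_orders';

-- 5b. On the subscriber: the apply worker should be receiving again.
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription
WHERE subname = 'sub_orders';
```

Recording confirmed_flush_lsn on the old primary just before step 1 gives a baseline to compare against in step 5a.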
7) Operational observability queries
7.1 Slot health on candidate primary/standby
SELECT slot_name,
       slot_type,
       failover,
       synced,
       active,
       wal_status,
       restart_lsn,
       confirmed_flush_lsn,
       invalidation_reason
FROM pg_replication_slots
ORDER BY slot_name;
7.2 Subscription posture on subscriber
SELECT subname,
       subenabled,
       subfailover,
       subslotname,
       subtwophasestate,
       subsynccommit
FROM pg_subscription
ORDER BY subname;
7.3 Table sync status that can influence slot expectations
SELECT s.subname,
       r.srrelid::regclass AS relation,
       r.srsubstate,
       r.srsublsn
FROM pg_subscription_rel r
JOIN pg_subscription s ON s.oid = r.srsubid
ORDER BY s.subname, relation;
8) Common failure modes
- failover=true forgotten at creation: subscriber appears healthy until the first failover drill.
- Standby missing sync_replication_slots / primary_slot_name / hot_standby_feedback: slot sync silently incomplete; promotion breaks logical continuity.
- No synchronized_standby_slots on primary: logical consumer can run ahead of standby durability; failover-ready window becomes fragile.
- Promoting with non-persistent synced slots: synced=true alone is insufficient if the slot is temporary or invalidated.
- Skipping pre-promotion slot readiness SQL: you discover missing slot state only after cutover.
9) Practical SLO guardrails
- Require green readiness query before promotion approvals.
- Alert on invalidation_reason IS NOT NULL for logical slots.
- Track slot lag budgets (pg_current_wal_lsn() vs slot replay/confirm positions).
- Treat failover drills as recurring game day, not one-time setup.
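A lag-budget query might look like the sketch below. It runs on the primary, because pg_current_wal_lsn() cannot be called during recovery. The 1 GiB budget is an arbitrary illustrative threshold, not a recommendation.

```sql
-- Run on the primary.
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS confirm_lag_bytes,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)         AS retained_bytes,
       wal_status,
       invalidation_reason,
       -- Hypothetical budget of 1 GiB; tune to the workload.
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) > 1024 * 1024 * 1024.0
           AS over_budget
FROM pg_replication_slots
WHERE slot_type = 'logical'
  AND failover
ORDER BY confirm_lag_bytes DESC NULLS LAST;
```

An alert rule can fire on any row where over_budget is true or invalidation_reason is not null.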
10) Bottom line
PostgreSQL 17 makes logical failover far more operationally sane, but only if you treat slot synchronization as a first-class HA dependency.
If you run logical replication in production, add failover-slot readiness checks to your promotion gate the same way you gate on replication lag and application health.
References
- PostgreSQL 17 docs, Logical Replication Failover: https://www.postgresql.org/docs/17/logical-replication-failover.html
- PostgreSQL 17 docs, Logical Decoding Concepts (slot synchronization): https://www.postgresql.org/docs/17/logicaldecoding-explanation.html
- PostgreSQL 17 docs, CREATE SUBSCRIPTION (failover option): https://www.postgresql.org/docs/17/sql-createsubscription.html
- PostgreSQL 17 docs, ALTER SUBSCRIPTION: https://www.postgresql.org/docs/17/sql-altersubscription.html
- PostgreSQL 17 docs, pg_replication_slots view: https://www.postgresql.org/docs/17/view-pg-replication-slots.html
- PostgreSQL parameter notes, sync_replication_slots: https://postgresqlco.nf/doc/en/param/sync_replication_slots/
- PostgreSQL parameter notes, synchronized_standby_slots: https://postgresqlco.nf/doc/en/param/synchronized_standby_slots/