Effectively-Once Delivery in Real Systems: Idempotency + Outbox/Inbox Playbook

2026-03-01 · software

Category: knowledge
Domain: software / distributed systems / reliability engineering

Why this matters

Distributed systems fail in the boring places: a client times out after the server succeeded, a process crashes between a database write and a broker publish, a broker redelivers a message that was already handled.

Most teams ask for “exactly once.” What they can usually ship is effectively once: duplicates may appear in transport, but business side effects happen once.

That is a huge practical win.


Reality check: delivery semantics by layer

  1. Network / queue layer
    • typically at-least-once (duplicates possible)
  2. Application command layer
    • can be idempotent with key + state machine
  3. Data write + event publish layer
    • can be made atomic with transactional outbox
  4. Consumer side-effect layer
    • can be deduped with inbox/processed-message ledger

Takeaway: stop asking one component for magic guarantees; compose guarantees across layers.


Core architecture (battle-tested)

A) API Idempotency Key (producer edge)

The client sends a unique key per business intent (e.g., payment creation). The server stores the first result and replays it for retried identical requests.

Contract:

  • same key + same payload hash → replay the stored response
  • same key + different payload hash → reject (the key is being reused for a new intent)
  • key still in processing → tell the caller to retry later; never run the action twice
  • keys expire on a TTL that exceeds the realistic retry horizon
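A minimal sketch of that contract, using SQLite for brevity and a caller-supplied `do_work` callback for the business action (both are illustrative; a Postgres version would use `insert ... on conflict do nothing`, and the HTTP-style status codes are assumptions, not part of the source):

```python
import hashlib
import json
import sqlite3


def canonical_hash(payload: dict) -> str:
    # Canonicalize (sorted keys, no insignificant whitespace) before hashing.
    return hashlib.sha256(
        json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()


def handle(conn, key: str, payload: dict, do_work):
    """Claim the key atomically; replay the stored result on retry."""
    h = canonical_hash(payload)
    # Atomic claim: the first writer inserts the row, retries are ignored.
    cur = conn.execute(
        "insert or ignore into api_idempotency (key, request_hash, status) "
        "values (?, ?, 'processing')", (key, h))
    claimed = cur.rowcount == 1
    row = conn.execute(
        "select request_hash, status, response_body from api_idempotency "
        "where key = ?", (key,)).fetchone()
    if row[0] != h:
        return 422, {"error": "key reused with a different payload"}
    if row[1] == "completed":
        return 200, json.loads(row[2])  # replay stored response
    if not claimed:
        # Another request holds this key and is still in flight.
        return 409, {"error": "request in progress, retry later"}
    result = do_work(payload)  # the real business action, runs at most once
    conn.execute(
        "update api_idempotency set status='completed', response_code=200, "
        "response_body=? where key=?", (json.dumps(result), key))
    conn.commit()
    return 200, result
```

The sketch omits the `failed` branch and key expiry; the point is that the claim (insert) and the replay check happen against the same row, so a retry either replays the stored response or is told to wait.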

B) Transactional Outbox (producer core)

Inside one DB transaction:

  1. mutate domain state
  2. insert event row into outbox

A relay process publishes outbox events to the broker and marks them sent. There is no dual-write race between the DB and the broker.
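A sketch of one relay pass over the `outbox_events` table from the schema below, again using SQLite and a caller-supplied `publish` callable standing in for the broker client (both are assumptions for illustration):

```python
import sqlite3
import time


def relay_once(conn, publish, batch=100):
    """Publish unpublished outbox rows in deterministic order, then mark sent.

    `publish` is assumed to be an at-least-once broker call: if the relay
    crashes between publish() and the update, the event is re-sent on the
    next pass and the consumer's inbox dedupes it. Events are never lost.
    """
    rows = conn.execute(
        "select id, event_type, payload from outbox_events "
        "where published_at is null order by created_at, id limit ?",
        (batch,)).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_type, payload)  # may duplicate on crash; never skips
        conn.execute(
            "update outbox_events set published_at = ?, "
            "publish_attempts = publish_attempts + 1 where id = ?",
            (time.time(), event_id))
        conn.commit()  # mark-sent only after the publish succeeded
    return len(rows)
```

A real deployment would run this in a loop (or tail the WAL via CDC) and add per-row error handling, but the ordering and mark-after-publish discipline are the load-bearing parts.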

C) Consumer Inbox / Processed Ledger (consumer edge)

The consumer records message_id (or a dedupe key) before, or atomically with, the side effect. On redelivery it detects the duplicate and no-ops.
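A minimal sketch of the consumer side, assuming the side effect writes to the same database as the `processed_messages` ledger so both commit in one transaction (SQLite here for brevity; the primary key on `(consumer_name, message_id)` does the deduplication):

```python
import sqlite3


def consume(conn, consumer_name: str, message_id: str, payload, side_effect):
    """Record the message and apply the side effect in one transaction.

    A redelivered message violates the (consumer_name, message_id) primary
    key, the transaction rolls back, and we no-op. Returns True if the
    side effect actually ran.
    """
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "insert into processed_messages (consumer_name, message_id) "
                "values (?, ?)", (consumer_name, message_id))
            side_effect(payload)  # must write via the same conn to stay atomic
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: already processed, no-op
```

If `side_effect` raises anything else, the ledger insert rolls back with it, so the broker can safely redeliver and the message is retried from scratch.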

Together, these provide effectively-once business outcomes under retries/crashes.


Minimal schemas

-- API idempotency keys
create table api_idempotency (
  key text primary key,
  request_hash text not null,
  status text not null check (status in ('processing','completed','failed')),
  response_code int,
  response_body jsonb,
  created_at timestamptz not null default now(),
  expires_at timestamptz not null
);

-- Transactional outbox
create table outbox_events (
  id uuid primary key,
  aggregate_type text not null,
  aggregate_id text not null,
  event_type text not null,
  payload jsonb not null,
  created_at timestamptz not null default now(),
  published_at timestamptz,
  publish_attempts int not null default 0
);

create index outbox_events_unpublished_idx
  on outbox_events (created_at)
  where published_at is null;

-- Consumer inbox / dedupe ledger
create table processed_messages (
  consumer_name text not null,
  message_id text not null,
  processed_at timestamptz not null default now(),
  primary key (consumer_name, message_id)
);

State machines that prevent footguns

Idempotency key lifecycle

(no row) → processing → completed | failed

Important: if you cannot atomically move from processing to a final state, retries will leak duplicates.

Outbox event lifecycle

pending (published_at is null) → published (published_at set), with publish_attempts counting retries

Consumer lifecycle

received → recorded in processed_messages → side effect committed → acked (a detected duplicate skips straight to acked)


Failure-mode matrix (what breaks, what saves you)

  1. Client timeout after server success

    • client retries with same idempotency key
    • server returns stored response (no duplicate business action)
  2. Service crashes after DB commit, before publish

    • outbox row exists
    • relay recovers and publishes later
  3. Relay publishes then crashes before mark-sent

    • event may be republished
    • consumer inbox dedupe absorbs duplicate
  4. Consumer crashes after side effect, before ack

    • broker redelivers
    • dedupe ledger prevents second side effect

This is why all three layers are needed.


Practical design choices

1) Idempotency key scope

Good scopes:

  • one key per business intent (e.g., one payment creation, one order submission)

Bad scopes:

  • a fresh key per HTTP attempt (retries no longer dedupe anything)
  • one key shared across different operations (collisions silently change intent)

2) Request fingerprinting

Store a hash of the canonicalized payload alongside the key. Reject a reused key whose payload hash differs.
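Canonicalization is the part people skip: hash the raw request body and two logically identical payloads with different field order or whitespace will fingerprint differently. One common approach (a sketch, not the only valid canonical form):

```python
import hashlib
import json


def fingerprint(payload: dict) -> str:
    """Hash a canonical form: sorted keys, no insignificant whitespace."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```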

3) TTL policy

Idempotency key TTL should exceed realistic retry horizon. Many APIs keep keys for ~24h or longer depending on business risk.

4) Dedupe retention

Consumer dedupe retention must cover max redelivery/replay window. If you purge too early, old messages can re-trigger side effects.

5) Ordering

The outbox preserves producer-side order only if the relay reads deterministically (e.g., ordered by created_at/id) and the partitioning strategy aligns with consumer ordering needs.
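The usual alignment trick, sketched as a hypothetical helper (not from the source): derive the broker partition key from the aggregate, so a partitioned broker such as Kafka preserves per-aggregate order end to end even though global order is not guaranteed.

```python
def partition_key(aggregate_type: str, aggregate_id: str) -> str:
    # Route every event of one aggregate to the same partition, so the
    # broker preserves that aggregate's event order for consumers.
    return f"{aggregate_type}:{aggregate_id}"
```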


“Exactly once” nuance (avoid semantic traps)

"Exactly once" delivery over an unreliable network is not achievable in general: a sender cannot distinguish "message lost" from "ack lost," so something must retry, and retries mean possible duplicates. What brokers market as exactly once is usually exactly-once processing within a bounded scope (for example, a stream-processing pipeline with transactional state), not end-to-end across arbitrary external side effects.

So write architecture docs with effectively-once business semantics language, not marketing language.


Implementation blueprint (30-day plan)

Week 1: add idempotency keys at the API edge (schema, atomic claim, payload-hash check, TTL).

Week 2: introduce the transactional outbox and a relay; remove direct dual-writes to the broker.

Week 3: add consumer inbox/dedupe ledgers; set dedupe retention to cover the replay window.

Week 4: wire up metrics and alerts; run failure drills (kill the relay mid-publish, force redeliveries) and verify no duplicate side effects.


Metrics that matter

Producer/API:

  • idempotent replay rate (retries served from stored responses)
  • key-conflict rejections (same key, different payload hash)

Outbox:

  • unpublished backlog size and age of the oldest unpublished row
  • publish_attempts distribution (retry hot spots)

Consumer:

  • duplicate-drop rate from the inbox ledger
  • redelivery rate from the broker

Business outcome:

  • duplicate side-effect incidents (e.g., double charges), which should trend to zero


Anti-patterns

  1. “We have retries, so we’re reliable.”
    • retries without idempotency amplify duplicates.
  2. Dual-write to DB then broker without outbox.
    • classic inconsistency trap.
  3. Deduping only in memory cache.
    • restarts erase protection.
  4. No payload hash check on reused key.
    • accidental key collision mutates intent.
  5. Purging dedupe records too aggressively.
    • replay window reopens duplicate risk.

Bottom line

You rarely get magical end-to-end exactly-once for free. You can, however, build boringly reliable effectively-once systems by combining:

  1. API idempotency keys at the producer edge
  2. a transactional outbox at the producer core
  3. an inbox/dedupe ledger at the consumer edge

When these are composed correctly, retries stop being scary and incident classes disappear.
