Effectively-Once Delivery in Real Systems: Idempotency + Outbox/Inbox Playbook
Date: 2026-03-01
Category: knowledge
Domain: software / distributed systems / reliability engineering
Why this matters
Distributed systems fail in the boring places:
- client times out after the server actually succeeded
- broker redelivers after consumer crash
- worker retries a side effect and charges twice
- event publish succeeds but DB transaction rolls back (or vice versa)
Most teams ask for “exactly once.” What they can usually ship is effectively once: duplicates may appear in transport, but business side effects happen once.
That is a huge practical win.
Reality check: delivery semantics by layer
- Network / queue layer
- typically at-least-once (duplicates possible)
- Application command layer
- can be idempotent with key + state machine
- Data write + event publish layer
- can be made atomic with transactional outbox
- Consumer side-effect layer
- can be deduped with inbox/processed-message ledger
Takeaway: stop asking one component for magic guarantees; compose guarantees across layers.
Core architecture (battle-tested)
A) API Idempotency Key (producer edge)
The client sends a unique key per business intent (e.g., payment creation). The server stores the first result and returns it unchanged for retried identical requests.
Contract:
- same key + same payload => same response
- same key + different payload => reject (409/422)
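A minimal sketch of this contract, with an in-memory dict standing in for the api_idempotency table (a real implementation would claim the key atomically in the database and recover stuck processing keys):

```python
import hashlib
import json

# In-memory stand-in for the api_idempotency table; names are illustrative.
_store = {}

def _request_hash(payload: dict) -> str:
    # Sort keys so logically identical payloads hash identically.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def handle(key: str, payload: dict, do_work):
    """Enforce: same key + same payload => same response; different payload => 409."""
    h = _request_hash(payload)
    record = _store.get(key)
    if record is None:
        # First time we see this key: claim it, do the work, persist the result.
        _store[key] = {"request_hash": h, "status": "processing"}
        code, body = do_work(payload)
        _store[key].update(status="completed", response_code=code, response_body=body)
        return code, body
    if record["request_hash"] != h:
        return 409, {"error": "idempotency key reused with a different payload"}
    if record["status"] == "completed":
        # Replay: return the stored response without re-running the side effect.
        return record["response_code"], record["response_body"]
    return 409, {"error": "original request still processing"}
```

Note that the side effect runs once per key even under client retries; a retried identical request gets the stored response back.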
B) Transactional Outbox (producer core)
Inside one DB transaction:
- mutate domain state
- insert an event row into the outbox table

A relay process publishes outbox events to the broker and marks them sent. This removes the dual-write race between DB and broker.
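The key point is that the domain write and the outbox insert share one transaction, so they commit or roll back together. A sketch using SQLite (table and column names follow the schemas later in this article; the orders table is illustrative):

```python
import json
import sqlite3
import uuid

def place_order(conn: sqlite3.Connection, order_id: str, amount: int) -> None:
    # One transaction: the domain write and the outbox insert commit or
    # roll back as a unit. `with conn` commits on success, rolls back on error.
    with conn:
        conn.execute(
            "insert into orders (id, amount) values (?, ?)",
            (order_id, amount),
        )
        conn.execute(
            "insert into outbox_events "
            "(id, aggregate_type, aggregate_id, event_type, payload) "
            "values (?, 'order', ?, 'order_placed', ?)",
            (str(uuid.uuid4()), order_id,
             json.dumps({"order_id": order_id, "amount": amount})),
        )
```

If the process dies before commit, neither row exists; if it dies after commit but before the broker publish, the outbox row survives and the relay publishes it later.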
C) Consumer Inbox / Processed Ledger (consumer edge)
The consumer records message_id (or a dedupe key) before or together with the side effect.
On redelivery, it detects the duplicate and no-ops.
Together, these provide effectively-once business outcomes under retries/crashes.
Minimal schemas
-- API idempotency keys
create table api_idempotency (
key text primary key,
request_hash text not null,
status text not null check (status in ('processing','completed','failed')),
response_code int,
response_body jsonb,
created_at timestamptz not null default now(),
expires_at timestamptz not null
);
-- Transactional outbox
create table outbox_events (
id uuid primary key,
aggregate_type text not null,
aggregate_id text not null,
event_type text not null,
payload jsonb not null,
created_at timestamptz not null default now(),
published_at timestamptz,
publish_attempts int not null default 0
);
create index outbox_events_unpublished_idx
on outbox_events (created_at)
where published_at is null;
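The partial index above makes the relay's polling query cheap. An illustrative relay pass over that table (in Postgres you would typically add `for update skip locked` so multiple relay instances can share the work; SQLite here keeps the sketch self-contained):

```python
import sqlite3

def relay_once(conn: sqlite3.Connection, publish, batch_size: int = 100) -> int:
    """Publish a batch of unpublished outbox rows; return how many were handled."""
    rows = conn.execute(
        # Deterministic read order (created_at, then id) preserves producer order.
        "select id, event_type, payload from outbox_events "
        "where published_at is null order by created_at, id limit ?",
        (batch_size,),
    ).fetchall()
    for event_id, event_type, payload in rows:
        # If publish raises, the row stays unpublished and is retried next pass.
        publish(event_type, payload)
        with conn:
            conn.execute(
                "update outbox_events set published_at = datetime('now'), "
                "publish_attempts = publish_attempts + 1 where id = ?",
                (event_id,),
            )
    return len(rows)
```

Note that a crash between publish and mark-sent re-publishes the event on recovery, which is exactly the duplicate the consumer inbox absorbs.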
-- Consumer inbox / dedupe ledger
create table processed_messages (
consumer_name text not null,
message_id text not null,
processed_at timestamptz not null default now(),
primary key (consumer_name, message_id)
);
State machines that prevent footguns
Idempotency key lifecycle
- processing -> request accepted, work in progress
- completed -> final response persisted and reusable
- failed -> terminal failure persisted (optional policy)
Important: if you cannot atomically move from processing to final state, retries will leak duplicates.
Outbox event lifecycle
- created (inside the same transaction as the domain write)
- publishing (relay owns a batch)
- published (acknowledged by broker)
- dead-lettered (after bounded retries, with alert)
Consumer lifecycle
- receive message
- insert into processed_messages (or upsert)
- if conflict => duplicate, skip side effect
- else execute side effect + commit
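The lifecycle above can be sketched with the processed_messages schema from earlier, assuming the side effect writes to the same database so the ledger insert and the side effect commit atomically (names are illustrative):

```python
import sqlite3

def handle_message(conn: sqlite3.Connection, consumer_name: str,
                   message_id: str, side_effect) -> bool:
    """Return True if the side effect ran, False if this was a duplicate."""
    with conn:
        # Claim the message id first. In Postgres this would be
        # `insert ... on conflict do nothing`; SQLite spells it `or ignore`.
        cur = conn.execute(
            "insert or ignore into processed_messages "
            "(consumer_name, message_id) values (?, ?)",
            (consumer_name, message_id),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: ledger row already exists
        # Side effect runs in the same transaction as the ledger insert,
        # so a crash here rolls back both and the redelivery retries cleanly.
        side_effect()
        return True
```

Redelivering the same message_id becomes a no-op, which is what absorbs relay re-publishes and broker redeliveries.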
Failure-mode matrix (what breaks, what saves you)
Client timeout after server success
- client retries with same idempotency key
- server returns stored response (no duplicate business action)
Service crashes after DB commit, before publish
- outbox row exists
- relay recovers and publishes later
Relay publishes then crashes before mark-sent
- event may be republished
- consumer inbox dedupe absorbs duplicate
Consumer crashes after side effect, before ack
- broker redelivers
- dedupe ledger prevents second side effect
This is why all three layers are needed.
Practical design choices
1) Idempotency key scope
Good scopes:
- user_id + operation + client_request_uuid
- payment intent id
- order intent id
Bad scopes:
- global random key with no business partitioning
- timestamp-only key
2) Request fingerprinting
Store hash of canonicalized payload with key. Reject same key if payload hash differs.
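Canonicalization matters: two payloads that differ only in key order or whitespace should produce the same fingerprint. A sketch using sorted-key JSON serialization:

```python
import hashlib
import json

def fingerprint(payload: dict) -> str:
    # Sorted keys + fixed separators give a stable serialization, so
    # logically identical payloads hash identically.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Comparing this stored hash on a reused key is what turns "same key + different payload" into a hard 409/422 instead of a silent overwrite.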
3) TTL policy
Idempotency key TTL should exceed realistic retry horizon. Many APIs keep keys for ~24h or longer depending on business risk.
4) Dedupe retention
Consumer dedupe retention must cover max redelivery/replay window. If you purge too early, old messages can re-trigger side effects.
5) Ordering
Outbox preserves producer-side order only if relay reads deterministically (e.g., by created_at/id) and partitioning strategy aligns with consumer ordering needs.
“Exactly once” nuance (avoid semantic traps)
- Kafka idempotent producer + transactions strengthen guarantees significantly.
- But end-to-end exactly-once still requires correct app boundaries, consumer isolation, and side-effect control.
- External APIs (email, payment rails, webhooks) often remain at-least-once from your perspective.
So describe guarantees in architecture docs as effectively-once business semantics, not in marketing language.
Implementation blueprint (30-day plan)
Week 1:
- add idempotency table + middleware on critical POST endpoints
- enforce key + payload hash contract
Week 2:
- implement transactional outbox in command handlers
- deploy relay with bounded retries + metrics
Week 3:
- add consumer dedupe ledger on high-risk consumers
- test crash/retry scenarios with fault injection
Week 4:
- define SLOs/alerts, run replay drills, document runbooks
- migrate remaining critical workflows
Metrics that matter
Producer/API:
- idempotency replay hit rate
- conflicting same-key payload rate
- count of keys stuck in processing
Outbox:
- unpublished backlog size/age
- publish retry rate
- publish latency p50/p95
Consumer:
- dedupe hit ratio
- DLQ rate
- side-effect failure after dedupe insert
Business outcome:
- duplicate charge/order incident count (target: zero)
Anti-patterns
- “We have retries, so we’re reliable.”
- retries without idempotency amplify duplicates.
- Dual-write to DB then broker without outbox.
- classic inconsistency trap.
- Deduping only in memory cache.
- restarts erase protection.
- No payload hash check on reused key.
- accidental key collision mutates intent.
- Purging dedupe records too aggressively.
- replay window reopens duplicate risk.
Bottom line
You rarely get magical end-to-end exactly-once for free. You can, however, build boringly reliable effectively-once systems by combining:
- API idempotency keys
- transactional outbox
- consumer inbox/processed-message dedupe
When these are composed correctly, retries stop being scary and incident classes disappear.
References (researched)
- Stripe API: Idempotent requests
  https://docs.stripe.com/api/idempotent_requests
- Stripe engineering blog: Designing robust and predictable APIs with idempotency
  https://stripe.com/blog/idempotency
- Microservices.io: Transactional Outbox pattern
  https://microservices.io/patterns/data/transactional-outbox.html
- AWS Prescriptive Guidance: Transactional Outbox pattern
  https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/transactional-outbox.html
- Amazon SQS FIFO exactly-once processing (dedupe window behavior)
  https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues-exactly-once-processing.html
- Apache Kafka Producer docs (idempotent + transactional producer)
  https://kafka.apache.org/11/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
- Confluent docs: Kafka delivery semantics overview
  https://docs.confluent.io/kafka/design/delivery-semantics.html