Effectively-Once Delivery in Real Systems: Idempotency + Outbox/Inbox Playbook
Date: 2026-03-01
Category: knowledge
Domain: software / distributed systems / reliability engineering
Why this matters
Distributed systems fail in the boring places:
- client times out after the server actually succeeded
- broker redelivers after consumer crash
- worker retries a side effect and charges twice
- event publish succeeds but DB transaction rolls back (or vice versa)
Most teams ask for “exactly once.” What they can usually ship is effectively once: duplicates may appear in transport, but business side effects happen once.
That is a huge practical win.
Reality check: delivery semantics by layer
- Network / queue layer
- typically at-least-once (duplicates possible)
- Application command layer
- can be idempotent with key + state machine
- Data write + event publish layer
- can be made atomic with transactional outbox
- Consumer side-effect layer
- can be deduped with inbox/processed-message ledger
Takeaway: stop asking one component for magic guarantees; compose guarantees across layers.
Core architecture (battle-tested)
A) API Idempotency Key (producer edge)
The client sends a unique key per business intent (e.g., payment creation). The server stores the first result and returns it unchanged for retried identical requests.
Contract:
- same key + same payload => same response
- same key + different payload => reject (409/422)
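A minimal sketch of this contract, with an in-memory dict standing in for the api_idempotency table (a real implementation would claim the key atomically in the database and recover stuck processing keys):

```python
import hashlib
import json

# In-memory stand-in for the api_idempotency table; names are illustrative.
_store = {}

def _request_hash(payload: dict) -> str:
    # Sort keys so logically identical payloads hash identically.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def handle(key: str, payload: dict, do_work):
    """Enforce: same key + same payload => same response; different payload => 409."""
    h = _request_hash(payload)
    record = _store.get(key)
    if record is None:
        # First time we see this key: claim it, do the work, persist the result.
        _store[key] = {"request_hash": h, "status": "processing"}
        code, body = do_work(payload)
        _store[key].update(status="completed", response_code=code, response_body=body)
        return code, body
    if record["request_hash"] != h:
        return 409, {"error": "idempotency key reused with a different payload"}
    if record["status"] == "completed":
        # Replay: return the stored response without re-running the side effect.
        return record["response_code"], record["response_body"]
    return 409, {"error": "original request still processing"}
```

Note that the side effect runs once per key even under client retries; a retried identical request gets the stored response back.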
B) Transactional Outbox (producer core)
Inside one DB transaction:
- mutate domain state
- insert an event row into the outbox table

A relay process publishes outbox events to the broker and marks them sent. This removes the dual-write race between DB and broker.
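The key point is that the domain write and the outbox insert share one transaction, so they commit or roll back together. A sketch using SQLite (table and column names follow the schemas later in this article; the orders table is illustrative):

```python
import json
import sqlite3
import uuid

def place_order(conn: sqlite3.Connection, order_id: str, amount: int) -> None:
    # One transaction: the domain write and the outbox insert commit or
    # roll back as a unit. `with conn` commits on success, rolls back on error.
    with conn:
        conn.execute(
            "insert into orders (id, amount) values (?, ?)",
            (order_id, amount),
        )
        conn.execute(
            "insert into outbox_events "
            "(id, aggregate_type, aggregate_id, event_type, payload) "
            "values (?, 'order', ?, 'order_placed', ?)",
            (str(uuid.uuid4()), order_id,
             json.dumps({"order_id": order_id, "amount": amount})),
        )
```

If the process dies before commit, neither row exists; if it dies after commit but before the broker publish, the outbox row survives and the relay publishes it later.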
C) Consumer Inbox / Processed Ledger (consumer edge)
The consumer records message_id (or a dedupe key) before or together with the side effect.
On redelivery, it detects the duplicate and no-ops.
Together, these provide effectively-once business outcomes under retries/crashes.
Minimal schemas
-- API idempotency keys
create table api_idempotency (
key text primary key,
request_hash text not null,
status text not null check (status in ('processing','completed','failed')),
response_code int,
response_body jsonb,
created_at timestamptz not null default now(),
expires_at timestamptz not null
);
-- Transactional outbox
create table outbox_events (
id uuid primary key,
aggregate_type text not null,
aggregate_id text not null,
event_type text not null,
payload jsonb not null,
created_at timestamptz not null default now(),
published_at timestamptz,
publish_attempts int not null default 0
);
create index outbox_events_unpublished_idx
on outbox_events (created_at)
where published_at is null;
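The partial index above makes the relay's polling query cheap. An illustrative relay pass over that table (in Postgres you would typically add `for update skip locked` so multiple relay instances can share the work; SQLite here keeps the sketch self-contained):

```python
import sqlite3

def relay_once(conn: sqlite3.Connection, publish, batch_size: int = 100) -> int:
    """Publish a batch of unpublished outbox rows; return how many were handled."""
    rows = conn.execute(
        # Deterministic read order (created_at, then id) preserves producer order.
        "select id, event_type, payload from outbox_events "
        "where published_at is null order by created_at, id limit ?",
        (batch_size,),
    ).fetchall()
    for event_id, event_type, payload in rows:
        # If publish raises, the row stays unpublished and is retried next pass.
        publish(event_type, payload)
        with conn:
            conn.execute(
                "update outbox_events set published_at = datetime('now'), "
                "publish_attempts = publish_attempts + 1 where id = ?",
                (event_id,),
            )
    return len(rows)
```

Note that a crash between publish and mark-sent re-publishes the event on recovery, which is exactly the duplicate the consumer inbox absorbs.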
-- Consumer inbox / dedupe ledger
create table processed_messages (
consumer_name text not null,
message_id text not null,
processed_at timestamptz not null default now(),
primary key (consumer_name, message_id)
);
State machines that prevent footguns
Idempotency key lifecycle
- processing -> request accepted, work in progress
- completed -> final response persisted and reusable
- failed -> terminal failure persisted (optional policy)
Important: if you cannot atomically move from processing to final state, retries will leak duplicates.
Outbox event lifecycle
- created (inside the same transaction as the domain write)
- publishing (relay owns a batch)
- published (acknowledged by broker)
- dead-lettered (after bounded retries, with alert)
Consumer lifecycle
- receive message
- insert into processed_messages (or upsert)
- if conflict => duplicate, skip side effect
- else execute side effect + commit
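The lifecycle above can be sketched with the processed_messages schema from earlier, assuming the side effect writes to the same database so the ledger insert and the side effect commit atomically (names are illustrative):

```python
import sqlite3

def handle_message(conn: sqlite3.Connection, consumer_name: str,
                   message_id: str, side_effect) -> bool:
    """Return True if the side effect ran, False if this was a duplicate."""
    with conn:
        # Claim the message id first. In Postgres this would be
        # `insert ... on conflict do nothing`; SQLite spells it `or ignore`.
        cur = conn.execute(
            "insert or ignore into processed_messages "
            "(consumer_name, message_id) values (?, ?)",
            (consumer_name, message_id),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: ledger row already exists
        # Side effect runs in the same transaction as the ledger insert,
        # so a crash here rolls back both and the redelivery retries cleanly.
        side_effect()
        return True
```

Redelivering the same message_id becomes a no-op, which is what absorbs relay re-publishes and broker redeliveries.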
Failure-mode matrix (what breaks, what saves you)
Client timeout after server success
- client retries with same idempotency key
- server returns stored response (no duplicate business action)
Service crashes after DB commit, before publish
- outbox row exists
- relay recovers and publishes later
Relay publishes then crashes before mark-sent
- event may be republished
- consumer inbox dedupe absorbs duplicate
Consumer crashes after side effect, before ack
- broker redelivers
- dedupe ledger prevents second side effect
This is why all three layers are needed.
Practical design choices
1) Idempotency key scope
Good scopes:
- user_id + operation + client_request_uuid
- payment intent id
- order intent id
Bad scopes:
- global random key with no business partitioning
- timestamp-only key
2) Request fingerprinting
Store hash of canonicalized payload with key. Reject same key if payload hash differs.
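Canonicalization matters: two payloads that differ only in key order or whitespace should produce the same fingerprint. A sketch using sorted-key JSON serialization:

```python
import hashlib
import json

def fingerprint(payload: dict) -> str:
    # Sorted keys + fixed separators give a stable serialization, so
    # logically identical payloads hash identically.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Comparing this stored hash on a reused key is what turns "same key + different payload" into a hard 409/422 instead of a silent overwrite.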
3) TTL policy
Idempotency key TTL should exceed realistic retry horizon. Many APIs keep keys for ~24h or longer depending on business risk.
4) Dedupe retention
Consumer dedupe retention must cover max redelivery/replay window. If you purge too early, old messages can re-trigger side effects.
5) Ordering
Outbox preserves producer-side order only if relay reads deterministically (e.g., by created_at/id) and partitioning strategy aligns with consumer ordering needs.
“Exactly once” nuance (avoid semantic traps)
- Kafka idempotent producer + transactions strengthen guarantees significantly.
- But end-to-end exactly-once still requires correct app boundaries, consumer isolation, and side-effect control.
- External APIs (email, payment rails, webhooks) often remain at-least-once from your perspective.
So describe guarantees in architecture docs as effectively-once business semantics, not in marketing language.
Implementation blueprint (30-day plan)
Week 1:
- add idempotency table + middleware on critical POST endpoints
- enforce key + payload hash contract
Week 2:
- implement transactional outbox in command handlers
- deploy relay with bounded retries + metrics
Week 3:
- add consumer dedupe ledger on high-risk consumers
- test crash/retry scenarios with fault injection
Week 4:
- define SLOs/alerts, run replay drills, document runbooks
- migrate remaining critical workflows
Metrics that matter
Producer/API:
- idempotency replay hit rate
- conflicting same-key payload rate
- count of keys stuck in processing
Outbox:
- unpublished backlog size/age
- publish retry rate
- publish latency p50/p95
Consumer:
- dedupe hit ratio
- DLQ rate
- side-effect failure after dedupe insert
Business outcome:
- duplicate charge/order incident count (target: zero)
Anti-patterns
- “We have retries, so we’re reliable.”
- retries without idempotency amplify duplicates.
- Dual-write to DB then broker without outbox.
- classic inconsistency trap.
- Deduping only in memory cache.
- restarts erase protection.
- No payload hash check on reused key.
- accidental key collision mutates intent.
- Purging dedupe records too aggressively.
- replay window reopens duplicate risk.
Bottom line
You rarely get magical end-to-end exactly-once for free. You can, however, build boringly reliable effectively-once systems by combining:
- API idempotency keys
- transactional outbox
- consumer inbox/processed-message dedupe
When these are composed correctly, retries stop being scary and incident classes disappear.
References (researched)
- Stripe API: Idempotent requests
  https://docs.stripe.com/api/idempotent_requests
- Stripe engineering blog: Designing robust and predictable APIs with idempotency
  https://stripe.com/blog/idempotency
- Microservices.io: Transactional Outbox pattern
  https://microservices.io/patterns/data/transactional-outbox.html
- AWS Prescriptive Guidance: Transactional Outbox pattern
  https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/transactional-outbox.html
- Amazon SQS FIFO exactly-once processing (dedupe window behavior)
  https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues-exactly-once-processing.html
- Apache Kafka Producer docs (idempotent + transactional producer)
  https://kafka.apache.org/11/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
- Confluent docs: Kafka delivery semantics overview
  https://docs.confluent.io/kafka/design/delivery-semantics.html