Order Idempotency & Duplicate-Order Prevention Playbook (FIX/API)
Date: 2026-03-11
Category: knowledge
Domain: finance / execution engineering / trading operations
Why this matters
In live execution, the most expensive bug is often not a bad signal — it is a duplicated order.
Typical path:
- you send an order,
- ack is delayed or dropped,
- your retry logic fires,
- venue/broker treats retry as a new intent,
- you get unintended extra exposure.
This is a reliability problem first, and a PnL problem immediately after.
Core principle
Treat order submission as an idempotent intent pipeline, not a best-effort message send.
- Retries are inevitable.
- Disconnect/reconnect is inevitable.
- Sequence resets and replay flags happen.
- Therefore duplicate-prevention must be a first-class control, not an afterthought.
Identifier semantics you must keep straight
1) ClOrdID (FIX Tag 11)
- Client-assigned order identifier.
- Must be unique at least within a trading day; for multi-day flows, encode the date or scope in the ID.
- This is your primary idempotency key on outbound intent.
2) OrigClOrdID (FIX Tag 41)
- Links cancel/replace request to previous client order ID.
- Critical for lineage when modifying orders.
3) OrderID (FIX Tag 37)
- Broker/venue-assigned order identifier.
- Useful for downstream reconciliation, but not your submission idempotency root.
4) ExecID (FIX Tag 17)
- Unique execution report identifier from sell-side.
- Use for inbound dedupe/replay-safe fill processing.
5) PossDupFlag (43) / PossResend (97)
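- PossDupFlag=Y marks a possible session-level retransmission of a message under the same sequence number; PossResend=Y marks a possible application-level resend under a new sequence number.
- Your consumers must be safe under replay in both cases.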
Failure modes that create accidental duplicates
Ack-timeout blind retry
- “No ack in X ms => send new order” without stable ClOrdID reuse policy.
Session reconnect with volatile ID generator
- ID sequence restarts after process crash/redeploy.
Cancel/replace race
- Replace submitted while original ack state is unknown; both legs become live.
Gateway failover split-brain
- Primary and standby both emit the same strategy intent independently.
Replay-unaware execution consumer
- Duplicate ExecutionReport counted twice in position/PnL.
Practical architecture (minimal but robust)
1) Intent Ledger (authoritative)
Before sending to broker, persist:
- intent_id (internal UUID),
- deterministic cl_ord_id,
- symbol/side/qty/price/tif/account fingerprint,
- lifecycle state (created -> sent -> acked/rejected/canceled/filled),
- timestamps and route metadata.
Rule: no outbound send without durable ledger write.
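The write-before-send rule can be sketched with SQLite as a stand-in for the ledger store. This is a minimal illustration, not a production design: the table layout, the function names (`open_ledger`, `record_intent`, `send_with_ledger`) and the `send_fn` callback are all assumptions made for the example.

```python
import sqlite3
import time
import uuid


def open_ledger(path=":memory:"):
    """Open (or create) the intent ledger. Use a file path so writes survive a restart."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS intents (
               intent_id   TEXT PRIMARY KEY,
               cl_ord_id   TEXT NOT NULL UNIQUE,
               fingerprint TEXT NOT NULL,
               state       TEXT NOT NULL,
               created_at  REAL NOT NULL
           )"""
    )
    return db


def record_intent(db, cl_ord_id, symbol, side, qty, price, tif, account):
    """Persist the intent in state 'created'. Raises sqlite3.IntegrityError on ClOrdID reuse."""
    intent_id = str(uuid.uuid4())
    fingerprint = f"{symbol}|{side}|{qty}|{price}|{tif}|{account}"
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO intents VALUES (?, ?, ?, 'created', ?)",
            (intent_id, cl_ord_id, fingerprint, time.time()),
        )
    return intent_id


def send_with_ledger(db, send_fn, cl_ord_id, **order):
    """Ledger write first, then the outbound send, then the 'sent' transition."""
    intent_id = record_intent(db, cl_ord_id, **order)
    send_fn(cl_ord_id, order)  # hypothetical transport callback
    with db:
        db.execute("UPDATE intents SET state = 'sent' WHERE intent_id = ?", (intent_id,))
    return intent_id
```

If `send_fn` raises, the row stays in `created`, which is exactly the kind of uncertain intent the retry and reconciliation steps are meant to resolve.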
2) Deterministic ClOrdID policy
Recommended pattern:
<strategy>-<yyyymmdd>-<session>-<monotonic-seq>-<short-checksum>
Rules:
- sequence survives process restart,
- never regenerate different ID for same intent,
- never reuse old IDs inside safety retention window.
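One way to satisfy these rules is to allocate IDs from a persisted table keyed by intent. The sketch below assumes SQLite for storage and a CRC32-derived checksum; the class name `ClOrdIdAllocator` and the 8-digit sequence width are illustrative choices, and the resulting string should be checked against your venue's length and character limits for Tag 11.

```python
import sqlite3
import zlib


class ClOrdIdAllocator:
    """Allocates ClOrdIDs from a persisted counter; one intent always maps to one ID."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS ids ("
            "intent_id TEXT PRIMARY KEY, seq INTEGER UNIQUE, cl_ord_id TEXT UNIQUE)"
        )

    def allocate(self, intent_id, strategy, yyyymmdd, session):
        # Same intent asked again (retry, restart): return the ID already issued.
        row = self.db.execute(
            "SELECT cl_ord_id FROM ids WHERE intent_id = ?", (intent_id,)
        ).fetchone()
        if row:
            return row[0]
        with self.db:  # sequence bump and ID record commit together
            seq = self.db.execute("SELECT COALESCE(MAX(seq), 0) + 1 FROM ids").fetchone()[0]
            body = f"{strategy}-{yyyymmdd}-{session}-{seq:08d}"
            checksum = f"{zlib.crc32(body.encode()) & 0xFFFF:04x}"
            cl_ord_id = f"{body}-{checksum}"
            self.db.execute("INSERT INTO ids VALUES (?, ?, ?)", (intent_id, seq, cl_ord_id))
        return cl_ord_id
```

Because the counter lives in the database file, a process restart continues the sequence instead of starting again from 1. A single-writer process is assumed; multiple writers would need stronger locking.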
3) Retry contract
On timeout/uncertain state:
- first action: query status (if supported),
- retry with same idempotency identity,
- only create new intent with explicit human/strategy decision.
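The contract above might look like the following sketch. `send_fn` and `query_fn` are hypothetical callbacks: `send_fn` returns the acked state, or `None` on timeout; `query_fn` returns the broker's known state, or `None` if it has no record. The sketch also assumes the broker rejects a second new order carrying an already-used ClOrdID, which should be confirmed per venue before relying on it.

```python
import time


def submit_with_retry(send_fn, query_fn, cl_ord_id, order, max_attempts=3, backoff_s=0.0):
    """Resend only under the same ClOrdID, and only after a status query finds nothing."""
    for attempt in range(max_attempts):
        if attempt > 0:
            # Uncertain state: ask the broker before sending anything again.
            known = query_fn(cl_ord_id)
            if known is not None:
                return known
            time.sleep(backoff_s * attempt)
        state = send_fn(cl_ord_id, order)  # identical ClOrdID on every attempt
        if state is not None:
            return state
    # The last send also timed out: query once more, then hand over to a human.
    known = query_fn(cl_ord_id)
    return known if known is not None else "uncertain"
```

Note that the function never mints a new ClOrdID. An `"uncertain"` result is meant to be escalated, not retried as a fresh intent.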
4) Inbound dedupe keys
Maintain a processed set on ExecID (+ venue/session scope) and guard against replay.
Rule: position/PnL updates must be idempotent.
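A minimal in-memory sketch of an idempotent fill consumer follows. The class and field names are illustrative. A real implementation would persist the processed set and the position update in one atomic write, so that a crash between the two cannot double-count or lose a fill.

```python
class PositionKeeper:
    """Applies each execution at most once, keyed on (session, ExecID)."""

    def __init__(self):
        self.seen = set()        # (session, exec_id) pairs already applied
        self.position = {}       # symbol -> signed quantity
        self.replays_dropped = 0

    def on_fill(self, session, exec_id, symbol, side, last_qty):
        key = (session, exec_id)
        if key in self.seen:
            # Replay or retransmission: count it, but cause no side effects.
            self.replays_dropped += 1
            return False
        self.seen.add(key)
        signed = last_qty if side == "BUY" else -last_qty
        self.position[symbol] = self.position.get(symbol, 0) + signed
        return True


keeper = PositionKeeper()
keeper.on_fill("SESS1", "E-1001", "ABC", "BUY", 100)
keeper.on_fill("SESS1", "E-1001", "ABC", "BUY", 100)  # replayed report, ignored
```

After the two calls above, the position in ABC is 100, not 200, and `replays_dropped` is 1.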
5) Reconciliation loop
Continuously reconcile:
- intent ledger,
- broker order state,
- executions/drop copy,
- internal position.
Any divergence enters incident workflow (not silent auto-heal).
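As an illustration, one pass of the loop for open orders could be a plain diff between two snapshots. The sketch assumes each side can be reduced to a mapping of ClOrdID to remaining open quantity; the break labels are made up for the example. It only reports breaks and deliberately takes no corrective action.

```python
def find_breaks(ledger_open, broker_open):
    """Compare two {cl_ord_id: open_qty} snapshots and list every divergence."""
    breaks = []
    for cid in sorted(set(ledger_open) | set(broker_open)):
        ours = ledger_open.get(cid)
        theirs = broker_open.get(cid)
        if ours is None:
            # Live at the broker with no matching intent: possible duplicate.
            breaks.append((cid, "unknown_to_ledger"))
        elif theirs is None:
            breaks.append((cid, "missing_at_broker"))
        elif ours != theirs:
            breaks.append((cid, "qty_mismatch"))
    return breaks


ledger = {"A-1": 100, "A-2": 50}
broker = {"A-1": 100, "A-2": 30, "A-9": 100}
breaks = find_breaks(ledger, broker)
```

Here `breaks` contains a quantity mismatch for `A-2` and an order `A-9` that the ledger never issued. Both would open an incident.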
Control states (ops-friendly)
NORMAL
- duplicate metrics near baseline,
- ack latency within normal band,
- no ledger/order-state breaks.
Action: standard operation.
DEGRADED
Triggers:
- ack timeout percentile spike,
- duplicate reject ratio rising,
- reconnect frequency elevated.
Action:
- reduce aggression,
- widen retry timers,
- force status-query-first path,
- page operator if sustained.
DUP_RISK
Triggers:
- duplicate rejects exceed threshold,
- unresolved uncertain orders accumulate,
- reconciliation breaks not converging.
Action:
- disable auto-new intents for affected routes,
- permit cancel-only / reduce-only,
- require explicit operator release.
SAFE
Triggers:
- split-brain suspicion,
- ledger persistence instability,
- exchange/broker state uncertainty too high.
Action:
- hard stop new exposure,
- preserve capital and audit trail,
- recover state before reopening.
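The four states can be sketched as a small escalation-only state machine. Everything numeric here is a placeholder: the thresholds in `LIMITS`, the `Signals` fields, and the simplification of "rising" triggers into fixed limits are assumptions for illustration. The broker-state-uncertainty trigger for SAFE is left out because it has no obvious single measure.

```python
from dataclasses import dataclass
from enum import IntEnum


class ControlState(IntEnum):
    NORMAL = 0
    DEGRADED = 1
    DUP_RISK = 2
    SAFE = 3


@dataclass
class Signals:
    drr: float = 0.0                   # duplicate-ID rejects / new orders
    uncertain_orders: int = 0          # sent but unresolved
    ack_p99_ms: float = 0.0
    reconnects_per_hour: int = 0
    breaks_not_converging: bool = False
    split_brain_suspected: bool = False
    ledger_write_failures: int = 0


# Placeholder thresholds; tune per route from your own baselines.
LIMITS = {"drr_degraded": 0.001, "drr_dup_risk": 0.01, "max_uncertain": 5,
          "ack_p99_ms": 250.0, "reconnects_per_hour": 3}


def classify(s, limits=LIMITS):
    """Map current signals to the most severe state whose trigger fires."""
    if s.split_brain_suspected or s.ledger_write_failures > 0:
        return ControlState.SAFE
    if (s.drr > limits["drr_dup_risk"] or s.uncertain_orders > limits["max_uncertain"]
            or s.breaks_not_converging):
        return ControlState.DUP_RISK
    if (s.drr > limits["drr_degraded"] or s.ack_p99_ms > limits["ack_p99_ms"]
            or s.reconnects_per_hour > limits["reconnects_per_hour"]):
        return ControlState.DEGRADED
    return ControlState.NORMAL


def next_state(current, signals, operator_release=False, limits=LIMITS):
    """Escalate automatically; leave DUP_RISK or SAFE only on explicit operator release."""
    target = classify(signals, limits)
    if target >= current:
        return target
    if current >= ControlState.DUP_RISK and not operator_release:
        return current
    return target


def new_exposure_allowed(state):
    """New intents are permitted only in NORMAL and DEGRADED."""
    return state <= ControlState.DEGRADED
```

The asymmetry is the point of the design: signals alone can tighten controls, but clean signals alone cannot loosen them once a route has reached DUP_RISK or SAFE.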
Metrics that actually catch this early
- Duplicate Reject Rate (DRR) = duplicate-ID rejects / new orders
- Uncertain Order Count (UOC) = orders sent but still unresolved after ack timeout and status query
- Replay Drop Rate (RDR) = replayed execution reports safely ignored / total exec reports
- ID Collision Count (ICC) = attempted ClOrdID reuse events
- Reconciliation Break Duration (RBD) = time from divergence detection to convergence
If DRR and UOC rise together, move to DEGRADED quickly.
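For concreteness, the metrics can be computed from raw counters per observation window as in the sketch below. The counter names are hypothetical, and `rising_together` reads "rise together" as a simple increase over the previous window, which a real monitor would likely replace with a smoothed comparison.

```python
def rate(numerator, denominator):
    """Return numerator / denominator, or 0.0 when there is no traffic yet."""
    return numerator / denominator if denominator else 0.0


def compute_metrics(counters):
    """Derive the five metrics from one window of raw counters."""
    return {
        "DRR": rate(counters["duplicate_id_rejects"], counters["new_orders"]),
        "UOC": counters["uncertain_orders"],
        "RDR": rate(counters["replays_dropped"], counters["exec_reports"]),
        "ICC": counters["cl_ord_id_reuse_attempts"],
        "RBD_s": counters["break_resolved_at"] - counters["break_detected_at"],
    }


def rising_together(previous, current):
    """True when both DRR and UOC increased versus the previous window."""
    return current["DRR"] > previous["DRR"] and current["UOC"] > previous["UOC"]


window = compute_metrics({
    "duplicate_id_rejects": 2, "new_orders": 1000, "uncertain_orders": 3,
    "replays_dropped": 10, "exec_reports": 400, "cl_ord_id_reuse_attempts": 0,
    "break_detected_at": 100.0, "break_resolved_at": 160.0,
})
```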
Hard guardrails (non-negotiable)
- No ephemeral ID generators (memory-only counters are forbidden).
- No side effects before dedupe check on inbound executions.
- No auto-resubmit with new ClOrdID while prior state is unknown.
- No silent healing of reconciliation breaks — alert and track incident id.
- No deployment without duplicate-order game day (disconnect/replay/failover drills).
One-line runbook for incidents
- Freeze new risk on affected route.
- Snapshot ledger + broker open orders + latest exec stream offsets.
- Resolve uncertain intents by status query / broker desk confirmation.
- Cancel unintended residuals.
- Replay execution stream through idempotent consumer and verify position parity.
- Postmortem with specific control change (timer, dedupe key, failover fencing, etc.).
References
- B2BITS FIXopaedia: ClOrdID (Tag 11), uniqueness guidance
  https://www.b2bits.com/fixopaedia/fixdic41/tag_11_ClOrdID.html
- OnixS FIX Dictionary: duplicate ClOrdID state matrix example (FIX 4.4 Appendix D F.1.a)
  https://www.onixs.biz/fix-dictionary/4.4/app_df.1.a.html
- OnixS FIX Dictionary: PossDupFlag (43)
  https://www.onixs.biz/fix-dictionary/4.4/tagnum_43.html
- OnixS FIX Dictionary: PossResend (97)
  https://www.onixs.biz/fix-dictionary/4.4/tagnum_97.html
- B2BITS FIXopaedia: ExecID (Tag 17), uniqueness guidance
  https://www.b2bits.com/fixopaedia/fixdic44/tag_17_ExecID_.html
One-line takeaway
In trading infra, “retry” without strict idempotency is just a polite word for accidental leverage.