Order Idempotency & Duplicate-Order Prevention Playbook (FIX/API)

2026-03-11 · finance

Date: 2026-03-11
Category: knowledge
Domain: finance / execution engineering / trading operations

Why this matters

In live execution, the most expensive bug is often not a bad signal — it is a duplicated order.

Typical path:

  1. you send an order,
  2. ack is delayed or dropped,
  3. your retry logic fires,
  4. venue/broker treats retry as a new intent,
  5. you get unintended extra exposure.

This is a reliability problem first, and a PnL problem immediately after.


Core principle

Treat order submission as an idempotent intent pipeline, not a best-effort message send.


Identifier semantics you must keep straight

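1) ClOrdID (FIX Tag 11)

  • Identifier assigned by the client (you) to each order request. Every new order, cancel, and replace request carries its own ClOrdID, so this is the key your idempotency logic controls.

2) OrigClOrdID (FIX Tag 41)

  • On a cancel or cancel/replace request, the ClOrdID of the order being cancelled or replaced. It links the request to the order it modifies.

3) OrderID (FIX Tag 37)

  • Identifier assigned by the broker/venue to the order. You do not control it; you learn it from execution reports.

4) ExecID (FIX Tag 17)

  • Identifier assigned by the broker/venue to each execution report. It is the natural dedupe key for inbound executions.

5) PossDupFlag (43) / PossResend (97)

  • PossDupFlag marks a possible retransmission of a message under the same sequence number. PossResend marks a message that may repeat information already sent under a different sequence number. Both are hints; neither replaces dedupe on your side.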
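
As an illustration of how ClOrdID and OrigClOrdID relate across a cancel/replace sequence, here is a minimal sketch that walks a chain of requests back to the first one. The `OrderRequest` class, the `chain_root` helper, and the sample IDs are hypothetical names introduced for this example; they are not part of any FIX engine API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class OrderRequest:
    cl_ord_id: str                        # Tag 11: ID we assign to this request
    orig_cl_ord_id: Optional[str] = None  # Tag 41: ClOrdID of the request being replaced


def chain_root(cl_ord_id: str, requests: Dict[str, OrderRequest]) -> str:
    """Follow OrigClOrdID links back to the first request in the chain."""
    current = cl_ord_id
    seen = set()
    while True:
        if current in seen:
            raise ValueError(f"cycle in replace chain at {current}")
        seen.add(current)
        req = requests.get(current)
        if req is None or req.orig_cl_ord_id is None:
            return current  # no earlier link known: treat this as the root
        current = req.orig_cl_ord_id


# Hypothetical chain: A-1 is the new order, A-2 and A-3 are successive replaces.
requests = {
    "A-1": OrderRequest("A-1"),
    "A-2": OrderRequest("A-2", orig_cl_ord_id="A-1"),
    "A-3": OrderRequest("A-3", orig_cl_ord_id="A-2"),
}
root = chain_root("A-3", requests)
print(root)  # A-1
```

Resolving every request to one root makes it easier to check that a single strategy intent never maps to more than one live order.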


Failure modes that create accidental duplicates

  1. Ack-timeout blind retry

    • “No ack in X ms => send new order” without stable ClOrdID reuse policy.
  2. Session reconnect with volatile ID generator

    • ID sequence restarts after process crash/redeploy.
  3. Cancel/replace race

    • Replace submitted while original ack state is unknown; both legs become live.
  4. Gateway failover split-brain

    • Primary and standby both emit the same strategy intent independently.
  5. Replay-unaware execution consumer

    • Duplicate ExecutionReport counted twice in position/PnL.

Practical architecture (minimal but robust)

1) Intent Ledger (authoritative)

Before sending to broker, persist:

Rule: no outbound send without durable ledger write.
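
The rule above can be sketched with a small SQLite-backed ledger. This is one possible shape, not a definitive implementation: the table layout, the column names, the `PENDING_SEND`/`SENT` states, and the `send` callback are all assumptions made for illustration. The in-memory database is used only so the example runs anywhere; a real ledger would need a durable file or database.

```python
import sqlite3
from typing import Callable


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS intents (
               intent_id TEXT PRIMARY KEY,     -- stable key for the strategy intent
               cl_ord_id TEXT UNIQUE NOT NULL, -- ClOrdID bound to that intent
               symbol    TEXT NOT NULL,
               side      TEXT NOT NULL,
               qty       INTEGER NOT NULL,
               state     TEXT NOT NULL         -- hypothetical: PENDING_SEND or SENT
           )"""
    )
    return conn


def submit(conn: sqlite3.Connection, intent_id: str, cl_ord_id: str, symbol: str,
           side: str, qty: int, send: Callable[[str, str, str, int], None]) -> bool:
    """Write the intent first, then send. Returns False if it was already recorded."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            conn.execute(
                "INSERT INTO intents VALUES (?, ?, ?, ?, ?, 'PENDING_SEND')",
                (intent_id, cl_ord_id, symbol, side, qty),
            )
    except sqlite3.IntegrityError:
        return False  # same intent or ClOrdID already in the ledger: do not send
    # Reached only after the ledger write committed. If send() raises, the row
    # stays PENDING_SEND, which marks the order as uncertain rather than lost.
    send(cl_ord_id, symbol, side, qty)
    with conn:
        conn.execute("UPDATE intents SET state = 'SENT' WHERE intent_id = ?", (intent_id,))
    return True


sent = []
ledger = open_ledger()
first = submit(ledger, "intent-1", "STRAT-20260311-S1-000001", "ABC", "BUY", 100,
               lambda *args: sent.append(args))
# The same intent arriving again, even with a fresh ClOrdID, is refused.
second = submit(ledger, "intent-1", "STRAT-20260311-S1-000002", "ABC", "BUY", 100,
                lambda *args: sent.append(args))
print(first, second, len(sent))  # True False 1
```

Keying the table on the intent rather than on the message is the design choice that makes a retried submission a no-op instead of a second order.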

2) Deterministic ClOrdID policy

Recommended pattern:

<strategy>-<yyyymmdd>-<session>-<monotonic-seq>-<short-checksum>

Rules:
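
A minimal sketch of a generator that follows the recommended pattern. Several choices here are assumptions rather than requirements: the sequence is kept in a SQLite table so it survives restarts, the checksum is the low 16 bits of CRC-32, and the sequence is zero-padded to eight digits. The helper names (`open_seq_store`, `next_seq`, `format_cl_ord_id`) are hypothetical, and venue-specific length or character limits on ClOrdID are not handled.

```python
import sqlite3
import zlib
from datetime import date


def open_seq_store(path: str = ":memory:") -> sqlite3.Connection:
    # In-memory only so the example runs; use a file path for real persistence.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seqs (key TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    return conn


def next_seq(conn: sqlite3.Connection, key: str) -> int:
    """Advance and return a persisted counter inside one transaction."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO seqs (key, value) VALUES (?, 0)", (key,))
        conn.execute("UPDATE seqs SET value = value + 1 WHERE key = ?", (key,))
        (value,) = conn.execute("SELECT value FROM seqs WHERE key = ?", (key,)).fetchone()
    return value


def format_cl_ord_id(strategy: str, day: date, session: str, seq: int) -> str:
    body = f"{strategy}-{day:%Y%m%d}-{session}-{seq:08d}"
    checksum = zlib.crc32(body.encode("ascii")) & 0xFFFF  # short check, 4 hex digits
    return f"{body}-{checksum:04x}"


store = open_seq_store()
scope = "STRAT-20260311-S1"  # hypothetical counter scope: strategy + date + session
id1 = format_cl_ord_id("STRAT", date(2026, 3, 11), "S1", next_seq(store, scope))
id2 = format_cl_ord_id("STRAT", date(2026, 3, 11), "S1", next_seq(store, scope))
print(id1)
print(id2)
```

Because the ID is a pure function of its inputs, the same (strategy, date, session, sequence) tuple always yields the same string, which is what lets a retry reuse the ID instead of minting a new one.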

3) Retry contract

On timeout/uncertain state:
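
A minimal sketch of one possible retry contract, built from the guardrails elsewhere in this playbook: never resubmit under a new ClOrdID while state is unknown, and resolve uncertainty by status query first. The `query_status` and `resend` callbacks, the `"KNOWN"`/`"NOT_FOUND"` return values, and the `Resolution` enum are hypothetical. Resending under the same ClOrdID is only safe if the broker rejects reused IDs, which is an assumption that must be verified per venue.

```python
from enum import Enum
from typing import Callable, Optional


class Resolution(Enum):
    LIVE = "live"                      # broker knows the order: adopt it, do not resend
    RESENT_SAME_ID = "resent_same_id"  # broker has no record: resend with the SAME ClOrdID
    ESCALATE = "escalate"              # still unknown: keep the intent blocked, involve a human


def resolve_uncertain(
    cl_ord_id: str,
    query_status: Callable[[str], Optional[str]],  # returns "KNOWN", "NOT_FOUND", or None on timeout
    resend: Callable[[str], None],
    max_queries: int = 3,
) -> Resolution:
    for _ in range(max_queries):
        status = query_status(cl_ord_id)
        if status == "KNOWN":
            return Resolution.LIVE
        if status == "NOT_FOUND":
            resend(cl_ord_id)  # same ID, so a late-arriving original would collide
            return Resolution.RESENT_SAME_ID
        # status is None: the query itself timed out, so try again
    return Resolution.ESCALATE


resent = []
r_known = resolve_uncertain("A-1", lambda cid: "KNOWN", resent.append)
r_missing = resolve_uncertain("A-2", lambda cid: "NOT_FOUND", resent.append)
r_silent = resolve_uncertain("A-3", lambda cid: None, resent.append)
print(r_known.value, r_missing.value, r_silent.value, resent)
# live resent_same_id escalate ['A-2']
```

Note that no branch generates a new ClOrdID; the only ways out of the uncertain state are confirmation, a same-ID resend, or escalation.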

4) Inbound dedupe keys

Maintain a processed set keyed on ExecID, scoped by venue/session, and drop any execution whose key has already been applied.

Rule: position/PnL updates must be idempotent.
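
The dedupe rule can be sketched as a consumer that checks the key before touching any state. The `PositionBook` class and its method names are hypothetical, and the in-memory set is a simplification: in production the processed set would need to be durable and updated atomically with the position.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class PositionBook:
    """Applies each execution at most once, keyed on (session, ExecID)."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str]] = set()
        self.positions: Dict[str, int] = defaultdict(int)

    def on_fill(self, session: str, exec_id: str, symbol: str, signed_qty: int) -> bool:
        key = (session, exec_id)
        if key in self._seen:
            return False  # replayed report: ignore, no side effects
        self._seen.add(key)
        self.positions[symbol] += signed_qty
        return True


book = PositionBook()
applied_first = book.on_fill("S1", "E-1001", "ABC", +100)
applied_replay = book.on_fill("S1", "E-1001", "ABC", +100)  # same report replayed
applied_other = book.on_fill("S2", "E-1001", "ABC", +50)    # same ExecID, different session
print(applied_first, applied_replay, applied_other, book.positions["ABC"])
# True False True 150
```

Scoping the key by session matters because ExecID uniqueness is only guaranteed by the party that assigned it, so two venues can legitimately produce the same value.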

5) Reconciliation loop

Continuously reconcile:

Any divergence enters incident workflow (not silent auto-heal).
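
A minimal sketch of one reconciliation pass, assuming the two sides being compared are the ledger's view of open orders and the broker's view, both keyed by ClOrdID with open quantity as the value. The `Break` record, its `kind` labels, and the `reconcile` function are hypothetical. The function only reports divergences and deliberately fixes nothing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Break:
    cl_ord_id: str
    kind: str    # "missing_at_broker", "unknown_to_ledger", or "qty_mismatch"
    detail: str


def reconcile(ledger_open: Dict[str, int], broker_open: Dict[str, int]) -> List[Break]:
    """Compare open quantity per ClOrdID and return every divergence found."""
    breaks: List[Break] = []
    for cid in sorted(ledger_open.keys() | broker_open.keys()):
        ours, theirs = ledger_open.get(cid), broker_open.get(cid)
        if theirs is None:
            breaks.append(Break(cid, "missing_at_broker", f"ledger qty {ours}"))
        elif ours is None:
            breaks.append(Break(cid, "unknown_to_ledger", f"broker qty {theirs}"))
        elif ours != theirs:
            breaks.append(Break(cid, "qty_mismatch", f"ledger {ours} vs broker {theirs}"))
    return breaks


ledger_view = {"A-1": 100, "A-2": 50}
broker_view = {"A-1": 100, "A-2": 30, "A-9": 100}  # A-9 is an order we never recorded
found = reconcile(ledger_view, broker_view)
for b in found:
    print(b.cl_ord_id, b.kind)
# A-2 qty_mismatch
# A-9 unknown_to_ledger
```

An `unknown_to_ledger` break is the signature of a duplicate or rogue order, which is why the output feeds an incident workflow rather than an automatic correction.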


Control states (ops-friendly)

NORMAL

Action: standard operation.

DEGRADED

Triggers:

Action:

DUP_RISK

Triggers:

Action:

SAFE

Triggers:

Action:


Metrics that actually catch this early

  1. Duplicate Reject Rate (DRR) = duplicate-ID rejects / new orders
  2. Uncertain Order Count (UOC) = orders sent but still unresolved after the ack timeout and a follow-up status query
  3. Replay Drop Rate (RDR) = replayed execution reports safely ignored / total exec reports
  4. ID Collision Count (ICC) = attempted ClOrdID reuse events
  5. Reconciliation Break Duration (RBD) = time from divergence detection to convergence

If DRR and UOC rise together, move to DEGRADED quickly.
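
As a rough sketch of that last rule, the check below compares two consecutive measurement windows. Treating "rise" as a simple window-over-window increase is an assumption made for illustration; real deployments would more likely use tuned thresholds or smoothing. The `Window` class and `next_state` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    new_orders: int
    duplicate_id_rejects: int
    uncertain_orders: int  # UOC at the end of the window


def drr(w: Window) -> float:
    """Duplicate Reject Rate = duplicate-ID rejects / new orders."""
    return w.duplicate_id_rejects / w.new_orders if w.new_orders else 0.0


def next_state(prev: Window, curr: Window, state: str = "NORMAL") -> str:
    both_rising = drr(curr) > drr(prev) and curr.uncertain_orders > prev.uncertain_orders
    if state == "NORMAL" and both_rising:
        return "DEGRADED"
    return state  # other transitions are out of scope for this sketch


quiet = Window(new_orders=1000, duplicate_id_rejects=1, uncertain_orders=2)
noisy = Window(new_orders=1000, duplicate_id_rejects=5, uncertain_orders=9)
drr_only = Window(new_orders=1000, duplicate_id_rejects=5, uncertain_orders=2)
print(next_state(quiet, noisy))     # DEGRADED
print(next_state(quiet, drr_only))  # NORMAL
```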


Hard guardrails (non-negotiable)

  1. No ephemeral ID generators (memory-only counters are forbidden).
  2. No side effects before dedupe check on inbound executions.
  3. No auto-resubmit with new ClOrdID while prior state is unknown.
  4. No silent healing of reconciliation breaks — alert and track incident id.
  5. No deployment without duplicate-order game day (disconnect/replay/failover drills).

Incident runbook

  1. Freeze new risk on affected route.
  2. Snapshot ledger + broker open orders + latest exec stream offsets.
  3. Resolve uncertain intents by status query / broker desk confirmation.
  4. Cancel unintended residuals.
  5. Replay execution stream through idempotent consumer and verify position parity.
  6. Postmortem with specific control change (timer, dedupe key, failover fencing, etc.).

One-line takeaway

In trading infra, “retry” without strict idempotency is just a polite word for accidental leverage.