FIX Cancel-Reject & Partial-Fill Race Slippage Playbook
Why this matters
Cancel logic looks simple until a live order gets hit while the cancel is in flight.
Then the strategy stops living in a clean binary world of “working” vs “gone” and enters a race window where:
- the venue may still execute the resting order,
- the broker may acknowledge Pending Cancel before the order is actually removed,
- a Cancel Reject (35=9) may arrive with the order still partially filled and live,
- and the router can mistake a still-working leaf for dead inventory, or vice versa.
That creates a sneaky slippage tax:
- residuals are misestimated,
- urgency controllers overreact,
- hedges fire off stale leaves,
- queue priority is destroyed by unnecessary re-entry,
- and post-race cleanup often crosses the spread at exactly the wrong time.
This is not rare edge-case trivia. In FIX semantics, Pending Cancel has higher precedence than Partially Filled, and FIX state-change matrices explicitly show executions arriving while cancel is active. If your local controller collapses that state machine into one boolean, it will eventually pay for it.
Failure mode in one line
A cancel request races with fresh executions; the resulting Pending Cancel / Partial Fill / Cancel Reject sequence makes order state temporarily ambiguous, and that ambiguity leaks into over-cancel, over-route, or late catch-up slippage.
Protocol facts that matter operationally
A few FIX-level facts drive the real production risk:
- Pending Cancel does not mean canceled. It only confirms the cancel request was received and is being processed.
- Pending Cancel has higher OrdStatus precedence than Partially Filled. So an order can be economically “partially filled and still vulnerable to more fills” while the reported state is Pending Cancel.
- Order Cancel Reject (35=9) carries the OrdStatus after the reject is applied. That means the reject is not just an error; it is a state-bearing message telling you the order may still be New or Partially Filled.
- FIX order-state matrices explicitly include the scenario where executions occur while a cancel request is active. The cancel path is therefore inherently concurrent, not sequential.
The implementation consequence is brutal: if the engine treats cancel acknowledgement, cancel completion, and cancel rejection as one conceptual event, it will mis-handle live leaves.
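One way to keep those three events apart is to classify them before they touch any order state. The following is a minimal sketch, assuming messages are already parsed into a dict of tag number to string value; CancelEvent, classify_cancel_event, and is_still_live are illustrative names, not part of any FIX engine, and the terminal set is an assumption that a real venue's rules of engagement may widen.

```python
from __future__ import annotations
from enum import Enum


class CancelEvent(Enum):
    ACKNOWLEDGED = "acknowledged"  # 35=8 with 39=6: request received, order can still fill
    COMPLETED = "completed"        # 35=8 with 39=4: order actually removed
    REJECTED = "rejected"          # 35=9: cancel refused, 39 carries the surviving state
    OTHER = "other"


# Assumption: OrdStatus(39) values treated as terminal here are
# Filled (2), Canceled (4), Rejected (8), and Expired (C).
TERMINAL_ORD_STATUS = {"2", "4", "8", "C"}


def classify_cancel_event(msg: dict[int, str]) -> CancelEvent:
    """Separate cancel acknowledgement, completion, and rejection."""
    msg_type = msg.get(35)
    ord_status = msg.get(39)
    if msg_type == "9":
        return CancelEvent.REJECTED
    if msg_type == "8" and ord_status == "6":
        return CancelEvent.ACKNOWLEDGED
    if msg_type == "8" and ord_status == "4":
        return CancelEvent.COMPLETED
    return CancelEvent.OTHER


def is_still_live(msg: dict[int, str]) -> bool:
    """Conservative: anything not explicitly terminal is treated as live."""
    return msg.get(39) not in TERMINAL_ORD_STATUS
```

Note that a Pending Cancel acknowledgement and a Cancel Reject both leave is_still_live true; only the completion event removes exposure.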
Observable signatures
1) Cancel requests spike, but live exposure does not fall immediately
- OrderCancelRequest (35=F) volume rises.
- Working-order count drops more slowly than expected.
- Leaves remain nonzero during a supposedly defensive phase.
2) Partial fills arrive after local logic thinks the order is “basically gone”
- A cancel request has already been sent.
- One or more ExecutionReport (35=8) fills land afterward.
- Strategy responds as if those fills were anomalous, duplicated, or late.
3) Cancel rejects cluster with stale residual jumps
- OrderCancelReject (35=9) appears with OrdStatus=Partially Filled or OrdStatus=New.
- Residual inventory estimate suddenly widens or flips sign.
- Child-order scheduler emits compensating orders immediately after the reject.
4) Queue priority gets burned with little informational gain
- Orders are canceled “for safety,” then re-entered anyway.
- Re-entry happens at the same or worse price level.
- Post-recovery fill quality deteriorates versus pre-race queue position.
5) Markout damage concentrates right after cancel-race windows
- 1s / 5s / 30s post-decision markout worsens around cancel bursts.
- Spread/depth alone do not explain the damage.
- Tail cost lives in the reconciliation window, not in average conditions.
6) Parent completion suddenly switches from passive to urgent
- Underfill deficit grows because passive leaves were incorrectly assumed canceled.
- Deadline pressure keeps accumulating.
- Router exits the ambiguity window with an avoidable catch-up burst.
The mechanical path to slippage
Step 1) A live resting order becomes locally undesirable
Maybe the quote moved, toxicity rose, the parent is rebalancing, or a venue gate fired.
Step 2) The strategy sends a cancel
Locally, many systems now mark that leaf as “on the way out.”
Step 3) The venue still has matching rights on the resting leaf
Before the cancel is finalized, the order can still receive fills.
Step 4) FIX reports a mixed state
Possible sequence:
- fill / partial fill,
- Pending Cancel,
- more fill(s),
- then either Canceled or Cancel Reject.
Step 5) Local controller overcompresses the state machine
Typical bad assumptions:
- “pending cancel” means no more fill risk,
- cancel reject means nothing changed,
- partial fill after cancel must be stale or duplicate,
- or residual can be recomputed from local intent instead of venue-confirmed state.
Step 6) Strategy pays the tax
That tax shows up as one or more of:
- duplicate replacement flow,
- missed passive completion,
- unnecessary marketable cleanup,
- hedge overshoot / unwind,
- or queue-rank reset after re-entry.
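The six steps can be replayed on a toy event stream. This is an illustrative sketch only: Event, naive_leaves, and venue_leaves are hypothetical names, the quantities are made up, and the "naive" controller deliberately encodes the bad assumptions from Step 5. The cancel reject carries no leaves quantity here, so the venue-driven view keeps the last execution-report value.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    kind: str                         # "fill", "pending_cancel", "cancel_reject", "canceled"
    leaves_qty: Optional[int] = None  # LeavesQty from an execution report; absent on a reject


def naive_leaves(order_qty: int, events: list[Event]) -> int:
    """Intent-driven view: the leaf is treated as dead once Pending Cancel arrives."""
    leaves = order_qty
    cancel_acked = False
    for ev in events:
        if ev.kind == "pending_cancel":
            cancel_acked = True
            leaves = 0  # bad assumption: pending cancel means gone
        elif ev.kind == "fill" and not cancel_acked:
            leaves = ev.leaves_qty
        # post-cancel fills and cancel rejects are silently ignored (the bug)
    return leaves


def venue_leaves(order_qty: int, events: list[Event]) -> int:
    """Venue-driven view: every reported LeavesQty overwrites the local value."""
    leaves = order_qty
    for ev in events:
        if ev.leaves_qty is not None:
            leaves = ev.leaves_qty
    return leaves


events = [
    Event("fill", leaves_qty=800),
    Event("pending_cancel", leaves_qty=800),
    Event("fill", leaves_qty=500),   # fill lands while the cancel is in flight
    Event("cancel_reject"),          # order stays live with 500 working
]
print(naive_leaves(1000, events), venue_leaves(1000, events))  # 0 500
```

The 500-share gap between the two views is exactly the quantity a naive router would try to replace or hedge while it is still resting at the venue.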
Core model
Define:
- L_true(t): true live leaves still resting at venue
- L_obs(t): locally assumed leaves during cancel race
- F_post(t): fills received after cancel request was sent
- P_cancel(t): probability cancel is still pending, not final
- R(t): residual parent quantity
- U(t): urgency chosen by scheduler
- Q_loss(t): queue-priority loss from unnecessary re-entry
- C_rej(t): indicator that cancel was rejected
Then during the race window:
L_obs(t) = L_true(t) - hidden_working_leaves(t) + stale_dead_assumption(t)
R_obs(t) = R_true(t) + epsilon_state(t)
where epsilon_state(t) is driven by:
- post-cancel fills not yet incorporated,
- rejected cancels that restore live exposure,
- and local assumptions that pending-cancel implies economic removal.
A practical decomposition of the cancel-race slippage tax is:
IS_cancel_race ≈ stale_residual_cost + catchup_cost + overhedge_cost + Q_loss + reject_recovery_cost
Interpretation:
- stale_residual_cost: you think you have more or less left than reality,
- catchup_cost: you become artificially urgent after underestimating live completion,
- overhedge_cost: you route/hedge against inventory that later fills or remains live,
- Q_loss: you abandon queue position and pay to rejoin,
- reject_recovery_cost: you spend time and spread recovering from a bad cancel assumption.
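For attribution purposes the decomposition can be carried as a small record per race episode. A minimal sketch, assuming every component has already been estimated in the same unit (for example bps of parent notional); the CancelRaceCost class and the numbers are illustrative, not measured values.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CancelRaceCost:
    stale_residual_cost: float
    catchup_cost: float
    overhedge_cost: float
    q_loss: float
    reject_recovery_cost: float

    def total(self) -> float:
        """IS_cancel_race as the plain sum of the five components."""
        return sum(asdict(self).values())

    def largest_component(self) -> str:
        """Name of the component that dominates this episode."""
        parts = asdict(self)
        return max(parts, key=parts.get)


# Made-up episode, all components in bps.
episode = CancelRaceCost(
    stale_residual_cost=0.5,
    catchup_cost=1.0,
    overhedge_cost=0.25,
    q_loss=0.75,
    reject_recovery_cost=0.5,
)
print(episode.total(), episode.largest_component())  # 3.0 catchup_cost
```

Keeping the components separate, rather than storing only the total, is what lets TCA later say whether the desk has a catch-up problem or a queue-loss problem.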
State ambiguity taxonomy
A) Pending-cancel optimism
The system assumes pending cancel is almost equivalent to canceled.
Risk: fills keep landing while the engine already routes replacement liquidity elsewhere.
B) Reject-blindness
The system treats a cancel reject as a transport-side nuisance instead of a new authoritative state.
Risk: the original leaf remains live longer than the model believes.
C) Fill-after-cancel disbelief
Post-cancel fills are labeled late, suspicious, or duplicate by local logic.
Risk: the scheduler undercredits true completion and overtrades the residual.
D) Safety-cancel reflex
The engine cancels broadly to reduce uncertainty.
Risk: uncertainty falls, but queue edge is destroyed and completion quality worsens.
E) Hedge-before-finality
Portfolio or risk logic hedges as though the working leaf is gone.
Risk: later fills convert the hedge into an unintended directional trade.
Practical feature set
Cancel-race features
- cancel_req_count_1s
- pending_cancel_count
- cancel_reject_count_1m
- cancel_reject_rate
- too_late_to_cancel_rate
- already_pending_cancel_rate
- unknown_order_cancel_reject_rate
Timing features
- cancel_to_pending_cancel_ms
- cancel_to_final_cancel_ms
- cancel_to_cancel_reject_ms
- cancel_to_post_fill_ms_p50/p95
- pending_cancel_dwell_ms
- cancel_race_window_ms
State-integrity features
- post_cancel_fill_qty
- post_cancel_fill_fraction
- ambiguous_live_leaves_qty
- residual_uncertainty_qty
- local_vs_venue_leaves_gap
- dropcopy_vs_orderentry_live_qty_gap
Execution-impact features
- reenter_after_cancel_rate
- queue_reset_bps_estimate
- catchup_flow_after_reject_qty
- post_reject_markout_1s_5s_30s
- completion_deficit_after_cancel_race_pct
- overhedge_unwind_qty
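A few of the timing and state-integrity features can be derived from one record per cancel attempt. This is a sketch under stated assumptions: CancelRaceRecord and race_features are hypothetical names, timestamps are integer milliseconds on a single clock, and post_cancel_fill_fraction is defined here as post-cancel fill quantity divided by leaves at cancel-send time, which is one plausible reading of the feature name rather than a fixed definition.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CancelRaceRecord:
    cancel_sent_ms: int
    leaves_at_cancel: int
    pending_cancel_ms: Optional[int] = None  # Pending Cancel ack time, if any
    terminal_ms: Optional[int] = None        # final cancel or reject time, if any
    fills: list[tuple[int, int]] = field(default_factory=list)  # (timestamp_ms, qty)


def _delta(later: Optional[int], earlier: Optional[int]) -> Optional[int]:
    """Elapsed ms, or None when either endpoint has not been observed."""
    if later is None or earlier is None:
        return None
    return later - earlier


def race_features(rec: CancelRaceRecord) -> dict:
    post_qty = sum(qty for ts, qty in rec.fills if ts > rec.cancel_sent_ms)
    fraction = post_qty / rec.leaves_at_cancel if rec.leaves_at_cancel > 0 else 0.0
    return {
        "cancel_to_pending_cancel_ms": _delta(rec.pending_cancel_ms, rec.cancel_sent_ms),
        "pending_cancel_dwell_ms": _delta(rec.terminal_ms, rec.pending_cancel_ms),
        "cancel_race_window_ms": _delta(rec.terminal_ms, rec.cancel_sent_ms),
        "post_cancel_fill_qty": post_qty,
        "post_cancel_fill_fraction": fraction,
    }


rec = CancelRaceRecord(
    cancel_sent_ms=1000,
    leaves_at_cancel=800,
    pending_cancel_ms=1004,
    terminal_ms=1030,
    fills=[(990, 200), (1010, 300)],  # second fill lands after the cancel was sent
)
features = race_features(rec)
```

Unresolved races return None for the timing fields instead of zero, so an open race window is never mistaken for an instant one.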
Highest-risk situations
1) Fast-moving quotes with passive pullbacks
When the book is moving and the strategy repeatedly tries to pull/passively re-post, cancel races become a control-loop problem, not a one-off exception.
2) Short-deadline schedules
If there is little time left, even a small cancel ambiguity can force a large end-of-schedule urgency jump.
3) Multi-venue child routing
A stale cancel assumption on one venue contaminates parent residual logic everywhere else.
4) Hedge-coupled execution
If each partial fill drives hedge activity, cancel-race ambiguity becomes a cross-asset slippage problem.
5) Slow broker/exchange cancel paths
Any environment where pending-cancel dwell time is nontrivial deserves explicit modeling. Long dwell means more economic exposure after the cancel request than the strategy intuitively expects.
6) Noisy reconnect / replay conditions
During session recovery, cancel-race ambiguity compounds with message-order ambiguity and duplicate-handling pressure.
Regime state machine
CLEAN
- Working leaves trusted.
- Normal cancel semantics.
- Normal urgency.
CANCEL_SENT
Trigger:
- cancel request emitted.
Actions:
- Mark leaf as cancel-pending risk, not as dead.
- Reduce confidence in local leaves estimate.
- Block residual-based urgency jumps that assume immediate removal.
PENDING_CANCEL_EXPOSED
Trigger:
- Pending Cancel acknowledged or dwell exceeds threshold.
Actions:
- Treat leaf as economically live until proven otherwise.
- Keep hedge logic in conservative mode.
- Do not spawn replacement orders that would double-count the leaf unless explicit branch logic allows it.
POST_CANCEL_FILL_ACTIVE
Trigger:
- one or more fills arrive after cancel request time.
Actions:
- Recompute residual from authoritative cum/leaves data.
- Suppress panic cancellation of sibling leaves unless parent risk truly changed.
- Raise state-integrity telemetry.
CANCEL_REJECT_RECOVERY
Trigger:
- OrderCancelReject (35=9) received.
Actions:
- Treat reject as state-bearing.
- Re-read OrdStatus, OrigClOrdID, ClOrdID, and reason code.
- Reconcile whether original leaf is still working, partially filled, or terminal.
- Freeze aggressive catch-up until venue-confirmed leaves are rebuilt.
FINALIZED
Trigger:
- authoritative terminal cancel or authoritative leaves reconciliation.
Actions:
- Resume normal routing.
- Attribute realized cost to cancel-race bucket for TCA.
SAFE_CONTAIN
Trigger:
- residual uncertainty or reject rate exceeds threshold.
Actions:
- cap participation,
- stop cross-venue catch-up bursts,
- prefer fewer, larger state-confirmed actions,
- route only after authoritative state convergence.
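The regimes above can be expressed as a small transition function. A minimal sketch: the event names ("cancel_sent", "pending_cancel_ack", "dwell_timeout", "fill", "cancel_reject", "terminal_cancel", "leaves_reconciled") and the uncertainty threshold are assumptions chosen for illustration, and the reject-rate trigger for SAFE_CONTAIN is omitted to keep the example short.

```python
from enum import Enum, auto


class Regime(Enum):
    CLEAN = auto()
    CANCEL_SENT = auto()
    PENDING_CANCEL_EXPOSED = auto()
    POST_CANCEL_FILL_ACTIVE = auto()
    CANCEL_REJECT_RECOVERY = auto()
    FINALIZED = auto()
    SAFE_CONTAIN = auto()


# Regimes in which a cancel is still in flight, so a fill counts as post-cancel.
_CANCEL_IN_FLIGHT = {
    Regime.CANCEL_SENT,
    Regime.PENDING_CANCEL_EXPOSED,
    Regime.POST_CANCEL_FILL_ACTIVE,
}


def next_regime(
    current: Regime,
    event: str,
    residual_uncertainty_qty: float = 0.0,
    uncertainty_limit_qty: float = float("inf"),
) -> Regime:
    # Authoritative resolution always wins.
    if event in ("terminal_cancel", "leaves_reconciled"):
        return Regime.FINALIZED
    # Containment overrides ordinary race handling when uncertainty is too high.
    if residual_uncertainty_qty > uncertainty_limit_qty:
        return Regime.SAFE_CONTAIN
    if event == "cancel_sent":
        return Regime.CANCEL_SENT
    if event in ("pending_cancel_ack", "dwell_timeout"):
        return Regime.PENDING_CANCEL_EXPOSED
    if event == "cancel_reject":
        return Regime.CANCEL_REJECT_RECOVERY
    if event == "fill" and current in _CANCEL_IN_FLIGHT:
        return Regime.POST_CANCEL_FILL_ACTIVE
    return current
```

A fill in CLEAN leaves the regime unchanged; the same fill after a cancel request moves the leaf into POST_CANCEL_FILL_ACTIVE, which is the distinction the binary working/dead model cannot express.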
Control rules that actually help
1) Model cancel finality as probabilistic
The order is not economically dead at cancel-send time. Weight residual confidence by cancel-path latency and recent reject behavior.
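One crude way to put a number on this, offered as an assumption rather than a calibrated model: estimate the probability that a cancel is final by now as the empirical share of past cancels that finalized within the elapsed time, scaled down by the recent reject rate. The function name p_cancel_final and its inputs are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def p_cancel_final(
    elapsed_ms: float,
    historical_final_ms_sorted: Sequence[float],
    recent_reject_rate: float,
) -> float:
    """Rough probability that a cancel sent elapsed_ms ago is already final.

    historical_final_ms_sorted: ascending cancel-to-final latencies for this
    venue/session (assumed to be collected elsewhere).
    recent_reject_rate: share of recent cancels that were rejected, in [0, 1].
    """
    if not historical_final_ms_sorted:
        return 0.0  # no evidence: assume the order is still live
    finalized_by_now = bisect_right(historical_final_ms_sorted, elapsed_ms)
    empirical_cdf = finalized_by_now / len(historical_final_ms_sorted)
    return (1.0 - recent_reject_rate) * empirical_cdf


history = [2.0, 3.0, 5.0, 8.0, 20.0]
confidence = p_cancel_final(5.0, history, recent_reject_rate=0.1)
```

With these made-up inputs three of five historical cancels had finalized within 5 ms, so the estimate is roughly 0.54; at send time (elapsed 0) it is 0, which matches the rule that the order is not dead when the cancel leaves the gateway.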
2) Separate intent state from venue state
Local intent: “I want this gone.” Venue truth: “It may still fill.” Never let intent overwrite venue-confirmed leaves.
3) Treat Cancel Reject as a state update, not just an exception
35=9 must feed the same authoritative state machine as 35=8, especially through OrdStatus(39).
4) Penalize urgency when residual uncertainty is high
If residual_uncertainty_qty is elevated, do not let the scheduler become maximally aggressive. Uncertain residuals should reduce confidence, not amplify urgency.
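A simple shape for this penalty is a cap on urgency that tightens linearly as uncertain quantity grows relative to the residual. The sketch below is one possible form, not a recommended calibration: capped_urgency, full_penalty_ratio, and min_cap are illustrative names and defaults, and urgency is assumed to be normalized to the range 0 to 1.

```python
def capped_urgency(
    raw_urgency: float,
    residual_qty: float,
    residual_uncertainty_qty: float,
    full_penalty_ratio: float = 0.25,
    min_cap: float = 0.5,
) -> float:
    """Cap scheduler urgency when the residual itself is uncertain.

    The cap falls linearly from 1.0 (no uncertainty) to min_cap once the
    uncertain quantity reaches full_penalty_ratio of the residual.
    """
    if residual_qty <= 0:
        return 0.0  # nothing left to work
    ratio = residual_uncertainty_qty / residual_qty
    penalty = min(max(ratio / full_penalty_ratio, 0.0), 1.0)
    cap = 1.0 - (1.0 - min_cap) * penalty
    return min(raw_urgency, cap)
```

For example, a raw urgency of 0.9 with 1000 shares residual and 100 shares uncertain is capped near 0.8, while a raw urgency of 0.3 passes through unchanged: the rule only restrains aggression, it never adds any.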
5) Avoid immediate same-price re-entry after broad safety cancels
If the leaf was probably still economically useful, canceling and instantly re-entering just converts uncertainty into queue loss.
6) Add a hedge holdback window after cancel races
Small, bounded holdback windows can prevent hedge overshoot when post-cancel fills are still plausible.
7) Attribute cancel-race slippage separately in TCA
Otherwise the desk misdiagnoses the cost as generic spread/impact or generic venue toxicity.
TCA / KPI layer
Track these explicitly:
PCRR — Post-Cancel Fill Rate
- fraction of cancel requests followed by at least one fill before terminal resolution
PCFQ — Post-Cancel Fill Quantity
- fill quantity received after cancel request timestamp
CRBR — Cancel Reject Burst Rate
- clustered reject intensity per venue/session window
PCDW95 — Pending Cancel Dwell p95
- tail time spent in pending-cancel state
RUG — Residual Uncertainty Gap
- |R_obs - R_reconciled| / parent_qty
QRL — Queue Reset Loss
- estimated bps lost from cancel→reenter patterns that were avoidable
PRM5 — Post-Reject Markout 5s
- markout after cancel reject recovery windows
OCU — Overhedge Cleanup Units
- hedge/unwind quantity attributable to cancel-race ambiguity
These should be segmented by:
- venue,
- symbol liquidity bucket,
- child tactic,
- time-to-deadline bucket,
- and session-health regime.
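Three of these KPIs reduce to a few lines once per-cancel records exist. A minimal sketch, assuming the inputs (post-cancel fill quantity per cancel request, pending-cancel dwell samples in milliseconds, observed and reconciled residuals) are already collected for one segment; the function names are illustrative, and the p95 uses the nearest-rank method, which is one of several common percentile conventions.

```python
from __future__ import annotations


def post_cancel_fill_rate(post_cancel_fill_qtys: list[int]) -> float:
    """Fraction of cancel requests followed by at least one fill."""
    if not post_cancel_fill_qtys:
        return 0.0
    hit = sum(1 for qty in post_cancel_fill_qtys if qty > 0)
    return hit / len(post_cancel_fill_qtys)


def dwell_p95_ms(dwell_samples_ms: list[int]) -> int:
    """Nearest-rank 95th percentile of pending-cancel dwell."""
    if not dwell_samples_ms:
        raise ValueError("no dwell samples")
    ordered = sorted(dwell_samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def residual_uncertainty_gap(r_obs: float, r_reconciled: float, parent_qty: float) -> float:
    """RUG = |R_obs - R_reconciled| / parent_qty."""
    if parent_qty <= 0:
        raise ValueError("parent_qty must be positive")
    return abs(r_obs - r_reconciled) / parent_qty
```

Calling these once per venue, liquidity bucket, tactic, deadline bucket, and session-health regime gives the segmented view listed above.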
Validation approach
Backtest / replay questions
- After a cancel request, how often did more fill quantity arrive before terminal cancel?
- If those post-cancel fills were hidden from the controller, how much extra catch-up flow would it have sent?
- How much queue loss came from cancel→reenter patterns where the original leaf would have completed acceptably?
- What fraction of tail slippage clusters are preceded by pending-cancel dwell or cancel rejects?
Shadow-mode checks
- Compare production policy vs a counterfactual that waits for authoritative cancel finality.
- Compare against a policy that preserves resting leaves unless uncertainty crosses a strict threshold.
- Estimate tail cost change, not just mean fill price change.
Failure-injection drills
Simulate:
- delayed cancel acknowledgements,
- post-cancel fills,
- cancel rejects with Too late to cancel,
- reject bursts during fast markets,
- and order-entry vs drop-copy disagreement.
If the strategy only survives the clean “cancel then canceled” path, it is not production-ready.
Anti-patterns
- Binary leaf state: working vs dead with no pending/ambiguous mode
- Intent-overwrites-truth: local cancel intent zeros out venue leaves
- Reject-as-log-line: cancel reject is logged but not folded into state
- Immediate replacement reflex: new child launched before race window resolves
- No TCA bucket: all damage blamed on spread or venue toxicity
- No hedge dampener: hedge reacts faster than order-state certainty
Implementation sketch
A robust controller usually needs:
authoritative per-leaf state machine
- NEW -> PARTIAL -> PENDING_CANCEL -> {PARTIAL, CANCELED, FILLED}
- with 35=9 allowed to move the leaf back into a live/partial regime
residual-confidence score
- derived from pending-cancel dwell, reject rate, and cross-channel agreement
ambiguity-aware scheduler
- caps urgency and replacement flow under high residual uncertainty
cancel-race ledger
- stores request time, terminal time, post-cancel fill qty, reject reason, and recovery action
TCA attribution hooks
- assign slippage to cancel-race bucket rather than generic impact bucket
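The first and fourth components can be sketched together as a per-leaf object that only changes state on venue messages and records post-cancel fills as it goes. This is an illustrative skeleton, not a production controller: Leaf and LeafState are hypothetical names, only five OrdStatus values are modeled, and an unmodeled status raises ValueError rather than being guessed at.

```python
from __future__ import annotations
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LeafState(Enum):
    # Values are the FIX OrdStatus(39) codes for the states modeled here.
    NEW = "0"
    PARTIAL = "1"
    FILLED = "2"
    CANCELED = "4"
    PENDING_CANCEL = "6"


TERMINAL = {LeafState.FILLED, LeafState.CANCELED}


@dataclass
class Leaf:
    order_qty: int
    cum_qty: int = 0
    state: LeafState = LeafState.NEW
    cancel_sent_ms: Optional[int] = None   # ledger: when intent to cancel was sent
    post_cancel_fill_qty: int = 0          # ledger: quantity filled after that time
    reject_reason: Optional[str] = None    # ledger: CxlRejReason(102), if rejected

    @property
    def leaves_qty(self) -> int:
        return 0 if self.state in TERMINAL else self.order_qty - self.cum_qty

    def on_cancel_sent(self, ts_ms: int) -> None:
        # Intent only: venue state and leaves are deliberately left untouched.
        self.cancel_sent_ms = ts_ms

    def on_execution_report(self, ord_status: str, cum_qty: int, ts_ms: int) -> None:
        filled_now = cum_qty - self.cum_qty
        if filled_now > 0 and self.cancel_sent_ms is not None and ts_ms >= self.cancel_sent_ms:
            self.post_cancel_fill_qty += filled_now
        self.cum_qty = cum_qty
        self.state = LeafState(ord_status)

    def on_cancel_reject(self, ord_status: str, reason: str) -> None:
        # 35=9 is state-bearing: it can move the leaf back to NEW or PARTIAL.
        self.reject_reason = reason
        self.state = LeafState(ord_status)


leaf = Leaf(order_qty=1000)
leaf.on_execution_report("1", cum_qty=200, ts_ms=100)  # partial fill
leaf.on_cancel_sent(ts_ms=110)
leaf.on_execution_report("6", cum_qty=200, ts_ms=115)  # Pending Cancel ack
leaf.on_execution_report("6", cum_qty=500, ts_ms=120)  # fill while cancel is active
leaf.on_cancel_reject("1", reason="99")                # reject: leaf is live again
```

After this sequence the leaf is PARTIAL with 500 shares still working and 300 shares recorded as post-cancel fill, which is the information the scheduler and TCA hooks need.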
Bottom line
Cancel logic is not a clerical detail.
A live order can still trade while your cancel is “in progress,” and FIX explicitly models that world. If your execution stack treats pending cancel like economic finality, or treats cancel reject like a mere error instead of a state-bearing event, you quietly turn state ambiguity into slippage.
The fix is not “cancel faster.”
The fix is to treat cancel finality as uncertain, model the race window explicitly, and stop converting protocol semantics into fake certainty.
References
- FIX Trading Community — Order State Changes (OrdStatus precedence; cancel scenarios including executions while cancel is active): https://www.fixtrading.org/online-specification/order-state-changes/
- OnixS FIX 4.2 Dictionary — D4: Cancel request issued for a part-filled order — executions occur whilst cancel request is active: https://www.onixs.biz/fix-dictionary/4.2/app_d4.html
- OnixS FIX 4.4 Dictionary — Order Cancel Reject <9> message (reject semantics and fields): https://www.onixs.biz/fix-dictionary/4.4/msgtype_9_9.html
- B2BITS FIX Dictionary — Execution Report (MsgType=8) (OrdStatus precedence and Pending Cancel meaning): https://www.b2bits.com/fixopaedia/fixdic50/message_Execution_Report_8_.html