FIX Cancel-Reject & Partial-Fill Race Slippage Playbook

Why this matters

Cancel logic looks simple until a live order gets hit while the cancel is in flight.

Then the strategy stops living in a clean binary world of “working” vs “gone” and enters a race window where:

the venue may still execute the resting order,
the broker may acknowledge Pending Cancel before the order is actually removed,
a Cancel Reject (35=9) may arrive with the order still partially filled and live,
and the router can mistake a still-working leaf for dead inventory, or vice versa.

That creates a sneaky slippage tax:

residuals are misestimated,
urgency controllers overreact,
hedges fire off stale leaves,
queue priority is destroyed by unnecessary re-entry,
and post-race cleanup often crosses the spread at exactly the wrong time.

This is not rare edge-case trivia. In FIX semantics, Pending Cancel has higher precedence than Partially Filled, and FIX state-change matrices explicitly show executions arriving while cancel is active. If your local controller collapses that state machine into one boolean, it will eventually pay for it.

Failure mode in one line

A cancel request races with fresh executions; the resulting Pending Cancel / Partial Fill / Cancel Reject sequence makes order state temporarily ambiguous, and that ambiguity leaks into over-cancel, over-route, or late catch-up slippage.

Protocol facts that matter operationally

A few FIX-level facts drive the real production risk:

Pending Cancel does not mean canceled. It only confirms the cancel request was received and is being processed.
Pending Cancel has higher OrdStatus precedence than Partially Filled. So an order can be economically “partially filled and still vulnerable to more fills” while the reported state is Pending Cancel.
Order Cancel Reject (35=9) carries the OrdStatus after the reject is applied. That means the reject is not just an error; it is a state-bearing message telling you the order may still be New or Partially Filled.
FIX order-state matrices explicitly include the scenario where executions occur while a cancel request is active. The cancel path is therefore inherently concurrent, not sequential.

The implementation consequence is brutal: if the engine treats cancel acknowledgement, cancel completion, and cancel rejection as one conceptual event, it will mis-handle live leaves.
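The precedence rule can be made concrete with a small Python sketch. The ranking below is a subset of the OrdStatus precedence list from the FIX order-state-change documentation; the names `reported_ord_status` and `fill_exposed` are hypothetical helpers for illustration, not part of any FIX library.

```python
# Subset of OrdStatus(39) values with their FIX precedence rank.
# Higher rank wins when more than one state applies to the same order.
ORD_STATUS_PRECEDENCE = {
    "6": 11,  # Pending Cancel
    "E": 10,  # Pending Replace
    "2": 7,   # Filled
    "4": 4,   # Canceled
    "1": 3,   # Partially Filled
    "0": 2,   # New
}

TERMINAL = {"2", "4"}  # Filled, Canceled


def reported_ord_status(active_states):
    """Pick the status that would be reported when several states apply."""
    return max(active_states, key=ORD_STATUS_PRECEDENCE.__getitem__)


def fill_exposed(ord_status, leaves_qty):
    """Hypothetical helper: the leaf can still trade while LeavesQty(151) > 0
    and the status is not terminal, even if the status is Pending Cancel."""
    return leaves_qty > 0 and ord_status not in TERMINAL


# An order that is partially filled AND has a cancel pending.
status = reported_ord_status({"1", "6"})
still_exposed = fill_exposed(status, leaves_qty=800)
```

The point of the sketch is that the reported status alone hides the partial fill, so exposure has to be judged from leaves, not from the status label.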

Observable signatures

1) Cancel requests spike, but live exposure does not fall immediately

OrderCancelRequest (35=F) volume rises.
Working-order count drops more slowly than expected.
Leaves remain nonzero during a supposedly defensive phase.

2) Partial fills arrive after local logic thinks the order is “basically gone”

A cancel request has already been sent.
One or more ExecutionReport (35=8) fills land afterward.
Strategy responds as if those fills were anomalous, duplicated, or late.

3) Cancel rejects cluster with stale residual jumps

OrderCancelReject (35=9) appears with OrdStatus=Partially Filled or OrdStatus=New.
Residual inventory estimate suddenly widens or flips sign.
Child-order scheduler emits compensating orders immediately after the reject.

4) Queue priority gets burned with little informational gain

Orders are canceled “for safety,” then re-entered anyway.
Re-entry happens at the same or worse price level.
Post-recovery fill quality deteriorates versus pre-race queue position.

5) Markout damage concentrates right after cancel-race windows

1s / 5s / 30s post-decision markout worsens around cancel bursts.
Spread/depth alone do not explain the damage.
Tail cost lives in the reconciliation window, not in average conditions.

6) Parent completion suddenly switches from passive to urgent

Underfill deficit grows because passive leaves were incorrectly assumed canceled.
Deadline pressure keeps accumulating.
Router exits the ambiguity window with an avoidable catch-up burst.

The mechanical path to slippage

Step 1) A live resting order becomes locally undesirable

Maybe the quote moved, toxicity rose, the parent is rebalancing, or a venue gate fired.

Step 2) The strategy sends a cancel

Locally, many systems now mark that leaf as “on the way out.”

Step 3) The venue still has matching rights on the resting leaf

Before the cancel is finalized, the order can still receive fills.

Step 4) FIX reports a mixed state

Possible sequence:

fill / partial fill,
Pending Cancel,
more fill(s),
then either Canceled or Cancel Reject.

Step 5) Local controller overcompresses the state machine

Typical bad assumptions:

“pending cancel” means no more fill risk,
cancel reject means nothing changed,
partial fill after cancel must be stale or duplicate,
or residual can be recomputed from local intent instead of venue-confirmed state.

Step 6) Strategy pays the tax

That tax shows up as one or more of:

duplicate replacement flow,
missed passive completion,
unnecessary marketable cleanup,
hedge overshoot / unwind,
or queue-rank reset after re-entry.
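The six steps above can be replayed in a few lines of Python. This is a minimal sketch under stated assumptions: the `Event` type, the event names, and the quantities are all hypothetical, and the single leaf is assumed to carry the whole parent quantity.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str          # "fill", "cancel_sent", "pending_cancel", "cancel_reject", "canceled"
    last_qty: int = 0  # only meaningful for "fill" events


def naive_filled(events):
    """Binary controller: fills after the cancel is sent are dismissed."""
    filled = 0
    for ev in events:
        if ev.kind == "cancel_sent":
            break
        if ev.kind == "fill":
            filled += ev.last_qty
    return filled


def confirmed_filled(events):
    """Venue-confirmed view: every reported fill counts, whenever it lands."""
    return sum(ev.last_qty for ev in events if ev.kind == "fill")


def leaf_still_live(events, order_qty):
    """The leaf stays live unless it was canceled or fully filled."""
    if any(ev.kind == "canceled" for ev in events):
        return False
    return confirmed_filled(events) < order_qty


PARENT_QTY = 1000
race = [
    Event("fill", 200),
    Event("cancel_sent"),
    Event("pending_cancel"),
    Event("fill", 300),      # lands while the cancel is in flight
    Event("cancel_reject"),  # leaf is restored to Partially Filled
]

naive_residual = PARENT_QTY - naive_filled(race)
confirmed_residual = PARENT_QTY - confirmed_filled(race)
overtrade_qty = naive_residual - confirmed_residual
```

In this hypothetical sequence the binary controller believes 800 is left while the venue-confirmed residual is 500, and the original leaf is still live after the reject, so any replacement flow sized off the naive residual double-counts exposure.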

Core model

Define:

L_true(t): true live leaves still resting at venue
L_obs(t): locally assumed leaves during cancel race
F_post(t): fills received after cancel request was sent
P_cancel(t): probability cancel is still pending, not final
R(t): residual parent quantity
U(t): urgency chosen by scheduler
Q_loss(t): queue-priority loss from unnecessary re-entry
C_rej(t): indicator that cancel was rejected

Then during the race window:

L_obs(t) = L_true(t) - hidden_working_leaves(t) + stale_dead_assumption(t)

R_obs(t) = R_true(t) + epsilon_state(t)

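where hidden_working_leaves(t) is quantity still resting at the venue that the local book has already written off, stale_dead_assumption(t) is read here as quantity the local book still counts as live after the venue has filled or canceled it, and epsilon_state(t) is driven by: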

post-cancel fills not yet incorporated,
rejected cancels that restore live exposure,
and local assumptions that pending-cancel implies economic removal.

A practical decomposition of the cancel-race slippage tax is:

IS_cancel_race ≈ stale_residual_cost + catchup_cost + overhedge_cost + Q_loss + reject_recovery_cost

Interpretation:

stale_residual_cost: you think you have more or less left than reality,
catchup_cost: you become artificially urgent after underestimating live completion,
overhedge_cost: you route/hedge against inventory that later fills or remains live,
Q_loss: you abandon queue position and pay to rejoin,
reject_recovery_cost: you spend time and spread recovering from a bad cancel assumption.

State ambiguity taxonomy

A) Pending-cancel optimism

The system assumes pending cancel is almost equivalent to canceled.

Risk: fills keep landing while the engine already routes replacement liquidity elsewhere.

B) Reject-blindness

The system treats a cancel reject as a transport-side nuisance instead of a new authoritative state.

Risk: the original leaf remains live longer than the model believes.

C) Fill-after-cancel disbelief

Post-cancel fills are labeled late, suspicious, or duplicate by local logic.

Risk: the scheduler undercredits true completion and overtrades the residual.

D) Safety-cancel reflex

The engine cancels broadly to reduce uncertainty.

Risk: uncertainty falls, but queue edge is destroyed and completion quality worsens.

E) Hedge-before-finality

Portfolio or risk logic hedges as though the working leaf is gone.

Risk: later fills convert the hedge into an unintended directional trade.

Practical feature set

Cancel-race features

cancel_req_count_1s
pending_cancel_count
cancel_reject_count_1m
cancel_reject_rate
too_late_to_cancel_rate
already_pending_cancel_rate
unknown_order_cancel_reject_rate

Timing features

cancel_to_pending_cancel_ms
cancel_to_final_cancel_ms
cancel_to_cancel_reject_ms
cancel_to_post_fill_ms_p50/p95
pending_cancel_dwell_ms
cancel_race_window_ms

State-integrity features

post_cancel_fill_qty
post_cancel_fill_fraction
ambiguous_live_leaves_qty
residual_uncertainty_qty
local_vs_venue_leaves_gap
dropcopy_vs_orderentry_live_qty_gap

Execution-impact features

reenter_after_cancel_rate
queue_reset_bps_estimate
catchup_flow_after_reject_qty
post_reject_markout_1s_5s_30s
completion_deficit_after_cancel_race_pct
overhedge_unwind_qty
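Several of the timing and state-integrity features above fall out of a per-leaf timeline. The sketch below is one possible derivation in Python; the `LeafTimeline` record and its field names are hypothetical, and `post_cancel_fill_fraction` is assumed here to mean post-cancel fill quantity divided by order quantity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LeafTimeline:
    order_qty: int
    cancel_sent_ms: int
    pending_cancel_ms: Optional[int]  # None if never acknowledged
    terminal_ms: Optional[int]        # final Canceled or Cancel Reject, None if unresolved
    local_leaves: int                 # what the controller believes is resting
    venue_leaves: int                 # last venue-confirmed LeavesQty(151)
    fills: List[Tuple[int, int]] = field(default_factory=list)  # (timestamp_ms, qty)


def _delta(later, earlier):
    """Difference of two timestamps, or None if either is missing."""
    return None if later is None or earlier is None else later - earlier


def cancel_race_features(t):
    post_qty = sum(qty for ts, qty in t.fills if ts > t.cancel_sent_ms)
    return {
        "cancel_to_pending_cancel_ms": _delta(t.pending_cancel_ms, t.cancel_sent_ms),
        "pending_cancel_dwell_ms": _delta(t.terminal_ms, t.pending_cancel_ms),
        "cancel_race_window_ms": _delta(t.terminal_ms, t.cancel_sent_ms),
        "post_cancel_fill_qty": post_qty,
        "post_cancel_fill_fraction": post_qty / t.order_qty,
        "local_vs_venue_leaves_gap": t.local_leaves - t.venue_leaves,
    }


timeline = LeafTimeline(
    order_qty=1000, cancel_sent_ms=1000, pending_cancel_ms=1004, terminal_ms=1030,
    local_leaves=0, venue_leaves=500, fills=[(990, 200), (1010, 300)],
)
features = cancel_race_features(timeline)
```

A negative `local_vs_venue_leaves_gap` in this convention means the controller has written off quantity that the venue still considers live.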

Highest-risk situations

1) Fast-moving quotes with passive pullbacks

When the book is moving and the strategy repeatedly pulls and passively re-posts, cancel races become a control-loop problem, not a one-off exception.

2) Short-deadline schedules

If there is little time left, even a small cancel ambiguity can force a large end-of-schedule urgency jump.

3) Multi-venue child routing

A stale cancel assumption on one venue contaminates parent residual logic everywhere else.

4) Hedge-coupled execution

If each partial fill drives hedge activity, cancel-race ambiguity becomes a cross-asset slippage problem.

5) Slow broker/exchange cancel paths

Any environment where pending-cancel dwell time is nontrivial deserves explicit modeling. Long dwell means more economic exposure after the cancel request than the strategy intuitively expects.

6) Noisy reconnect / replay conditions

During session recovery, cancel-race ambiguity compounds with message-order ambiguity and duplicate-handling pressure.

Regime state machine

CLEAN

Working leaves trusted.
Normal cancel semantics.
Normal urgency.

CANCEL_SENT

Trigger:

cancel request emitted.

Actions:

Mark leaf as cancel-pending risk, not as dead.
Reduce confidence in local leaves estimate.
Block residual-based urgency jumps that assume immediate removal.

PENDING_CANCEL_EXPOSED

Trigger:

Pending Cancel acknowledged, or time since cancel send exceeds the dwell threshold with no terminal response.

Actions:

Treat leaf as economically live until proven otherwise.
Keep hedge logic in conservative mode.
Do not spawn replacement orders that would double-count the leaf unless explicit branch logic allows it.

POST_CANCEL_FILL_ACTIVE

Trigger:

one or more fills arrive after cancel request time.

Actions:

Recompute residual from authoritative CumQty(14) and LeavesQty(151).
Suppress panic cancellation of sibling leaves unless parent risk truly changed.
Raise state-integrity telemetry.

CANCEL_REJECT_RECOVERY

Trigger:

OrderCancelReject (35=9) received.

Actions:

Treat reject as state-bearing.
Re-read OrdStatus(39), OrigClOrdID(41), ClOrdID(11), and CxlRejReason(102).
Reconcile whether original leaf is still working, partially filled, or terminal.
Freeze aggressive catch-up until venue-confirmed leaves are rebuilt.

FINALIZED

Trigger:

authoritative terminal cancel or authoritative leaves reconciliation.

Actions:

Resume normal routing.
Attribute realized cost to cancel-race bucket for TCA.

SAFE_CONTAIN

Trigger:

residual uncertainty or reject rate exceeds threshold.

Actions:

cap participation,
stop cross-venue catch-up bursts,
prefer fewer, larger state-confirmed actions,
route only after authoritative state convergence.
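The regimes above can be expressed as a small transition function. This Python sketch is one possible encoding, not a definitive design: the event names, the `uncertainty_limit` parameter, and the choice that only authoritative convergence exits SAFE_CONTAIN are assumptions made for illustration.

```python
from enum import Enum, auto


class Regime(Enum):
    CLEAN = auto()
    CANCEL_SENT = auto()
    PENDING_CANCEL_EXPOSED = auto()
    POST_CANCEL_FILL_ACTIVE = auto()
    CANCEL_REJECT_RECOVERY = auto()
    FINALIZED = auto()
    SAFE_CONTAIN = auto()


IN_FLIGHT = {
    Regime.CANCEL_SENT,
    Regime.PENDING_CANCEL_EXPOSED,
    Regime.POST_CANCEL_FILL_ACTIVE,
}


def next_regime(current, event, residual_uncertainty_qty=0.0,
                uncertainty_limit=float("inf")):
    """Return the regime after `event`; unknown combinations keep the regime."""
    # Authoritative convergence always wins, including out of SAFE_CONTAIN.
    if event in ("terminal_cancel", "leaves_reconciled"):
        return Regime.FINALIZED
    if residual_uncertainty_qty > uncertainty_limit:
        return Regime.SAFE_CONTAIN
    if current is Regime.SAFE_CONTAIN:
        return current
    if event == "cancel_sent":
        return Regime.CANCEL_SENT
    if event in ("pending_cancel", "dwell_timeout") and current is Regime.CANCEL_SENT:
        return Regime.PENDING_CANCEL_EXPOSED
    if event == "fill" and current in IN_FLIGHT:
        return Regime.POST_CANCEL_FILL_ACTIVE
    if event == "cancel_reject" and current in IN_FLIGHT:
        return Regime.CANCEL_REJECT_RECOVERY
    return current


regime = Regime.CLEAN
path = []
for ev in ["cancel_sent", "pending_cancel", "fill", "cancel_reject", "leaves_reconciled"]:
    regime = next_regime(regime, ev)
    path.append(regime)
```

Note that a fill in CLEAN leaves the regime unchanged, since ordinary fills on a leaf with no cancel in flight are not a race condition.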

Control rules that actually help

1) Model cancel finality as probabilistic

The order is not economically dead at cancel-send time. Weight residual confidence by cancel-path latency and recent reject behavior.
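One way to make this rule concrete is a toy survival model in Python. The functional form is purely an assumption for illustration: with probability `recent_reject_rate` the cancel is rejected and the leaf stays live, otherwise the cancel completes after an exponentially distributed delay with mean `mean_cancel_latency_ms`. Both parameter names are hypothetical, and a production model would be fitted to venue data.

```python
import math


def prob_still_live(elapsed_ms, mean_cancel_latency_ms, recent_reject_rate):
    """P(leaf is still fill-exposed) elapsed_ms after the cancel was sent,
    under the assumed reject-or-exponential-completion model."""
    if mean_cancel_latency_ms <= 0:
        raise ValueError("mean_cancel_latency_ms must be positive")
    if not 0.0 <= recent_reject_rate <= 1.0:
        raise ValueError("recent_reject_rate must be in [0, 1]")
    not_yet_final = math.exp(-elapsed_ms / mean_cancel_latency_ms)
    return recent_reject_rate + (1.0 - recent_reject_rate) * not_yet_final


def expected_live_leaves(leaves_qty, elapsed_ms, mean_cancel_latency_ms,
                         recent_reject_rate):
    """Confidence-weighted leaves to use in place of a hard zero."""
    return leaves_qty * prob_still_live(
        elapsed_ms, mean_cancel_latency_ms, recent_reject_rate
    )


at_send = expected_live_leaves(800, 0, 20.0, 0.10)
much_later = expected_live_leaves(800, 2000, 20.0, 0.10)
```

Under these assumptions the expected live quantity starts at the full leaves at cancel-send time and decays toward the reject-rate floor, never to zero, until an authoritative terminal message arrives.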

2) Separate intent state from venue state

Local intent: “I want this gone.” Venue truth: “It may still fill.” Never let intent overwrite venue-confirmed leaves.

3) Treat Cancel Reject as a state update, not just an exception

35=9 must feed the same authoritative state machine as 35=8, especially through OrdStatus(39).
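A minimal Python sketch of folding both message types into one state record follows. The parser is deliberately simplistic and the leaf dictionary layout is hypothetical; the sample messages are abbreviated (standard header and trailer fields omitted). One detail worth noting is that the reject carries OrdStatus(39) but, in this sketch, quantities are retained from the last ExecutionReport rather than read from the reject.

```python
SOH = "\x01"


def parse_fix(raw):
    """Split a tag=value FIX string into {tag: value}. Repeating groups are
    not handled; a later duplicate tag overwrites an earlier one."""
    fields = {}
    for pair in raw.split(SOH):
        if pair:
            tag, _, value = pair.partition("=")
            fields[int(tag)] = value
    return fields


def apply_message(leaf, msg):
    """Fold 35=8 and 35=9 into the same per-leaf state dict."""
    if msg[35] == "8":                      # ExecutionReport
        leaf["ord_status"] = msg[39]
        leaf["cum_qty"] = int(msg[14])      # whole-unit quantities assumed
        leaf["leaves_qty"] = int(msg[151])
    elif msg[35] == "9":                    # OrderCancelReject
        leaf["ord_status"] = msg[39]        # status after the reject is applied
        leaf["cxl_rej_reason"] = msg.get(102)
        leaf["cancel_in_flight"] = False
    return leaf


# Abbreviated messages: header and trailer fields are omitted for brevity.
exec_report = SOH.join(["35=8", "11=C2", "41=C1", "39=6", "14=200", "151=800"]) + SOH
cancel_reject = SOH.join(["35=9", "11=C2", "41=C1", "39=1", "434=1", "102=0"]) + SOH

leaf = {"cancel_in_flight": True}
for raw in (exec_report, cancel_reject):
    apply_message(leaf, parse_fix(raw))
```

After the reject the record shows a Partially Filled leaf with 800 still resting, which is exactly the state a log-only handler would miss.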

4) Penalize urgency when residual uncertainty is high

If residual_uncertainty_qty is elevated, do not let the scheduler become maximally aggressive. Uncertain residuals should reduce confidence, not amplify urgency.
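As a rough illustration, an urgency ceiling can shrink linearly as uncertainty grows relative to the residual. The linear form, the `floor` parameter, and the function name are assumptions for this Python sketch, not a recommended calibration.

```python
def capped_urgency(raw_urgency, residual_uncertainty_qty, residual_qty,
                   max_urgency=1.0, floor=0.25):
    """Clamp scheduler urgency by a ceiling that falls as uncertainty rises.

    ratio = uncertainty / residual, clipped to [0, 1].
    ceiling = max_urgency at ratio 0, max_urgency * floor at ratio 1.
    """
    if residual_qty <= 0:
        return 0.0
    ratio = min(1.0, max(0.0, residual_uncertainty_qty / residual_qty))
    ceiling = max_urgency * (1.0 - (1.0 - floor) * ratio)
    return min(raw_urgency, ceiling)


certain = capped_urgency(0.9, residual_uncertainty_qty=0, residual_qty=1000)
half_unsure = capped_urgency(0.9, residual_uncertainty_qty=500, residual_qty=1000)
very_unsure = capped_urgency(0.9, residual_uncertainty_qty=2000, residual_qty=1000)
```

The ceiling only ever lowers urgency; a scheduler that is already passive is left alone.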

5) Avoid immediate same-price re-entry after broad safety cancels

If the leaf was probably still economically useful, canceling and instantly re-entering just converts uncertainty into queue loss.

6) Add a hedge holdback window after cancel races

Small, bounded holdback windows can prevent hedge overshoot when post-cancel fills are still plausible.
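A holdback gate might look like the Python sketch below. It assumes the hedge need can be split into a part backed by venue-confirmed fills and a part that is only correct if the leaf is truly gone; that split, the parameter names, and the 250 ms default are hypothetical.

```python
def hedge_qty_to_send(confirmed_fill_qty, hedged_qty, finality_dependent_qty,
                      now_ms, cancel_sent_ms, finalized, holdback_ms=250):
    """Quantity the hedger may send right now.

    Confirmed fills are always hedgeable. The finality-dependent part is
    released only once the cancel is authoritatively resolved or the bounded
    holdback window has elapsed.
    """
    confirmed_gap = confirmed_fill_qty - hedged_qty
    in_holdback = (not finalized) and (now_ms - cancel_sent_ms) < holdback_ms
    if in_holdback:
        return confirmed_gap
    return confirmed_gap + finality_dependent_qty


during = hedge_qty_to_send(500, 200, 300, now_ms=1100, cancel_sent_ms=1000, finalized=False)
after = hedge_qty_to_send(500, 200, 300, now_ms=1300, cancel_sent_ms=1000, finalized=False)
resolved = hedge_qty_to_send(500, 200, 300, now_ms=1100, cancel_sent_ms=1000, finalized=True)
```

Because the window is bounded, the gate delays the uncertain part of the hedge rather than suppressing it indefinitely.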

7) Attribute cancel-race slippage separately in TCA

Otherwise the desk misdiagnoses the cost as generic spread/impact or generic venue toxicity.

TCA / KPI layer

Track these explicitly:

PCFR — Post-Cancel Fill Rate
- fraction of cancel requests followed by at least one fill before terminal resolution
PCFQ — Post-Cancel Fill Quantity
- fill quantity received after cancel request timestamp
CRBR — Cancel Reject Burst Rate
- clustered reject intensity per venue/session window
PCDW95 — Pending Cancel Dwell p95
- tail time spent in pending-cancel state
RUG — Residual Uncertainty Gap
- |R_obs - R_reconciled| / parent_qty
QRL — Queue Reset Loss
- estimated bps lost from cancel→reenter patterns that were avoidable
PRM5 — Post-Reject Markout 5s
- markout after cancel reject recovery windows
OCU — Overhedge Cleanup Units
- hedge/unwind quantity attributable to cancel-race ambiguity

These should be segmented by:

venue,
symbol liquidity bucket,
child tactic,
time-to-deadline bucket,
and session-health regime.
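A few of these KPIs can be computed directly from a cancel-race ledger. The Python sketch below assumes a hypothetical ledger row layout (one dict per cancel request) and uses a nearest-rank percentile; the residual uncertainty gap is reported as a per-group maximum, which is one choice among several.

```python
import math
from collections import defaultdict


def p95_nearest_rank(values):
    """Nearest-rank 95th percentile; returns None for an empty sample."""
    if not values:
        return None
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def cancel_race_kpis(rows):
    n = len(rows)
    dwell = [r["terminal_ms"] - r["pending_cancel_ms"] for r in rows
             if r["pending_cancel_ms"] is not None and r["terminal_ms"] is not None]
    gaps = [abs(r["r_obs"] - r["r_reconciled"]) / r["parent_qty"] for r in rows]
    return {
        "post_cancel_fill_rate": (sum(r["post_cancel_fill_qty"] > 0 for r in rows) / n) if n else None,
        "post_cancel_fill_qty": sum(r["post_cancel_fill_qty"] for r in rows),
        "pending_cancel_dwell_p95_ms": p95_nearest_rank(dwell),
        "residual_uncertainty_gap_max": max(gaps) if gaps else None,
    }


def kpis_by(rows, key):
    """Segment the same KPIs by any ledger column, e.g. venue."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    return {k: cancel_race_kpis(v) for k, v in groups.items()}


# Hypothetical ledger rows for illustration only.
ledger = [
    {"venue": "X", "post_cancel_fill_qty": 300, "pending_cancel_ms": 10, "terminal_ms": 40,
     "r_obs": 800, "r_reconciled": 500, "parent_qty": 1000},
    {"venue": "X", "post_cancel_fill_qty": 0, "pending_cancel_ms": 5, "terminal_ms": 15,
     "r_obs": 400, "r_reconciled": 400, "parent_qty": 1000},
    {"venue": "Y", "post_cancel_fill_qty": 100, "pending_cancel_ms": None, "terminal_ms": 20,
     "r_obs": 600, "r_reconciled": 500, "parent_qty": 2000},
    {"venue": "Y", "post_cancel_fill_qty": 0, "pending_cancel_ms": 8, "terminal_ms": 28,
     "r_obs": 0, "r_reconciled": 0, "parent_qty": 500},
]
overall = cancel_race_kpis(ledger)
per_venue = kpis_by(ledger, "venue")
```

The same `kpis_by` call works for any of the segmentation keys listed above, provided the ledger rows carry that column.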

Validation approach

Backtest / replay questions

After a cancel request, how often did more fill quantity arrive before terminal cancel?
If those post-cancel fills were hidden from the controller, how much extra catch-up flow would it have sent?
How much queue loss came from cancel→reenter patterns where the original leaf would have completed acceptably?
What fraction of tail slippage clusters are preceded by pending-cancel dwell or cancel rejects?

Shadow-mode checks

Compare production policy vs a counterfactual that waits for authoritative cancel finality.
Compare against a policy that preserves resting leaves unless uncertainty crosses a strict threshold.
Estimate tail cost change, not just mean fill price change.

Failure-injection drills

Simulate:

delayed cancel acknowledgements,
post-cancel fills,
cancel rejects with CxlRejReason(102)=0 (Too late to cancel),
reject bursts during fast markets,
and order-entry vs drop-copy disagreement.

If the strategy only survives the clean “cancel then canceled” path, it is not production-ready.

Anti-patterns

Binary leaf state: working vs dead with no pending/ambiguous mode
Intent-overwrites-truth: local cancel intent zeros out venue leaves
Reject-as-log-line: cancel reject is logged but not folded into state
Immediate replacement reflex: new child launched before race window resolves
No TCA bucket: all damage blamed on spread or venue toxicity
No hedge dampener: hedge reacts faster than order-state certainty

Implementation sketch

A robust controller usually needs:

authoritative per-leaf state machine
- {NEW, PARTIAL} -> PENDING_CANCEL -> {NEW, PARTIAL, CANCELED, FILLED}
- with 35=9 allowed to move the leaf back into a live NEW or PARTIAL regime
residual-confidence score
- derived from pending-cancel dwell, reject rate, and cross-channel agreement
ambiguity-aware scheduler
- caps urgency and replacement flow under high residual uncertainty
cancel-race ledger
- stores request time, terminal time, post-cancel fill qty, reject reason, and recovery action
TCA attribution hooks
- assign slippage to cancel-race bucket rather than generic impact bucket
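The first and fourth components can be combined in a compact per-leaf object. This Python sketch is one possible shape, not a reference implementation: the class, method, and ledger key names are hypothetical, only a handful of OrdStatus values are modeled, and the recovery-action field is left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LeafState(Enum):
    NEW = "0"
    PARTIAL = "1"
    FILLED = "2"
    CANCELED = "4"
    PENDING_CANCEL = "6"


TERMINAL = {LeafState.FILLED, LeafState.CANCELED}


@dataclass
class Leaf:
    order_qty: int
    state: LeafState = LeafState.NEW
    cum_qty: int = 0
    leaves_qty: Optional[int] = None
    ledger: Optional[dict] = None  # opened when a cancel is sent

    def __post_init__(self):
        if self.leaves_qty is None:
            self.leaves_qty = self.order_qty

    def on_cancel_sent(self, ts_ms):
        # Intent only: venue state, cum_qty and leaves_qty are left untouched.
        self.ledger = {"request_ms": ts_ms, "terminal_ms": None,
                       "post_cancel_fill_qty": 0, "reject_reason": None}

    def on_execution_report(self, ord_status, cum_qty, leaves_qty, ts_ms):
        # Credit any CumQty increase during an open race to the ledger.
        if self.ledger is not None and self.ledger["terminal_ms"] is None:
            self.ledger["post_cancel_fill_qty"] += max(0, cum_qty - self.cum_qty)
        self.state = LeafState(ord_status)
        self.cum_qty, self.leaves_qty = cum_qty, leaves_qty
        if self.ledger is not None and self.state in TERMINAL:
            self.ledger["terminal_ms"] = ts_ms

    def on_cancel_reject(self, ord_status, cxl_rej_reason, ts_ms):
        # 35=9 is state-bearing: it can move the leaf back to NEW or PARTIAL.
        self.state = LeafState(ord_status)
        if self.ledger is not None:
            self.ledger["reject_reason"] = cxl_rej_reason
            self.ledger["terminal_ms"] = ts_ms

    @property
    def fill_exposed(self):
        return self.leaves_qty > 0 and self.state not in TERMINAL


leaf = Leaf(order_qty=1000)
leaf.on_execution_report("1", 200, 800, ts_ms=1)   # partial fill before the cancel
leaf.on_cancel_sent(ts_ms=2)
leaf.on_execution_report("6", 200, 800, ts_ms=3)   # Pending Cancel acknowledged
leaf.on_execution_report("6", 500, 500, ts_ms=4)   # fill while cancel is pending
leaf.on_cancel_reject("1", "0", ts_ms=5)           # too late to cancel
```

Keeping the ledger on the leaf means the TCA hooks can read post-cancel fill quantity and reject reason without re-deriving them from raw message logs.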

Bottom line

Cancel logic is not a clerical detail.

A live order can still trade while your cancel is “in progress,” and FIX explicitly models that world. If your execution stack treats pending cancel like economic finality, or treats cancel reject like a mere error instead of a state-bearing event, you quietly turn state ambiguity into slippage.

The fix is not “cancel faster.”

The fix is to treat cancel finality as uncertain, model the race window explicitly, and stop converting protocol semantics into fake certainty.

References

FIX Trading Community — Order State Changes (OrdStatus precedence; cancel scenarios including executions while cancel is active): https://www.fixtrading.org/online-specification/order-state-changes/
OnixS FIX 4.2 Dictionary — D4: Cancel request issued for a part-filled order — executions occur whilst cancel request is active: https://www.onixs.biz/fix-dictionary/4.2/app_d4.html
OnixS FIX 4.4 Dictionary — Order Cancel Reject <9> message (reject semantics and fields): https://www.onixs.biz/fix-dictionary/4.4/msgtype_9_9.html
B2BITS FIX Dictionary — Execution Report (MsgType=8) (OrdStatus precedence and Pending Cancel meaning): https://www.b2bits.com/fixopaedia/fixdic50/message_Execution_Report_8_.html