Cross-Venue ACK Dispersion & Residual-Estimation Slippage Playbook

Why this matters

Most execution logic assumes your remaining parent size is known almost exactly in real time.

That assumption breaks in fragmented markets:

child orders are sent to multiple venues,
acknowledgements/fills arrive with different latencies,
cancel confirms arrive even later,
so router state can be temporarily wrong about the true residual.

When residual state is stale, the controller makes expensive choices:

Under-estimates residual → slows down too early, then pays late catch-up convexity.
Over-estimates residual → keeps pressing and risks overfill/self-competition churn.
Reacts to noisy ACK updates → urgency ping-pongs, paying spread and queue-reset tax on each flip.

Execution slippage is then driven by state-estimation lag, not only market impact.

Core failure mechanism

Let:

R_true(t): true remaining parent quantity,
R_est(t): router-estimated remaining quantity,
e_R(t) = R_est(t) - R_true(t): residual estimation error (positive when the router over-estimates what is left, negative when it under-estimates).

If ACK/fill/cancel events are delayed and heterogeneous across venues, e_R(t) becomes regime-dependent.

A simple dispersion driver:

D_ack(t) = p90_v[ack_latency(v, t)] - p10_v[ack_latency(v, t)]

where the percentiles are taken across active venues v.

As D_ack grows, event-ordering ambiguity rises: the residual controller acts on stale state, and the cost of taking the wrong branch (slowing, pressing, or flipping) grows nonlinearly as the deadline approaches.
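
A minimal sketch of the dispersion driver, assuming recent ACK latencies are kept per venue in milliseconds. The function name `ack_dispersion`, the use of one median latency per venue, and the sample venues are illustrative assumptions, not part of any router API.

```python
from statistics import median, quantiles

def ack_dispersion(latencies_ms_by_venue):
    """Return p90 - p10 of per-venue median ACK latency (ms)."""
    # Assumption: summarize each active venue by the median of its recent window.
    per_venue = [median(v) for v in latencies_ms_by_venue.values() if v]
    if len(per_venue) < 2:
        return 0.0  # a single venue has no cross-venue dispersion
    deciles = quantiles(per_venue, n=10, method="inclusive")
    return deciles[8] - deciles[0]  # p90 minus p10

recent = {
    "VENUE_A": [1.0, 1.2, 1.1],
    "VENUE_B": [2.0, 2.4, 2.2],
    "VENUE_C": [9.0, 11.0, 10.0],
}
print(ack_dispersion(recent))
```

With only a handful of venues the percentiles are interpolated, so the value behaves much like a max-minus-min spread; the p90/p10 form matters more as the venue count grows.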

Branch-cost decomposition

Expected cost can be decomposed as:

E[Cost] = C_base + C_underfill_catchup + C_overfill_unwind + C_churn + C_toxicity_miss

Where:

C_base: normal spread/impact/fees under synchronized state,
C_underfill_catchup: late aggressive completion when residual was under-estimated,
C_overfill_unwind: liquidation/hedge correction cost after over-execution,
C_churn: cancel/replace and queue-reset costs from oscillating urgency,
C_toxicity_miss: wrong passive/aggressive mix chosen due to stale residual confidence.

The hidden tax is convex because mis-estimation interacts with time pressure.

Metric stack

1) ACK Dispersion Spread (ADS)

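ADS = p90_ack_latency - p10_ack_latency across active venues. This is the D_ack(t) driver defined above, tracked as a metric.

High ADS means venues are reporting state on very different delays, so the router's combined view mixes fresh and stale information.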

2) Residual Error Band (REB)

Posterior interval width for the residual estimate, e.g. q90(R_true | events) - q10(R_true | events).

A wide REB means the controller should avoid brittle one-shot urgency shifts.
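
One way to approximate REB is Monte Carlo over unconfirmed children. The sketch below assumes each in-flight child either fills completely or not at all with a known probability, which is a deliberate simplification; `residual_error_band` and its parameters are hypothetical names.

```python
import random
from statistics import quantiles

def residual_error_band(parent_qty, confirmed_filled, in_flight,
                        n_samples=5000, seed=0):
    """Monte Carlo q90 - q10 width of the residual posterior.

    in_flight: list of (child_qty, fill_probability) for children whose
    fill/cancel status has not been confirmed yet.
    """
    rng = random.Random(seed)  # seeded so repeated calls are reproducible
    samples = []
    for _ in range(n_samples):
        # Assumption: all-or-nothing fills, independent across children.
        unconfirmed = sum(q for q, p in in_flight if rng.random() < p)
        samples.append(parent_qty - confirmed_filled - unconfirmed)
    deciles = quantiles(samples, n=10, method="inclusive")
    return deciles[8] - deciles[0]

reb = residual_error_band(10_000, 4_000, [(500, 0.5), (300, 0.5), (200, 0.5)])
print(reb)
```

With nothing in flight the band collapses to zero; it widens as more quantity sits in an unconfirmed state.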

3) Deadline Catch-up Convexity (DCC)

Marginal expected cost per residual unit as time-to-deadline shrinks.

Rising DCC means residual uncertainty is becoming expensive fast.
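
To make the shape of DCC concrete, the sketch below uses a stylized square-root impact cost and differentiates it numerically. The cost form, the constant `k_bps`, and both function names are assumptions chosen only for illustration; substitute your own completion-cost model.

```python
import math

def completion_cost(residual, tau_s, volume_per_s, k_bps=10.0):
    """Assumed cost (bps x shares) to finish `residual` within tau_s seconds."""
    if tau_s <= 0 or volume_per_s <= 0:
        raise ValueError("tau_s and volume_per_s must be positive")
    if residual <= 0:
        return 0.0
    participation = residual / (volume_per_s * tau_s)
    return residual * k_bps * math.sqrt(participation)

def dcc(residual, tau_s, volume_per_s, bump=1.0):
    """Marginal cost of one more residual unit (central finite difference)."""
    up = completion_cost(residual + bump, tau_s, volume_per_s)
    down = completion_cost(residual - bump, tau_s, volume_per_s)
    return (up - down) / (2 * bump)

# Same residual, shrinking time-to-deadline: the marginal cost rises.
for tau in (600, 120, 30):
    print(tau, round(dcc(5_000, tau, 1_000), 3))
```

Under this assumed model the marginal cost scales like 1/sqrt(time-to-deadline), so a fixed residual error costs more the later it is discovered.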

4) Overfill Pressure Index (OPI)

Probability-weighted overfill risk under current in-flight child orders and cancel-lag.

High OPI means continued aggression can create unwind drag.
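
A small sketch of one possible OPI definition: the expected overfill quantity given the children still in flight. It assumes all-or-nothing, independent fills with a per-child probability of filling before the cancel takes effect; `overfill_pressure` is a hypothetical name, and exact enumeration is only practical for a handful of children.

```python
from itertools import product

def overfill_pressure(residual_est, in_flight):
    """Expected overfill: E[max(0, in-flight fills - residual_est)].

    in_flight: list of (qty, fill_prob), where fill_prob is the assumed
    probability the child fills before its cancel is effective.
    """
    expected = 0.0
    # Enumerate every fill/no-fill combination (2^n outcomes).
    for outcome in product((0, 1), repeat=len(in_flight)):
        prob, filled = 1.0, 0
        for hit, (qty, p) in zip(outcome, in_flight):
            prob *= p if hit else 1.0 - p
            filled += qty if hit else 0
        expected += prob * max(0, filled - residual_est)
    return expected

print(overfill_pressure(500, [(400, 0.5), (300, 0.5)]))
```

Here only the outcome where both children fill (700 against a residual of 500) overfills, so the index is the 200-share excess weighted by its 25% probability.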

5) ACK Ordering Violation Rate (AOVR)

Rate of event sequences implying ambiguous or inverted local ordering for state updates.

Elevated AOVR means event ingest rules are too naive for current latency regime.
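
A sketch of an AOVR counter, assuming each event carries a per-order sequence number assigned by the venue. That field, the tuple layout, and the name `ordering_violation_rate` are assumptions; where venues provide no such sequence, a different ordering signal would be needed.

```python
def ordering_violation_rate(events):
    """events: iterable of (venue, order_id, venue_seq) in local arrival order.

    An event counts as a violation when its sequence number is not greater
    than the highest one already seen for the same (venue, order_id).
    """
    last_seq = {}
    violations = total = 0
    for venue, order_id, seq in events:
        key = (venue, order_id)
        total += 1
        if key in last_seq and seq <= last_seq[key]:
            violations += 1  # arrived out of order, or duplicated
        last_seq[key] = max(seq, last_seq.get(key, seq))
    return violations / total if total else 0.0

stream = [("A", "o1", 1), ("A", "o1", 3), ("A", "o1", 2), ("B", "o2", 1)]
print(ordering_violation_rate(stream))
```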

Control policy state machine

STATE 1 — SYNCHRONIZED

Conditions:

low ADS,
narrow REB,
low AOVR.

Policy:

standard residual-tracking controller,
normal venue allocation and urgency schedule.

STATE 2 — DESYNC_WATCH

Conditions:

ADS/REB rising,
occasional ordering violations,
moderate DCC.

Policy:

downweight fast urgency flips,
increase residual confidence threshold before large aggression jumps,
cap simultaneous in-flight child notional.

STATE 3 — DESYNC_ACTIVE

Conditions:

persistent high ADS/AOVR,
wide REB,
rising OPI or DCC.

Policy:

switch to uncertainty-aware residual posterior control,
prioritize venues with stable ACK behavior,
constrain cancel/reprice churn and avoid oscillatory response.

STATE 4 — SAFE

Conditions:

residual confidence collapse near deadline,
combined overfill + catch-up risk exceeds budget.

Policy:

halt non-essential micro-updates,
execute pre-approved completion ladder with bounded overfill risk,
prefer reliability over local spread optics.

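Use asymmetric hysteresis to avoid state flapping: for example, escalate as soon as entry conditions are met, but step down only after conditions have stayed clear for a sustained period.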
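
A minimal sketch of the four states with asymmetric hysteresis. It collapses the entry conditions into a single hypothetical stress score in [0, 1], which is a simplification of the per-metric conditions listed for each state; the thresholds, `EXIT_MARGIN`, and `CALM_TICKS` are placeholder values.

```python
from enum import IntEnum

class State(IntEnum):
    SYNCHRONIZED = 1
    DESYNC_WATCH = 2
    DESYNC_ACTIVE = 3
    SAFE = 4

# Hypothetical entry thresholds on the stress score.
ENTER = {State.DESYNC_WATCH: 0.3, State.DESYNC_ACTIVE: 0.6, State.SAFE: 0.85}
EXIT_MARGIN = 0.1  # score must fall this far below the entry threshold...
CALM_TICKS = 3     # ...for this many consecutive updates before stepping down

class DesyncStateMachine:
    def __init__(self):
        self.state = State.SYNCHRONIZED
        self._calm = 0

    def update(self, stress):
        # Escalate immediately to the highest state whose threshold is met.
        target = max((s for s, th in ENTER.items() if stress >= th),
                     default=State.SYNCHRONIZED)
        if target > self.state:
            self.state, self._calm = target, 0
        elif (self.state > State.SYNCHRONIZED
              and stress < ENTER[self.state] - EXIT_MARGIN):
            self._calm += 1
            if self._calm >= CALM_TICKS:
                # De-escalate one level at a time.
                self.state, self._calm = State(self.state - 1), 0
        else:
            self._calm = 0  # not calm enough: restart the countdown
        return self.state

sm = DesyncStateMachine()
print([sm.update(x).name for x in (0.7, 0.4, 0.4, 0.4, 0.9)])
```

Escalation takes one update while each step down takes `CALM_TICKS` calm updates, so a brief lull in a noisy regime does not relax the controls.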

Modeling pattern (production)

1) Residual posterior instead of point estimate
- maintain P(R_true | event stream, per-venue latency model),
- propagate uncertainty to tactic scoring, not just dashboarding.

2) Per-venue ACK latency regime model
- online estimates by venue × session segment × stress state,
- detect sudden latency skew shifts (not only mean drift).

3) Uncertainty-aware action objective
- minimize a weighted blend of mean and q95 cost with explicit overfill and deadline penalties,
- penalize tactics whose cost is highly sensitive to residual uncertainty.

4) Event-order robust ingestion contract
- idempotent apply rules,
- monotonic sequence/causality guards where available,
- reconciliation path for delayed cancel/fill confirms.

5) Counterfactual replay with latency perturbations
- inject ACK-latency shocks and ordering permutations into historical runs,
- verify state transitions and tail-cost protection before production promotion.
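
The idempotent-apply rule from the ingestion contract can be sketched as below. It assumes every fill carries an execution identifier that is unique per venue; the class `ResidualBook` and its fields are hypothetical, and a production version would also need persistence and cancel handling.

```python
from dataclasses import dataclass, field

@dataclass
class ResidualBook:
    parent_qty: int
    filled: int = 0
    _seen: set = field(default_factory=set)

    def apply_fill(self, venue, exec_id, qty):
        """Apply a fill exactly once, keyed by (venue, exec_id)."""
        key = (venue, exec_id)
        if key in self._seen:
            return False  # duplicate or replayed event: ignore
        self._seen.add(key)
        self.filled += qty
        return True

    @property
    def residual(self):
        return self.parent_qty - self.filled

book = ResidualBook(parent_qty=1_000)
book.apply_fill("VENUE_A", "e1", 200)
book.apply_fill("VENUE_A", "e1", 200)  # replay of the same execution
book.apply_fill("VENUE_B", "e7", 100)  # late fill, arrives after others
print(book.residual)
```

Because fills are keyed and additive, the resulting residual does not depend on arrival order, which is what lets delayed confirms reconcile without special cases.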

Practical rollout checklist

Implement ADS/REB/OPI/AOVR dashboards by strategy and venue bucket.
Add DESYNC state machine with manual override + incident logging.
Limit in-flight exposure as a function of REB and DCC.
Canary in one strategy with strict q95 and overfill guardrails.
Weekly attribution: split slippage into synchronized vs desynchronized windows.
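
The weekly attribution split can start as a notional-weighted average per regime. The sketch assumes each fill has already been tagged with the controller state active when it executed; the tuple layout and the name `slippage_by_regime` are illustrative.

```python
from collections import defaultdict

def slippage_by_regime(fills):
    """fills: iterable of (state_label, notional, slippage_bps).

    Returns notional-weighted average slippage (bps) for the
    synchronized bucket and for everything else.
    """
    cost = defaultdict(float)
    notional = defaultdict(float)
    for state, ntl, bps in fills:
        bucket = "synchronized" if state == "SYNCHRONIZED" else "desynchronized"
        cost[bucket] += ntl * bps
        notional[bucket] += ntl
    return {b: cost[b] / notional[b] for b in notional if notional[b] > 0}

week = [
    ("SYNCHRONIZED", 100.0, 2.0),
    ("DESYNC_ACTIVE", 100.0, 6.0),
    ("DESYNC_WATCH", 300.0, 2.0),
]
print(slippage_by_regime(week))
```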

Bottom line

In fragmented execution, residual size is not a constant truth—it is a latency-conditioned belief state.

If your router acts on stale residual estimates as if they were exact, you quietly pay slippage through late catch-up, overfill unwind, and churn. Model ACK dispersion explicitly and let uncertainty drive control logic, especially when deadline convexity turns small state errors into expensive outcomes.