Drop-Copy Lag & Phantom Residual Slippage Playbook

Why this matters

Some execution stacks treat the drop-copy feed as the source of truth for fills and positions during live slicing. When drop-copy messages lag or arrive out of order, the router can hallucinate residual quantity that has already been filled, then submit extra child orders against it.

That creates a hidden slippage loop:

perceived underfill ->
unnecessary catch-up aggression ->
overfill unwind / hedge churn ->
extra spread + impact + fee drag.

This is not classic market impact. It is a state-consistency tax.

Failure pattern (execution branch view)

At each decision point for parent order P:

Q_true: actual remaining quantity
Q_seen: remaining quantity inferred from delayed drop-copy
delta = Q_seen - Q_true

If delta > 0, the strategy believes it is underfilled and over-trades.

Expected incremental cost from the hallucinated residual, with spread, impact, and urgency_penalty expressed per share:

E[ExtraCost] = E[ delta * (spread + impact + urgency_penalty) ] + E[unwind_cost]

The worst tails appear near deadlines (close, auction cutoff, schedule end), where urgency multipliers are steep.
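
A minimal sketch of the cost expression above, using a sample mean in place of the expectation. The DecisionPoint record and its field names are hypothetical, and clamping delta at zero is an assumption made here so that only phantom residual (delta > 0) is charged.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DecisionPoint:
    q_seen: float           # residual inferred from drop-copy, in shares
    q_true: float           # actual residual, in shares
    spread: float           # per-share spread cost
    impact: float           # per-share impact cost
    urgency_penalty: float  # per-share urgency premium
    unwind_cost: float      # total cost of unwinding any overfill


def extra_cost(dp: DecisionPoint) -> float:
    """Cost attributed to phantom residual at one decision point."""
    # Assumption: only positive delta (phantom residual) drives over-trading.
    delta = max(dp.q_seen - dp.q_true, 0.0)
    per_share = dp.spread + dp.impact + dp.urgency_penalty
    return delta * per_share + dp.unwind_cost


def expected_extra_cost(points: list[DecisionPoint]) -> float:
    """Sample mean of extra_cost, standing in for E[ExtraCost]."""
    return mean(extra_cost(p) for p in points)
```

With a steep urgency_penalty near a deadline, the same delta produces a much larger extra_cost, which is where the tail comes from.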

Core metrics

1) Drop-Copy Delay Quantile (DCDQ)

DCDQ_p95 = p95( t_dropcopy_fill - t_true_fill )

Track by venue, symbol-liquidity bucket, and time-of-day regime.
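
One way to compute DCDQ per bucket, as a sketch. The tuple layout, the bucket key, and the choice of the nearest-rank percentile method are assumptions; any consistent percentile definition works as long as it is held fixed across reports.

```python
import math
from collections import defaultdict


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile; pct is an integer in 1..100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def dcdq_by_bucket(fills, pct=95):
    """Return {bucket: percentile of drop-copy delay}.

    fills: iterable of (bucket, t_true_fill, t_dropcopy_fill) tuples, with
    both timestamps in the same unit (for example microseconds).
    bucket can be any hashable key such as (venue, liquidity_bucket, regime).
    """
    delays = defaultdict(list)
    for bucket, t_true_fill, t_dropcopy_fill in fills:
        delays[bucket].append(t_dropcopy_fill - t_true_fill)
    return {b: percentile_nearest_rank(d, pct) for b, d in delays.items()}
```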

2) Phantom Residual Ratio (PRR)

PRR = sum(max(Q_seen - Q_true, 0)) / sum(parent_qty)

Measures how much parent size was falsely considered "remaining".

3) Catch-up Overshoot Ratio (COR)

COR = sum(overfill_qty_due_to_state_lag) / sum(parent_qty)

Quantifies over-trading induced by stale execution state.
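
PRR and COR can be sketched as two small functions. The aggregation grain is an assumption: here each parent order contributes one (Q_seen, Q_true) snapshot and one attributed overfill quantity. How overfill is attributed to state lag is left to the caller.

```python
def phantom_residual_ratio(snapshots, total_parent_qty):
    """PRR: phantom residual as a fraction of total parent quantity.

    snapshots: iterable of (q_seen, q_true) pairs, one per parent order
    (assumed grain).
    """
    if total_parent_qty <= 0:
        raise ValueError("total_parent_qty must be positive")
    # Only positive differences count; negative ones are not phantom residual.
    phantom = sum(max(q_seen - q_true, 0) for q_seen, q_true in snapshots)
    return phantom / total_parent_qty


def catchup_overshoot_ratio(overfill_qtys, total_parent_qty):
    """COR: overfill attributed to state lag over total parent quantity."""
    if total_parent_qty <= 0:
        raise ValueError("total_parent_qty must be positive")
    return sum(overfill_qtys) / total_parent_qty
```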

4) Reconciliation Shock Cost (RSC)

RSC = unwind_cost + forced_hedge_adjustment_cost

Use arrival-to-unwind and markout-adjusted accounting.

5) State Divergence Half-life (SDH)

Median time from divergence detection (Q_seen != Q_true) to restored consistency.
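
A sketch of SDH from a reconstructed timeline. The timeline format is hypothetical: time-ordered (t, Q_seen, Q_true) samples. Episodes still open at the end of the timeline are ignored here, which is an assumption and biases the median downward during ongoing incidents.

```python
from statistics import median


def divergence_durations(timeline):
    """Durations of closed divergence episodes.

    timeline: time-ordered iterable of (t, q_seen, q_true) samples.
    An episode opens at the first sample where q_seen != q_true and closes
    at the next sample where they agree again.
    """
    durations = []
    start = None
    for t, q_seen, q_true in timeline:
        diverged = q_seen != q_true
        if diverged and start is None:
            start = t
        elif not diverged and start is not None:
            durations.append(t - start)
            start = None
    return durations


def state_divergence_half_life(timeline):
    """Median episode duration, or None if no episode has closed."""
    durations = divergence_durations(timeline)
    return median(durations) if durations else None
```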

Modeling framework

A) Two-layer state model

Latent true state: built from exchange/order-gateway ACKs and execution reports
Observed state: built from the drop-copy stream that the strategy consumes

Model P(divergence | latency, venue, load, message_rate, reconnect_state) and E[cost | divergence].

B) Tail-first objective

Do not optimize only mean bps.

Use:

J = E[slippage] + lambda1 * q95(slippage) + lambda2 * RSC

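Here lambda1 and lambda2 are nonnegative weights. The tail and reconciliation terms matter because divergence events are sparse but expensive, so a mean-only objective barely registers them.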
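
A small sketch of evaluating J on a sample of per-order slippage. The nearest-rank q95, the bps units, and the lambda values in the usage note are illustrative assumptions, not calibrated settings.

```python
import math
from statistics import mean


def q95(samples):
    """Nearest-rank 95th percentile of a non-empty sample."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def tail_first_objective(slippage_bps, rsc_bps, lambda1, lambda2):
    """J = E[slippage] + lambda1 * q95(slippage) + lambda2 * RSC."""
    return mean(slippage_bps) + lambda1 * q95(slippage_bps) + lambda2 * rsc_bps
```

For example, a controller with slippage of 1 bp on 90 orders and 30 bps on 10 orders has a mean of 3.9 bps, lower than a controller that pays a flat 4 bps. With lambda1 = 0.5 the first one still scores far worse on J, because its q95 is 30 bps.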

C) Deadline interaction term

Include an interaction with time_to_deadline:

cost ~ divergence * f(time_to_deadline)

This captures the convex urgency tax paid when stale state persists close to the deadline.
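
As an illustration only, the interaction can be built as a model feature. The exponential form of f and the tau_s time constant are assumptions chosen because they are convex and grow as the deadline approaches; the source does not specify f.

```python
import math


def urgency_curve(time_to_deadline_s, tau_s=300.0):
    """Hypothetical f(time_to_deadline): 1 at the deadline, decaying
    convexly toward 0 as the deadline gets further away."""
    if time_to_deadline_s < 0:
        raise ValueError("time_to_deadline_s must be non-negative")
    return math.exp(-time_to_deadline_s / tau_s)


def deadline_interaction(divergence_qty, time_to_deadline_s, tau_s=300.0):
    """Feature value: divergence * f(time_to_deadline)."""
    return divergence_qty * urgency_curve(time_to_deadline_s, tau_s)
```

The same 200-share divergence contributes 200 to the feature at the deadline and far less ten minutes out, so a fitted cost model can weight late divergence more heavily.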

Execution controller (state machine)

STATE 1: SYNCED

Criteria:

DCDQ p95 below threshold
PRR low and stable

Policy:

Normal slicing logic
Full tactic set allowed

STATE 2: UNCERTAIN

Criteria:

DCDQ rising, intermittent divergence

Policy:

Reduce max child size
Increase minimum inter-dispatch gap
Require dual confirmation (gateway-native execution view and drop-copy view agree) for aggressive catch-up

STATE 3: DEGRADED

Criteria:

Persistent divergence or reconnect burst
PRR/COR breach

Policy:

Freeze non-essential repricing churn
Prefer queue-preserving amend over cancel/replace
Cap urgency escalation slope
Enforce reconciliation window before large sweeps

STATE 4: SAFE_RECONCILE

Criteria:

Severe divergence + deadline stress

Policy:

Stop autonomous catch-up
Run deterministic reconciliation snapshot
Resume only after state lock is restored
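
The four states above can be sketched as a classifier evaluated most-severe-first. The Health fields, the threshold values, and the choice to map severe divergence without deadline stress to DEGRADED are all assumptions for illustration; real thresholds would come from calibration.

```python
from dataclasses import dataclass
from enum import Enum


class SyncState(Enum):
    SYNCED = 1
    UNCERTAIN = 2
    DEGRADED = 3
    SAFE_RECONCILE = 4


@dataclass(frozen=True)
class Health:
    dcdq_p95_ms: float = 0.0
    dcdq_rising: bool = False
    prr: float = 0.0
    cor: float = 0.0
    intermittent_divergence: bool = False
    persistent_divergence: bool = False
    reconnect_burst: bool = False
    severe_divergence: bool = False
    deadline_stress: bool = False


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values, not recommendations.
    dcdq_p95_ms: float = 50.0
    prr: float = 0.01
    cor: float = 0.005


def classify(h: Health, th: Thresholds = Thresholds()) -> SyncState:
    """Map health metrics to a controller state, most severe first."""
    if h.severe_divergence and h.deadline_stress:
        return SyncState.SAFE_RECONCILE
    if (h.severe_divergence or h.persistent_divergence or h.reconnect_burst
            or h.prr > th.prr or h.cor > th.cor):
        return SyncState.DEGRADED
    if (h.dcdq_rising or h.intermittent_divergence
            or h.dcdq_p95_ms >= th.dcdq_p95_ms):
        return SyncState.UNCERTAIN
    return SyncState.SYNCED
```

Checking the most severe condition first means a single bad metric can never be masked by otherwise healthy readings. A production controller would likely also add hysteresis so the state does not flap at a threshold.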

Practical guardrails

Confidence-weighted residual
- Use Q_effective = w * Q_seen + (1-w) * Q_conservative, where w in [0, 1] is state confidence and Q_conservative is a deliberately low residual estimate (for example, one that treats every in-flight child order as filled).
- Decrease w as DCDQ/PRR worsens.

Dual-feed sanity checks
- Compare gateway-native execution view vs drop-copy view.
- Trigger degraded state if disagreement persists beyond tolerance.

Reconciliation cadence by regime
- Tighten cadence around open/close/auction windows.

Replayable incident packet
- Persist event timeline: send, ACK, fill, drop-copy arrival, decision.
- Needed for root-cause attribution and model recalibration.

No blind urgency under uncertainty
- If state confidence falls, urgency multiplier must saturate (hard cap).
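
The first and last guardrails can be sketched together, since both hinge on a confidence weight w. The linear mapping from DCDQ/PRR to w, the limit values, and the cap endpoints are assumptions for illustration; Q_conservative is taken as an input because its definition is a policy choice.

```python
def confidence_weight(dcdq_p95_ms, prr, dcdq_limit_ms=100.0, prr_limit=0.02):
    """Hypothetical mapping: w falls linearly from 1 to 0 as either
    metric approaches its limit; the worse metric dominates."""
    stress = max(dcdq_p95_ms / dcdq_limit_ms, prr / prr_limit)
    return min(1.0, max(0.0, 1.0 - stress))


def effective_residual(q_seen, q_conservative, w):
    """Q_effective = w * Q_seen + (1 - w) * Q_conservative."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    return w * q_seen + (1.0 - w) * q_conservative


def capped_urgency(raw_multiplier, w, cap_low_conf=1.0, cap_full_conf=3.0):
    """Hard cap on urgency that tightens as confidence w falls."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    cap = cap_low_conf + w * (cap_full_conf - cap_low_conf)
    return min(raw_multiplier, cap)
```

With w = 0.5, a seen residual of 1000 and a conservative residual of 600 give an effective residual of 800, and a raw urgency multiplier of 5 is cut to 2. At w = 0 the controller trades only against the conservative residual and never escalates beyond the low-confidence cap.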

Validation plan

Offline

Reconstruct true-vs-seen residual timelines from historical packets.
Counterfactual replay: baseline controller vs divergence-aware controller.
Compare mean, q95, q99 slippage and RSC.

Shadow live

Run divergence-aware policy in observe-only mode.
Emit would-have-acted decisions and projected cost deltas.

Canary

Start with low-ADV symbols and low participation caps.
Auto-rollback if completion shortfall or q95 cost breaches policy bands.
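
The rollback rule reduces to a band check. A minimal sketch, assuming the policy bands are expressed as a maximum completion shortfall (fraction of parent quantity left unfilled) and a maximum q95 cost in bps; the PolicyBands name and fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyBands:
    max_completion_shortfall: float  # fraction of parent qty left unfilled
    max_q95_cost_bps: float          # tail cost limit in basis points


def should_rollback(completion_shortfall, q95_cost_bps, bands):
    """True if either canary metric breaches its policy band."""
    return (completion_shortfall > bands.max_completion_shortfall
            or q95_cost_bps > bands.max_q95_cost_bps)
```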

Operator checklist

DCDQ monitored per venue/session bucket
PRR/COR/RSC dashboards with alert thresholds
State machine transitions audited and explainable
SAFE_RECONCILE tested in drills (not only incidents)
Weekly recalibration includes divergence features

Bottom line

When fill state is delayed, execution can pay a phantom residual tax that looks like random market noise. Treat drop-copy lag as a first-class risk factor, model it explicitly, and bind urgency to state confidence.

That is how you reduce tail slippage without sacrificing completion discipline.