Latency Arbitrage & Stale-Quote Defense Playbook (Practical)
Date: 2026-02-21
Category: knowledge
Focus: Prevent passive quotes from becoming free options for faster participants.
Why this matters
If your quote update loop is slower than market state changes, your posted liquidity stays stale for the milliseconds that matter. In that window, faster counterparties selectively hit your mispriced quotes. Over time this becomes a steady negative carry that looks like “bad luck” unless it is explicitly measured.
Failure mode in one line
Quote lifecycle latency > micro-regime half-life → stale quotes get picked off.
Where quote lifecycle latency includes:
- signal compute latency
- risk check latency
- gateway/serialization latency
- venue/network transit latency
- exchange ack latency
- cancel/replace queue delay
Observable symptoms
- Fill quality sharply worsens during volatility bursts despite stable spread settings.
- Passive fills show strongly negative short-horizon markout (e.g., at +100ms and +500ms).
- Cancel-to-fill race loss rate rises when market jumps.
- Toxicity metrics (VPIN / OFI shock / trade-sign imbalance) spike before bad passive fills.
Core defense stack
1) Latency budget decomposition
Track p50/p90/p99 for each stage, not just end-to-end.
Recommended dashboard columns:
t_signal | t_risk | t_route | t_exchange_ack | t_cancel_ack | t_replace_ack | t_total
Rule: if p99 t_total expands to more than 2x its baseline, auto-enter defensive quoting mode.
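A minimal Python sketch of the per-stage percentile tracking and the 2x rule above. The record layout (one dict of stage latencies in milliseconds per quote), the nearest-rank percentile method, and all function names are illustrative assumptions, not a reference to any particular monitoring stack.

```python
import math

STAGES = ["t_signal", "t_risk", "t_route", "t_exchange_ack",
          "t_cancel_ack", "t_replace_ack", "t_total"]

def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer in 1..100."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def latency_summary(records, stages=STAGES):
    """records: list of dicts mapping stage name -> latency in ms (assumed layout)."""
    summary = {}
    for stage in stages:
        values = [r[stage] for r in records if stage in r]
        if values:  # skip stages with no samples in this window
            summary[stage] = {p: percentile(values, p) for p in (50, 90, 99)}
    return summary

def should_enter_defensive(p99_now_ms, p99_baseline_ms, multiple=2.0):
    """Rule from the text: p99 beyond 2x baseline triggers defensive mode."""
    return p99_now_ms > multiple * p99_baseline_ms
```

Keeping the per-stage breakdown next to t_total makes it possible to see which stage caused a p99 expansion, which an end-to-end number alone would hide.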
2) Staleness-aware quoting
Attach a “freshness TTL” to every quote.
Example policy:
- normal regime TTL: 300ms
- volatile regime TTL: 80–120ms
- if quote age > TTL and no successful refresh ack, reduce size or pull quote.
This prevents “immortal stale quotes” during infra hiccups.
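The TTL policy above can be sketched in Python as follows. The `Quote` fields, the 100ms volatile TTL (picked from inside the 80–120ms range), and the escalation rule (reduce size up to 2x TTL, pull beyond that) are assumptions made for illustration; the policy itself only says "reduce size or pull".

```python
from dataclasses import dataclass

# Starter TTLs in milliseconds; the volatile value is an assumed midpoint.
TTL_MS = {"normal": 300, "volatile": 100}

@dataclass
class Quote:
    quote_id: str
    last_ack_ms: int  # time of the last successful post/refresh ack
    size: int

def stale_action(quote, now_ms, regime):
    """Return 'keep', 'reduce', or 'pull' based on quote age versus TTL."""
    age_ms = now_ms - quote.last_ack_ms
    ttl_ms = TTL_MS[regime]
    if age_ms <= ttl_ms:
        return "keep"
    # Assumed escalation: shrink first, pull once age exceeds twice the TTL.
    if age_ms <= 2 * ttl_ms:
        return "reduce"
    return "pull"
```

Measuring age from the last acknowledged refresh, rather than the last refresh attempt, is what keeps a quote from looking fresh while refreshes are silently failing.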
3) Regime gates (toxicity + volatility)
Before posting full size, require all of these gates to pass:
- short-window realized vol below threshold
- order-flow toxicity below threshold
- spread not collapsing below minimum edge
If any gate fails:
- widen quote
- reduce size
- switch midpoint pegging off
- or temporarily quote one side only
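A small Python sketch of the gate check, assuming the three inputs are already computed upstream. The class names, field names, and the specific response (halve size, widen by one tick per failed gate) are hypothetical choices for illustration; the responses listed above can be combined differently.

```python
from dataclasses import dataclass

@dataclass
class MarketState:
    realized_vol: float   # short-window realized volatility
    toxicity: float       # any order-flow toxicity score
    spread_ticks: float   # current quoted spread in ticks

@dataclass
class GateLimits:
    max_realized_vol: float
    max_toxicity: float
    min_spread_ticks: float  # minimum edge

def failed_gates(state, limits):
    """List the names of gates that do not pass."""
    failures = []
    if state.realized_vol >= limits.max_realized_vol:
        failures.append("volatility")
    if state.toxicity >= limits.max_toxicity:
        failures.append("toxicity")
    if state.spread_ticks < limits.min_spread_ticks:
        failures.append("spread")
    return failures

def quoting_plan(state, limits, base_size, base_half_spread_ticks):
    """Full size only when every gate passes; otherwise de-risk."""
    failures = failed_gates(state, limits)
    if not failures:
        return {"size": base_size,
                "half_spread_ticks": base_half_spread_ticks,
                "failed": []}
    # Assumed response: halve size and widen one tick per failed gate.
    return {"size": base_size // 2,
            "half_spread_ticks": base_half_spread_ticks + len(failures),
            "failed": failures}
```

Returning the list of failed gates alongside the plan makes it easier to attribute later fills to the regime that was active when the quote was posted.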
4) Cancel/replace race management
Measure race outcomes explicitly:
race_lost = fill_time < cancel_ack_time
Control knobs:
- minimum quote lifetime by regime (too short a lifetime causes churn and can still lose the race)
- venue-level cancel efficiency score
- avoid over-refreshing when queue priority value is high and toxicity is low
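The race definition above can feed a per-venue loss rate, one possible input to a venue-level cancel efficiency score. In this Python sketch the tuple layout `(venue, fill_time, cancel_ack_time)` and the convention that `fill_time` is `None` when the order was never filled are assumptions.

```python
from collections import defaultdict

def race_lost(fill_time, cancel_ack_time):
    """A cancel loses the race when the fill lands before the cancel ack."""
    return fill_time is not None and fill_time < cancel_ack_time

def race_loss_rate_by_venue(attempts):
    """attempts: iterable of (venue, fill_time, cancel_ack_time) tuples."""
    lost = defaultdict(int)
    total = defaultdict(int)
    for venue, fill_time, cancel_ack_time in attempts:
        total[venue] += 1
        if race_lost(fill_time, cancel_ack_time):
            lost[venue] += 1
    return {venue: lost[venue] / total[venue] for venue in total}
```

Splitting the same rate by regime (calm versus jump) would show whether losses concentrate in exactly the moments the playbook is concerned with.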
5) Markout-driven feedback loop
For each fill, compute markout after 100ms / 500ms / 1s.
Attribution buckets:
- stale-state fill
- toxicity-spike fill
- normal passive fill
Recalibrate weekly:
- adjust TTL
- adjust regime thresholds
- adjust size ladder
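A sketch of the markout calculation and bucket assignment in Python. It assumes a time-sorted mid-price series, a sign convention where negative markout means the fill was adversely selected, and a bucket rule (stale if quote age exceeds TTL, toxic if a toxicity z-score exceeds 2) that is only one plausible way to define the three buckets.

```python
from bisect import bisect_right

def mid_at(mid_times_ms, mid_prices, t_ms):
    """Last mid observed at or before t_ms; mid_times_ms must be sorted."""
    i = bisect_right(mid_times_ms, t_ms) - 1
    if i < 0:
        raise ValueError("no mid price at or before requested time")
    return mid_prices[i]

def markout_ticks(side, fill_price, fill_time_ms, horizon_ms,
                  mid_times_ms, mid_prices, tick_size):
    """Signed markout in ticks: positive is favorable to the passive side."""
    sign = 1 if side == "buy" else -1
    future_mid = mid_at(mid_times_ms, mid_prices, fill_time_ms + horizon_ms)
    return sign * (future_mid - fill_price) / tick_size

def attribution_bucket(quote_age_ms, ttl_ms, toxicity_z, z_limit=2.0):
    """Assumed bucket rule; staleness takes precedence over toxicity."""
    if quote_age_ms > ttl_ms:
        return "stale-state fill"
    if toxicity_z > z_limit:
        return "toxicity-spike fill"
    return "normal passive fill"
```

For example, a buy filled at 100.0 with the mid at 99.5 after 100ms and a tick size of 0.5 has a 100ms markout of -1 tick.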
Practical thresholds (starter values)
These are starting points; calibrate per venue/symbol.
- Defensive mode trigger:
  - p99 t_total > 2.0 × 20-day median
  - or 1s realized vol > 95th percentile
- Auto size cut:
  - 50% reduction when 500ms markout median < -0.6 tick
- One-sided quoting:
  - enable when toxicity z-score > +2 and queue churn > threshold
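The starter thresholds above translate directly into three small predicates. In this Python sketch the function and parameter names are hypothetical, and the 20-day median is read as the median of daily p99 t_total values, which is an interpretation of the text rather than something it states.

```python
from statistics import median

def defensive_trigger(p99_total_ms, baseline_20d_median_ms,
                      realized_vol_1s, vol_95th_pct):
    """Either condition alone is enough to enter defensive mode."""
    return (p99_total_ms > 2.0 * baseline_20d_median_ms
            or realized_vol_1s > vol_95th_pct)

def size_multiplier(markouts_500ms_ticks):
    """Cut size by 50% when the median 500ms markout is below -0.6 tick."""
    return 0.5 if median(markouts_500ms_ticks) < -0.6 else 1.0

def one_sided_quoting(toxicity_z, queue_churn, churn_threshold):
    """Both conditions must hold; churn_threshold is venue-specific."""
    return toxicity_z > 2.0 and queue_churn > churn_threshold
```

Keeping each threshold in its own function makes the per-venue and per-symbol calibration a matter of changing arguments, not logic.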
Backtest/live gap warning
Most backtests assume immediate cancels and deterministic fills. Live trading adds queue uncertainty and cancel races. To reduce simulation optimism:
- inject measured latency distributions (not constants)
- simulate cancel-race outcomes with empirical probability
- apply toxicity-conditional fill adjustments
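A minimal Python sketch of the first two adjustments: each simulated trial resamples a cancel latency and an aggressor arrival delay from measured samples (a simple bootstrap) and counts how often the aggressor arrives first. The function name, the input lists, and the independence of the two draws are assumptions; toxicity-conditional adjustments are not modeled here.

```python
import random

def simulate_cancel_race_loss(cancel_latencies_ms, hit_delays_ms,
                              n_trials, seed=0):
    """Estimate the fraction of cancel attempts that lose the race.

    cancel_latencies_ms: measured cancel-to-ack latencies (empirical samples)
    hit_delays_ms: measured delays from market move to aggressor arrival
    """
    rng = random.Random(seed)  # seeded for reproducible backtests
    lost = 0
    for _ in range(n_trials):
        cancel_latency = rng.choice(cancel_latencies_ms)
        hit_delay = rng.choice(hit_delays_ms)
        if hit_delay < cancel_latency:  # aggressor arrives before the cancel
            lost += 1
    return lost / n_trials
```

Sampling from the measured distribution, instead of plugging in a mean latency, keeps the tail events that produce most race losses.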
Minimal implementation checklist
- Capture full quote lifecycle timestamps.
- Build per-venue p99 latency monitor with alerts.
- Add quote TTL + stale pull logic.
- Add defensive mode state machine (normal/defensive/panic).
- Compute markouts and recalibrate thresholds weekly.
One-sentence takeaway
In modern electronic markets, latency is not just speed—it is inventory risk and adverse-selection risk compressed into milliseconds.