Latency Arbitrage & Stale-Quote Defense Playbook (Practical)

Date: 2026-02-21
Category: knowledge
Focus: Prevent passive quotes from becoming free options for faster participants.

Why this matters

If your quote update loop is slower than market state changes, your posted liquidity can become stale for milliseconds that matter. In that window, adverse counterparties selectively hit your mispriced quotes. Over time this turns into a steady negative carry that looks like “bad luck” unless explicitly measured.

Failure mode in one line

Quote lifecycle latency > micro-regime half-life → stale quotes get picked off.

Where quote lifecycle latency includes:

signal compute latency
risk check latency
gateway/serialization latency
venue/network transit latency
exchange ack latency
cancel/replace queue delay
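
As a rough illustration of the one-line failure mode, the check below sums per-stage latencies and compares the total with a regime half-life. The stage values, the half_life_us input, and the assumption that the stages run strictly in sequence are all hypothetical; the half-life itself would have to be estimated from market data.

```python
# Hypothetical per-stage latencies in microseconds, one entry per stage listed above.
stage_latency_us = {
    "signal_compute": 400,
    "risk_check": 200,
    "gateway_serialization": 100,
    "network_transit": 1500,
    "exchange_ack": 800,
    "cancel_replace_queue": 2000,
}

def is_exposed(stage_latency_us, half_life_us):
    """True when the summed lifecycle latency exceeds the regime half-life."""
    # Assumption: stages are sequential, so the total is a plain sum.
    return sum(stage_latency_us.values()) > half_life_us
```

With these numbers the lifecycle takes 5000 microseconds, so a regime with a 3000-microsecond half-life leaves quotes exposed while a 10000-microsecond one does not.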

Observable symptoms

Fill quality sharply worsens during volatility bursts despite stable spread settings.
Passive fills show strongly negative short-horizon markout (e.g., at +100ms and +500ms after the fill).
Cancel-to-fill race loss rate rises when the market jumps.
Toxicity metrics (VPIN / OFI shock / trade-sign imbalance) spike before bad passive fills.

Core defense stack

1) Latency budget decomposition

Track p50/p90/p99 for each stage, not just end-to-end.

Recommended dashboard columns:

t_signal
t_risk
t_route
t_exchange_ack
t_cancel_ack
t_replace_ack
t_total

Rule: if a stage's p99 rises above 2× its baseline, auto-enter defensive quoting mode.
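
A minimal sketch of the per-stage percentile table and the 2x rule, assuming latency samples are already collected per stage in a dict keyed by the dashboard column names. The function names and the nearest-rank percentile method are illustrative choices, not part of the playbook.

```python
def percentile(samples, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(samples)
    rank = (q * len(ordered) + 99) // 100  # ceil(q * n / 100) without floats
    return ordered[rank - 1]

def stage_summary(samples_by_stage):
    """Map each stage name to its p50/p90/p99."""
    return {
        stage: {f"p{q}": percentile(values, q) for q in (50, 90, 99)}
        for stage, values in samples_by_stage.items()
    }

def breached_stages(summary, baseline_p99, factor=2.0):
    """Stages whose current p99 exceeds factor times their baseline p99."""
    return [
        stage
        for stage, stats in summary.items()
        if stage in baseline_p99 and stats["p99"] > factor * baseline_p99[stage]
    ]
```

A non-empty result from breached_stages would be the signal to enter defensive quoting mode. Whether to trigger on any stage or only on t_total is a policy choice left open here.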

2) Staleness-aware quoting

Attach a “freshness TTL” to every quote.

Example policy:

normal regime TTL: 300ms
volatile regime TTL: 80–120ms
if quote age > TTL and no successful refresh ack, reduce size or pull quote.

This prevents “immortal stale quotes” during infra hiccups.
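
The example policy could be sketched as below. The Quote fields, the 100ms volatile TTL (one point inside the 80–120ms range), and the "reduce first, pull after 2x TTL" escalation are assumptions made for illustration; the playbook only says to reduce size or pull.

```python
from dataclasses import dataclass

# TTLs from the example policy; 100 ms is an assumed point in the volatile range.
TTL_MS = {"normal": 300, "volatile": 100}

@dataclass
class Quote:
    quote_id: str
    size: int
    last_ack_ms: int  # timestamp of the last successful post or refresh ack

def stale_action(quote, now_ms, regime):
    """Return 'keep', 'reduce', or 'pull' from quote age versus the regime TTL."""
    age_ms = now_ms - quote.last_ack_ms
    ttl_ms = TTL_MS[regime]
    if age_ms <= ttl_ms:
        return "keep"
    # Assumption: shrink first, then pull once the quote is older than 2x TTL.
    return "reduce" if age_ms <= 2 * ttl_ms else "pull"
```

Measuring age from the last acknowledged refresh, rather than from the last refresh attempt, is what catches the infra-hiccup case: a refresh that was sent but never acked does not reset the clock.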

3) Regime gates (toxicity + volatility)

Before posting full size, require all of these gates to pass:

short-window realized vol below threshold
order-flow toxicity below threshold
spread not collapsing below minimum edge

If any gate fails:

widen quote
reduce size
switch midpoint peg off
or temporarily quote one side only
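
One way to express the gates in code, as a sketch only. The field names, the limit values, and the specific response (halve size and widen by one tick on any failure) are hypothetical; the playbook lists several possible responses and does not prescribe a combination.

```python
from dataclasses import dataclass

@dataclass
class MarketState:
    realized_vol: float   # short-window realized volatility
    toxicity: float       # any order-flow toxicity score
    spread_ticks: float   # current quoted spread in ticks

@dataclass
class GateLimits:
    max_realized_vol: float
    max_toxicity: float
    min_spread_ticks: float  # minimum edge

def failed_gates(state, limits):
    """Names of the gates that the current market state fails."""
    failures = []
    if state.realized_vol >= limits.max_realized_vol:
        failures.append("realized_vol")
    if state.toxicity >= limits.max_toxicity:
        failures.append("toxicity")
    if state.spread_ticks < limits.min_spread_ticks:
        failures.append("spread")
    return failures

def quote_plan(state, limits, full_size):
    """Full size only when every gate passes; otherwise a defensive plan."""
    failures = failed_gates(state, limits)
    if not failures:
        return {"size": full_size, "extra_width_ticks": 0, "failed": []}
    # Assumption: any failure halves size and widens the quote by one tick.
    return {"size": full_size // 2, "extra_width_ticks": 1, "failed": failures}
```

Returning the list of failed gates alongside the plan makes it easy to log which gate drove each defensive decision.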

4) Cancel/replace race management

Measure race outcomes explicitly:

race_lost = cancel_sent_time <= fill_time < cancel_ack_time

Control knobs:

minimum quote lifetime by regime (too short a lifetime churns queue position and can still lose the race)
venue-level cancel efficiency score
avoid over-refreshing when queue priority value is high and toxicity is low
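
A sketch of how race outcomes might be tallied per venue. It counts a race as lost when a fill lands before the cancel is acknowledged; the lower bound on cancel send time is an added assumption so that fills arriving before any cancel was sent are not counted. The event tuple layout is hypothetical, and the per-venue loss rate is only one possible ingredient of a cancel efficiency score.

```python
from collections import defaultdict

def race_lost(cancel_sent_ms, fill_ms, cancel_ack_ms):
    """True when a fill lands after the cancel was sent but before its ack."""
    if fill_ms is None:  # the order was never filled
        return False
    return cancel_sent_ms <= fill_ms < cancel_ack_ms

def race_loss_rate_by_venue(cancel_events):
    """cancel_events: iterable of (venue, cancel_sent_ms, fill_ms or None, cancel_ack_ms)."""
    attempts = defaultdict(int)
    losses = defaultdict(int)
    for venue, sent_ms, fill_ms, ack_ms in cancel_events:
        attempts[venue] += 1
        losses[venue] += race_lost(sent_ms, fill_ms, ack_ms)
    return {venue: losses[venue] / attempts[venue] for venue in attempts}
```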

5) Markout-driven feedback loop

For each fill, compute markout after 100ms / 500ms / 1s.
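
A minimal markout sketch, assuming a time-sorted series of mid prices and a sign convention where negative values mean the market moved against the fill. The function names and the choice of the last mid at or before the horizon as reference price are illustrative.

```python
import bisect

HORIZONS_MS = (100, 500, 1000)

def mid_at(times_ms, mids, t_ms):
    """Last observed mid at or before t_ms (times_ms must be sorted ascending)."""
    i = bisect.bisect_right(times_ms, t_ms) - 1
    if i < 0:
        raise ValueError("no mid price observed at or before t_ms")
    return mids[i]

def markouts_ticks(side, fill_price, fill_time_ms, times_ms, mids, tick_size):
    """Signed markout in ticks per horizon; negative means adverse selection."""
    sign = 1 if side == "buy" else -1
    return {
        h: sign * (mid_at(times_ms, mids, fill_time_ms + h) - fill_price) / tick_size
        for h in HORIZONS_MS
    }
```

For a passive buy, a falling mid after the fill yields a negative markout; for a passive sell the sign flips, so both sides can be aggregated on the same scale.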

Attribution buckets:

stale-state fill
toxicity-spike fill
normal passive fill

Recalibrate weekly:

adjust TTL
adjust regime thresholds
adjust size ladder

Practical thresholds (starter values)

These are starting points; calibrate per venue/symbol.

Defensive mode trigger:
- p99 t_total > 2.0 × 20-day median
- or 1s realized vol > 95th percentile
Auto size cut:
- 50% reduction when 500ms markout median < -0.6 tick
One-sided quoting:
- enable when toxicity z-score > +2 and queue churn > threshold
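
The starter values above could be held in one config object and evaluated together, roughly as follows. The parameter names are hypothetical, the baseline is assumed to be the 20-day median of p99 t_total, and the queue churn threshold has no default because the playbook does not give one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StarterThresholds:
    latency_multiple: float = 2.0      # p99 t_total versus 20-day median
    vol_percentile: float = 95.0       # 1s realized vol percentile rank
    markout_floor_ticks: float = -0.6  # 500ms markout median
    size_cut: float = 0.5              # 50% reduction
    toxicity_z: float = 2.0

def evaluate(p99_total_ms, baseline_ms, vol_percentile_rank,
             markout_500ms_median_ticks, toxicity_z, queue_churn,
             churn_threshold, th=StarterThresholds()):
    """Apply the three starter rules and return the resulting quoting flags."""
    defensive = (
        p99_total_ms > th.latency_multiple * baseline_ms
        or vol_percentile_rank > th.vol_percentile
    )
    cut = markout_500ms_median_ticks < th.markout_floor_ticks
    one_sided = toxicity_z > th.toxicity_z and queue_churn > churn_threshold
    return {
        "defensive": defensive,
        "size_multiplier": 1.0 - th.size_cut if cut else 1.0,
        "one_sided": one_sided,
    }
```

Keeping the thresholds in a frozen dataclass makes per-venue or per-symbol overrides explicit, e.g. StarterThresholds(toxicity_z=2.5).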

Backtest/live gap warning

Most backtests assume immediate cancels and deterministic fills. Live trading adds queue uncertainty and cancel races. To reduce simulation optimism:

inject measured latency distributions (not constants)
simulate cancel-race outcomes with empirical probability
apply toxicity-conditional fill adjustments
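
A sketch of the first two adjustments, assuming a list of measured cancel latencies and an empirically estimated race-loss probability are available from live logs. Both inputs and the function name are hypothetical, and toxicity-conditional fill adjustments are not covered here.

```python
import random

def simulate_cancels(measured_latency_ms, p_race_lost, n_trials, seed=0):
    """Draw (latency_ms, race_lost) pairs for simulated cancel attempts."""
    rng = random.Random(seed)  # seeded so backtest runs are reproducible
    outcomes = []
    for _ in range(n_trials):
        # Resample from measured latencies instead of using a constant.
        latency_ms = rng.choice(measured_latency_ms)
        # Bernoulli draw using the empirically measured loss probability.
        lost = rng.random() < p_race_lost
        outcomes.append((latency_ms, lost))
    return outcomes
```

In a backtest, each simulated cancel would take effect only after its drawn latency, and a lost race would be booked as a fill at the stale price rather than as a successful cancel.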

Minimal implementation checklist

Capture full quote lifecycle timestamps.
Build per-venue p99 latency monitor with alerts.
Add quote TTL + stale pull logic.
Add defensive mode state machine (normal/defensive/panic).
Compute markouts and recalibrate thresholds weekly.

One-sentence takeaway

In modern electronic markets, latency is not just speed—it is inventory risk and adverse-selection risk compressed into milliseconds.