Venue Fragmentation & SOR Slippage Playbook (Practical)

Date: 2026-02-22 17:04 KST
Category: research
Scope: Production execution for fragmented equity/crypto venues

Why this matters

In fragmented markets, "best price" is often a trap when you ignore:

queue quality (you arrive late),
hidden adverse selection (toxic flow pockets),
fees/rebates by venue/tier,
cancel/replace latency and reject rates,
short-lived quote staleness.

A naive smart order router (SOR) that only chases top-of-book can increase realized slippage even when it hits better displayed prices.

1) Cost model: choose by expected net fill quality

For each venue v, score the next child order by:

ExpectedCost_v = SpreadCapture_v + Impact_v + ToxicityPenalty_v + LatencyPenalty_v + RejectPenalty_v + FeeNet_v

Where:

SpreadCapture_v: expected half-spread paid on the active path or earned on the passive path (an earned spread enters as a negative cost).
Impact_v: short-horizon impact from participation vs venue depth.
ToxicityPenalty_v: markout-derived penalty (e.g., 1s/5s adverse move after fill).
LatencyPenalty_v: expected drift during routing+ack latency.
RejectPenalty_v: expected retry cost from reject/cancel-fail probability.
FeeNet_v: fee minus rebate, normalized to bps.

Route to venues minimizing ExpectedCost_v, not just best displayed quote.
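
A minimal sketch of the ranking step, assuming every component is already expressed in bps with positive meaning cost and negative meaning benefit. The names VenueCost and rank_venues and all numbers are illustrative assumptions, not part of any real router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VenueCost:
    """Per-venue cost components in bps (assumed convention: positive = cost)."""
    spread_capture: float    # negative when a passive fill earns half-spread
    impact: float
    toxicity_penalty: float
    latency_penalty: float
    reject_penalty: float
    fee_net: float           # fee minus rebate

    def expected_cost(self) -> float:
        # Plain sum of the six terms in ExpectedCost_v.
        return (self.spread_capture + self.impact + self.toxicity_penalty
                + self.latency_penalty + self.reject_penalty + self.fee_net)


def rank_venues(costs: dict) -> list:
    """Return venue names ordered from lowest to highest expected cost."""
    return sorted(costs, key=lambda name: costs[name].expected_cost())


# Illustrative numbers only: VENUE_A earns spread and a rebate but its
# markout-based toxicity penalty dominates the sum.
costs = {
    "VENUE_A": VenueCost(-0.5, 0.4, 1.5, 0.3, 0.2, -0.2),
    "VENUE_B": VenueCost(0.0, 0.5, 0.2, 0.1, 0.05, 0.3),
}
ranking = rank_venues(costs)
```

With these made-up inputs the venue with the rebate and the earned spread still ranks second, which is the point of scoring on net cost rather than on the displayed quote.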

2) Minimal feature set per venue (live)

Maintain rolling features (EWMA + quantiles):

Fill ratio by order type/size bucket
1s and 5s markout (toxicity proxy)
Ack latency p50/p95, cancel success rate
Reject code distribution
Realized net fee/rebate bps
Queue depletion speed near top levels

If you can only keep three features, keep:

1. markout, 2. fill ratio, 3. latency p95.
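
One way to hold the three core features in memory is sketched below, assuming an EWMA for markout and fill ratio and a fixed-size sample window with a nearest-rank percentile for latency. The class names, the alpha value, and the window size are hypothetical choices.

```python
import math
from collections import deque
from typing import Optional


class Ewma:
    """Exponentially weighted moving average; alpha near 1 reacts faster."""

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, x: float) -> float:
        if self.value is None:
            self.value = x  # seed with the first observation
        else:
            self.value = self.alpha * x + (1.0 - self.alpha) * self.value
        return self.value


class RollingQuantile:
    """Nearest-rank percentile over the most recent `window` samples."""

    def __init__(self, window: int) -> None:
        self.samples = deque(maxlen=window)

    def update(self, x: float) -> None:
        self.samples.append(x)

    def quantile(self, pct: int) -> float:
        ordered = sorted(self.samples)
        if not ordered:
            raise ValueError("no samples yet")
        rank = max(1, math.ceil(pct * len(ordered) / 100))
        return ordered[rank - 1]


class VenueFeatures:
    """The three 'keep these' features for a single venue."""

    def __init__(self, alpha: float = 0.1, window: int = 500) -> None:
        self.markout_bps = Ewma(alpha)
        self.fill_ratio = Ewma(alpha)
        self.ack_latency_ms = RollingQuantile(window)

    def on_order_result(self, filled: bool, ack_latency_ms: float) -> None:
        self.fill_ratio.update(1.0 if filled else 0.0)
        self.ack_latency_ms.update(ack_latency_ms)

    def on_markout(self, adverse_bps: float) -> None:
        self.markout_bps.update(adverse_bps)
```

Sorting the window on every query is fine for a sketch; a production path would likely use a streaming quantile estimator instead.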

3) Regime-aware routing states

Use a lightweight state machine:

State A: Normal

Multi-venue distribution enabled
Passive-first when expected queue survival is acceptable

State B: Stress

Trigger: volatility or spread expansion above threshold, or venue latency p95 breach.

Reduce venue fanout
Tighten toxicity cap
Raise minimum expected edge before passive posting

State C: Shock

Trigger: reject spike, data lag, crossed/stale quote anomalies.

Kill risky venues temporarily
Route to "trusted core venues" only
Smaller child slices, shorter quote TTL

Recovery with hysteresis (don’t flap): require stable metrics for N consecutive windows before de-escalating to a calmer state.
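
The state machine and its hysteresis can be sketched as follows. This assumes the trigger checks have already been reduced to one "observed" regime per window, that escalation is immediate, and that recovery steps down one level at a time; those choices and the names Regime and RegimeMachine are assumptions for illustration.

```python
from enum import IntEnum


class Regime(IntEnum):
    NORMAL = 0
    STRESS = 1
    SHOCK = 2


class RegimeMachine:
    """Escalate at once; de-escalate only after N consecutive calmer windows."""

    def __init__(self, calm_windows_required: int = 3) -> None:
        self.state = Regime.NORMAL
        self.calm_windows_required = calm_windows_required
        self._calm_streak = 0

    def update(self, observed: Regime) -> Regime:
        if observed > self.state:
            # Worse conditions: jump straight to the observed regime.
            self.state = observed
            self._calm_streak = 0
        elif observed < self.state:
            # Calmer conditions: count the window, step down only when stable.
            self._calm_streak += 1
            if self._calm_streak >= self.calm_windows_required:
                self.state = Regime(self.state - 1)
                self._calm_streak = 0
        else:
            # Conditions match the current state: any calm streak is broken.
            self._calm_streak = 0
        return self.state


machine = RegimeMachine(calm_windows_required=2)
observations = [Regime.NORMAL, Regime.SHOCK, Regime.NORMAL, Regime.SHOCK,
                Regime.NORMAL, Regime.NORMAL, Regime.NORMAL, Regime.NORMAL]
trace = [machine.update(obs) for obs in observations]
```

In this trace a single calm window between two shock readings does not release the Shock state, and the path back to Normal passes through Stress.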

4) Venue health score (for hard gating)

Define:

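Health_v = w1*LatencyScore + w2*RejectScore + w3*ToxicityScore + w4*DataFreshnessScore

Each score is computed per venue and oriented so that a higher value means healthier, which is what the H_min gate requires.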

Hard rules:

If Health_v < H_min → venue quarantined for T_quarantine.
If market-wide shock + venue unhealthy → no new passive orders there.

This prevents one broken venue from polluting portfolio execution.
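
A hedged sketch of the gate, assuming each sub-score is normalized to [0, 1] with higher meaning healthier. The weights, H_min, the quarantine length, and the softer h_warn threshold used for the "unhealthy" test in the second rule are all hypothetical values, and VenueHealthGate is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical weights w1..w4; they sum to 1 so Health_v stays in [0, 1].
WEIGHTS = {"latency": 0.3, "reject": 0.2, "toxicity": 0.3, "freshness": 0.2}


@dataclass
class VenueHealthGate:
    h_min: float = 0.5           # below this: quarantine
    h_warn: float = 0.7          # below this: "unhealthy" (assumed softer bar)
    t_quarantine_s: float = 60.0
    weights: Dict[str, float] = field(default_factory=lambda: dict(WEIGHTS))
    _last_health: Dict[str, float] = field(default_factory=dict)
    _quarantined_until: Dict[str, float] = field(default_factory=dict)

    def update(self, venue: str, scores: Dict[str, float], now_s: float) -> float:
        """Recompute Health_v and start a quarantine if it breaches h_min."""
        health = sum(w * scores[name] for name, w in self.weights.items())
        self._last_health[venue] = health
        if health < self.h_min:
            self._quarantined_until[venue] = now_s + self.t_quarantine_s
        return health

    def is_quarantined(self, venue: str, now_s: float) -> bool:
        return now_s < self._quarantined_until.get(venue, float("-inf"))

    def allow_passive(self, venue: str, now_s: float, market_shock: bool) -> bool:
        """No passive posting when quarantined, or unhealthy during a shock."""
        if self.is_quarantined(venue, now_s):
            return False
        # Venues with no score yet default to 0.0, i.e. treated as unhealthy.
        unhealthy = self._last_health.get(venue, 0.0) < self.h_warn
        return not (market_shock and unhealthy)
```

Treating unscored venues as unhealthy is a deliberately conservative default here; a real deployment might prefer to seed scores from history.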

5) Calibration loop (weekly)

Recompute model coefficients per liquidity bucket.
Compare predicted vs realized per-venue shortfall.
Track calibration drift (MAE/p95 error).
Auto-reduce confidence weight on drifting venues.
Review top 10 worst routing decisions manually.

Don’t chase average error only; watch p95/p99 misses.
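
The drift check can be sketched like this, assuming predicted and realized shortfall are paired per order in bps and that percentiles use the nearest-rank method. The decay factor, floor, and error limit in confidence_weight are hypothetical tuning values.

```python
import math


def calibration_errors(predicted, realized, pct=95):
    """Return (MAE, nearest-rank percentile) of absolute prediction error."""
    errors = sorted(abs(p - r) for p, r in zip(predicted, realized))
    if not errors:
        raise ValueError("no samples")
    mae = sum(errors) / len(errors)
    rank = max(1, math.ceil(pct * len(errors) / 100))
    return mae, errors[rank - 1]


def confidence_weight(current, tail_error, tail_limit, decay=0.5, floor=0.1):
    """Shrink a venue's model weight when its tail error breaches the limit."""
    if tail_error > tail_limit:
        return max(floor, current * decay)
    return current


# Illustrative data: 18 accurate predictions and two misses (1 bps and 5 bps).
predicted = [1.0] * 20
realized = [1.0] * 18 + [2.0, 6.0]
mae, p95 = calibration_errors(predicted, realized, pct=95)
_, p99 = calibration_errors(predicted, realized, pct=99)
new_weight = confidence_weight(1.0, p99, tail_limit=2.0)
```

In this made-up sample the mean error is 0.3 bps while the p99 error is 5 bps, so a limit set on the average alone would not have triggered the weight reduction.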

6) Practical anti-footgun checklist

Fee/rebate table versioned and timestamped
Reject code mapping normalized across venues
Data freshness guard before routing decisions
Passive order TTL capped under stress
Retry budget capped (no infinite reroute loops)
Venue quarantine list persisted/recoverable
Per-venue kill switch tested in simulation
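
Two of the checklist items, the data freshness guard and the capped retry budget, can be combined in one routing loop. The sketch below assumes the caller supplies a ranked venue list, a quote-age lookup, and a send function that returns True on acceptance; those callables and the default limits are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def route_with_guards(
    ranked_venues: List[str],
    quote_age_ms: Callable[[str], float],
    send: Callable[[str], bool],
    max_quote_age_ms: float = 50.0,
    retry_budget: int = 3,
) -> Tuple[Optional[str], int]:
    """Try venues in order; skip stale data; stop when the budget is spent.

    Returns (accepting venue or None, number of send attempts made).
    """
    attempts = 0
    for venue in ranked_venues:
        if attempts >= retry_budget:
            break  # hard cap: no infinite reroute loop
        if quote_age_ms(venue) > max_quote_age_ms:
            continue  # freshness guard: do not route on stale quotes
        attempts += 1
        if send(venue):
            return venue, attempts
    return None, attempts
```

Skipped stale venues do not consume the budget in this version; whether they should is a policy choice.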

7) KPIs that actually matter

Primary:

Realized implementation shortfall (bps), p50/p95
Markout after fill (1s/5s)
Opportunity loss from underfill

Secondary:

Fill ratio, cancel success, reject rate
Net fee/rebate contribution

A good SOR is one that lowers tail slippage without exploding opportunity loss.
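
For concreteness, the two price-based primary KPIs can be computed as below. The sign conventions are assumptions: shortfall is positive when the fill is worse than the arrival price, and markout is positive when the mid moves against the position after the fill.

```python
def shortfall_bps(side: str, arrival_price: float, avg_fill_price: float) -> float:
    """Implementation shortfall vs. arrival price; positive = worse than arrival."""
    sign = 1.0 if side == "buy" else -1.0
    return sign * (avg_fill_price - arrival_price) / arrival_price * 1e4


def markout_bps(side: str, fill_price: float, mid_after: float) -> float:
    """Markout at a fixed horizon (e.g. 1s or 5s); positive = adverse move."""
    sign = 1.0 if side == "buy" else -1.0
    return sign * (fill_price - mid_after) / fill_price * 1e4


# Illustrative prices only.
buy_shortfall = shortfall_bps("buy", arrival_price=100.0, avg_fill_price=100.05)
sell_shortfall = shortfall_bps("sell", arrival_price=100.0, avg_fill_price=99.90)
buy_markout = markout_bps("buy", fill_price=100.0, mid_after=99.98)
```

Here the buy pays about 5 bps over arrival, the sell gives up about 10 bps, and the buy shows roughly 2 bps of adverse markout because the mid fell after the fill.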

8) Start simple (MVP path)

Week 1:

Add venue-level markout + latency + reject stats
Introduce binary unhealthy gate

Week 2:

Add net-fee normalization and simple expected-cost ranking

Week 3:

Add state machine (Normal/Stress/Shock) + hysteresis

Week 4:

Run weekly calibration and top-loss case review

Compounding edge comes from calibration discipline, not a giant first model.