Venue Fragmentation & SOR Slippage Playbook (Practical)
Date: 2026-02-22 17:04 KST
Category: research
Scope: Production execution for fragmented equity/crypto venues
Why this matters
In fragmented markets, "best price" is often a trap when you ignore:
- queue quality (you arrive late),
- hidden adverse selection (toxic flow pockets),
- fees/rebates by venue/tier,
- cancel/replace latency and reject rates,
- short-lived quote staleness.
A naive SOR that only chases top-of-book can increase realized slippage despite better displayed prices.
1) Cost model: choose by expected net fill quality
For each venue v, score the next child order by:
ExpectedCost_v = SpreadCapture_v + Impact_v + ToxicityPenalty_v + LatencyPenalty_v + RejectPenalty_v + FeeNet_v
Where:
- SpreadCapture_v: expected half-spread earned/lost from passive/active path.
- Impact_v: short-horizon impact from participation vs venue depth.
- ToxicityPenalty_v: markout-derived penalty (e.g., 1s/5s adverse move after fill).
- LatencyPenalty_v: expected drift during routing+ack latency.
- RejectPenalty_v: expected retry cost from reject/cancel-fail probability.
- FeeNet_v: fee minus rebate, normalized to bps.
Route to venues minimizing ExpectedCost_v, not just best displayed quote.
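A minimal sketch of the ranking step, assuming every term is expressed in bps and signed as a cost (so an earned half-spread or a net rebate is negative). The class name, field names, and sample numbers are illustrative, not part of any production model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VenueCost:
    """Per-venue cost terms in bps. Assumed convention: positive = cost."""
    venue: str
    spread_capture: float    # negative when the passive path earns the half-spread
    impact: float
    toxicity_penalty: float
    latency_penalty: float
    reject_penalty: float
    fee_net: float           # fee minus rebate

    def expected_cost(self) -> float:
        # Direct sum of the six terms in ExpectedCost_v.
        return (self.spread_capture + self.impact + self.toxicity_penalty
                + self.latency_penalty + self.reject_penalty + self.fee_net)


def rank_venues(costs: list[VenueCost]) -> list[VenueCost]:
    """Sort venues from lowest to highest expected cost."""
    return sorted(costs, key=lambda c: c.expected_cost())


# Made-up numbers: venue A earns spread and a rebate but has toxic fills.
venues = [
    VenueCost("A", spread_capture=-0.5, impact=0.4, toxicity_penalty=1.2,
              latency_penalty=0.3, reject_penalty=0.1, fee_net=-0.2),
    VenueCost("B", spread_capture=0.0, impact=0.5, toxicity_penalty=0.2,
              latency_penalty=0.1, reject_penalty=0.0, fee_net=0.3),
]
best = rank_venues(venues)[0]  # B: about 1.1 bps vs about 1.3 bps for A
```

Venue A looks better on spread and fees alone, yet loses once the toxicity penalty is included, which is the point of scoring on net cost.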
2) Minimal feature set per venue (live)
Maintain rolling features (EWMA + quantiles):
- Fill ratio by order type/size bucket
- 1s and 5s markout (toxicity proxy)
- Ack latency p50/p95, cancel success rate
- Reject code distribution
- Realized net fee/rebate bps
- Queue depletion speed near top levels
If you can only keep three features, keep:
1. markout
2. fill ratio
3. latency p95
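One way to keep a rolling feature is an EWMA for the level plus a bounded sample window for quantiles. The sketch below assumes a nearest-rank quantile and a fixed-size window; `RollingFeature`, `alpha`, and `window` are hypothetical names and defaults.

```python
import math
from collections import deque


class RollingFeature:
    """EWMA plus a bounded window for quantiles (illustrative helper)."""

    def __init__(self, alpha: float = 0.1, window: int = 500):
        self.alpha = alpha                   # EWMA weight on the newest sample
        self.ewma = None                     # None until the first update
        self.samples = deque(maxlen=window)  # oldest samples drop off automatically

    def update(self, x: float) -> None:
        self.samples.append(x)
        if self.ewma is None:
            self.ewma = x
        else:
            self.ewma = self.alpha * x + (1 - self.alpha) * self.ewma

    def quantile(self, q: float) -> float:
        """Nearest-rank quantile over the current window."""
        if not self.samples:
            raise ValueError("no samples yet")
        ordered = sorted(self.samples)
        idx = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
        return ordered[idx]


# Usage: one tracker per (venue, feature), e.g. ack latency in milliseconds.
ack_latency = RollingFeature(alpha=0.5, window=100)
for ms in (10.0, 20.0):
    ack_latency.update(ms)
```

Sorting the window on every quantile call is fine at this size; a streaming quantile estimator would be the obvious swap if the window grows.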
3) Regime-aware routing states
Use a lightweight state machine:
State A: Normal
- Multi-venue distribution enabled
- Passive-first when expected queue survival is acceptable
State B: Stress
Trigger: volatility or spread expansion above threshold, or venue latency p95 breach.
- Reduce venue fanout
- Tighten toxicity cap
- Raise minimum expected edge before passive posting
State C: Shock
Trigger: reject spike, data lag, crossed/stale quote anomalies.
- Kill risky venues temporarily
- Route to "trusted core venues" only
- Smaller child slices, shorter quote TTL
Recovery with hysteresis (don’t flap): require stable metrics for N consecutive windows before stepping back down to a calmer state.
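The state machine can be sketched as below. Two design choices here are assumptions, not requirements from the text: escalation is immediate, and recovery steps down one level at a time after `n_calm` consecutive calm windows. The trigger booleans stand in for whatever threshold checks you run per window.

```python
from enum import IntEnum


class Regime(IntEnum):
    NORMAL = 0
    STRESS = 1
    SHOCK = 2


class RegimeMachine:
    """Escalate at once; step down one level only after n_calm calmer windows."""

    def __init__(self, n_calm: int = 3):
        self.state = Regime.NORMAL
        self.n_calm = n_calm
        self._calm_streak = 0

    def on_window(self, stress_trigger: bool, shock_trigger: bool) -> Regime:
        if shock_trigger:
            target = Regime.SHOCK
        elif stress_trigger:
            target = Regime.STRESS
        else:
            target = Regime.NORMAL

        if target > self.state:
            # Conditions got worse: escalate immediately.
            self.state = target
            self._calm_streak = 0
        elif target < self.state:
            # Conditions look calmer: count windows before stepping down.
            self._calm_streak += 1
            if self._calm_streak >= self.n_calm:
                self.state = Regime(self.state - 1)
                self._calm_streak = 0
        else:
            # Current state is still justified: reset the streak.
            self._calm_streak = 0
        return self.state
```

A single calm window between two shock signals resets the streak, so the router stays in Shock rather than flapping.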
4) Venue health score (for hard gating)
Define:
Health_v = w1*LatencyScore + w2*RejectScore + w3*ToxicityScore + w4*DataFreshnessScore
Hard rules:
- If Health_v < H_min → venue quarantined for T_quarantine.
- If market-wide shock + venue unhealthy → no new passive orders there.
This prevents one broken venue from polluting portfolio execution.
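A minimal sketch of the gate, under these assumptions: each component score is normalized to [0, 1] with higher meaning healthier; timestamps are passed in as seconds; and "unhealthy" in the shock rule uses a softer threshold `h_warn` above `h_min`, which is a hypothetical parameter added for illustration. All default values are placeholders.

```python
class HealthGate:
    """Weighted health score with timed quarantine (illustrative sketch)."""

    def __init__(self, weights=(0.3, 0.3, 0.3, 0.1), h_min=0.6, h_warn=0.75,
                 t_quarantine=60.0):
        self.weights = weights            # (w1, w2, w3, w4)
        self.h_min = h_min                # hard quarantine threshold
        self.h_warn = h_warn              # assumed softer "unhealthy" threshold
        self.t_quarantine = t_quarantine  # quarantine length in seconds
        self._until = {}                  # venue -> quarantine expiry time
        self._last = {}                   # venue -> most recent health score

    def update(self, venue, latency, reject, toxicity, freshness, now):
        w1, w2, w3, w4 = self.weights
        h = w1 * latency + w2 * reject + w3 * toxicity + w4 * freshness
        self._last[venue] = h
        if h < self.h_min:
            self._until[venue] = now + self.t_quarantine
        return h

    def is_quarantined(self, venue, now):
        return now < self._until.get(venue, float("-inf"))

    def allow_passive(self, venue, now, market_shock):
        if self.is_quarantined(venue, now):
            return False
        # Venues never scored default to 0.0, so they fail closed under shock.
        unhealthy = self._last.get(venue, 0.0) < self.h_warn
        return not (market_shock and unhealthy)
```

Persisting `_until` across restarts is what the checklist item on a recoverable quarantine list refers to; this sketch keeps it in memory only.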
5) Calibration loop (weekly)
- Recompute model coefficients per liquidity bucket.
- Compare predicted vs realized per-venue shortfall.
- Track calibration drift (MAE/p95 error).
- Auto-reduce confidence weight on drifting venues.
- Review top 10 worst routing decisions manually.
Don’t chase average error only—watch p95/p99 misses.
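The drift check and the confidence down-weighting can be sketched as follows. The nearest-rank p95, the inverse-ratio shrink rule, and the `tolerance_bps` and `floor` defaults are all assumptions chosen for illustration.

```python
import math


def calibration_errors(predicted_bps, realized_bps):
    """Return (MAE, p95 absolute error) for paired per-order shortfalls in bps."""
    errs = sorted(abs(p - r) for p, r in zip(predicted_bps, realized_bps))
    if not errs:
        raise ValueError("need at least one prediction/realization pair")
    mae = sum(errs) / len(errs)
    p95 = errs[min(len(errs) - 1, math.ceil(0.95 * len(errs)) - 1)]
    return mae, p95


def confidence_weight(p95_err, tolerance_bps=2.0, floor=0.2):
    """1.0 while p95 error is within tolerance, then shrink, never below floor."""
    if p95_err <= tolerance_bps:
        return 1.0
    return max(floor, tolerance_bps / p95_err)


# Made-up sample: 18 accurate predictions and two large misses.
predicted = [1.0] * 20
realized = [1.0] * 18 + [5.0, 9.0]
mae, p95 = calibration_errors(predicted, realized)
weight = confidence_weight(p95)
```

In this sample the MAE is about 0.6 bps while the p95 error is 4 bps, so a venue that looks calibrated on average still gets its weight halved.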
6) Practical anti-footgun checklist
- Fee/rebate table versioned and timestamped
- Reject code mapping normalized across venues
- Data freshness guard before routing decisions
- Passive order TTL capped under stress
- Retry budget capped (no infinite reroute loops)
- Venue quarantine list persisted/recoverable
- Per-venue kill switch tested in simulation
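Two of the guards above are small enough to sketch directly: a data freshness check and a capped retry budget. The function names, the `send` callback (returns True on accept), and the default limits are hypothetical.

```python
def is_fresh(quote_ts: float, now: float, max_age_s: float = 0.05) -> bool:
    """True if the quote is no older than max_age_s and not from the future."""
    return 0.0 <= now - quote_ts <= max_age_s


def route_with_budget(ranked_venues, send, max_attempts: int = 3):
    """Try venues in rank order; stop at the first accept or when the budget is spent.

    Returns (accepting_venue_or_None, attempts_used).
    """
    attempts = 0
    for venue in ranked_venues:
        if attempts >= max_attempts:
            break
        attempts += 1
        if send(venue):
            return venue, attempts
    return None, attempts
```

Because the loop walks the ranked list once and never revisits a venue, a run of rejects ends with an unrouted child order instead of an infinite reroute loop.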
7) KPIs that actually matter
Primary:
- Realized implementation shortfall (bps), p50/p95
- Markout after fill (1s/5s)
- Opportunity loss from underfill
Secondary:
- Fill ratio, cancel success, reject rate
- Net fee/rebate contribution
A good SOR is one that lowers tail slippage without exploding opportunity loss.
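For reference, the three primary KPIs can be computed per parent order roughly as below. This assumes one common convention: shortfall is measured against the arrival price, side is +1 for buys and -1 for sells, and opportunity loss prices the unfilled fraction at the end-of-horizon price. Function and parameter names are illustrative.

```python
def shortfall_bps(side: int, arrival_px: float, avg_fill_px: float) -> float:
    """Implementation shortfall vs arrival price; positive = cost."""
    return side * (avg_fill_px - arrival_px) / arrival_px * 1e4


def markout_bps(side: int, fill_px: float, mid_after: float) -> float:
    """Markout at a fixed horizon; negative = price moved against us after the fill."""
    return side * (mid_after - fill_px) / fill_px * 1e4


def opportunity_loss_bps(side: int, arrival_px: float, end_px: float,
                         filled_qty: float, target_qty: float) -> float:
    """Cost of the unfilled fraction, marked at the end-of-horizon price."""
    unfilled_frac = 1.0 - filled_qty / target_qty
    return unfilled_frac * side * (end_px - arrival_px) / arrival_px * 1e4
```

For a buy that arrives at 100.00 and fills at 100.05, shortfall is about 5 bps; if the mid is back at 100.00 one second later, the 1s markout is about -5 bps, which is the adverse-selection signature the toxicity features are meant to catch.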
8) Start simple (MVP path)
Week 1:
- Add venue-level markout + latency + reject stats
- Introduce binary unhealthy gate
Week 2:
- Add net-fee normalization and simple expected-cost ranking
Week 3:
- Add state machine (Normal/Stress/Shock) + hysteresis
Week 4:
- Run weekly calibration and top-loss case review
Compounding edge comes from calibration discipline, not a giant first model.