Distilled Deep-LOB → Low-Latency Slippage Model Playbook
Date: 2026-03-15
Category: research
Audience: small quant teams that want richer slippage forecasts without blowing latency budgets
Why this research matters
Many teams hit the same wall:
- Simple models (spread + volatility + participation) are fast but miss microstructure state.
- Deep LOB models capture structure better, but inference is often too slow/fragile for live routing decisions.
A practical compromise is a two-speed model stack:
- a high-capacity teacher model (offline / nearline),
- a compact student model (online, strict latency SLO),
- a continuous calibration loop linking both.
This gives better slippage surfaces than naive heuristics while keeping online serving operationally safe.
Core architecture (operator view)
1) Teacher (offline, high capacity)
Use richer microstructure context:
- LOB depth tensors (e.g., top 10 levels bid/ask)
- order-flow imbalance (OFI)
- queue depletion/refill rates
- latency path features (decision→send→ack)
- regime tags (auction proximity, news windows, spread regime)
Typical outputs (multi-task):
- E[IS_bps | state, action]
- Q90/Q95(IS_bps | state, action)
- fill probability and expected completion time
- short-horizon markout
Teacher can be a CNN/LSTM/Transformer family as long as it is causal and evaluated on strict point-in-time data.
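One of the teacher features above, order-flow imbalance, can be computed directly from consecutive best-quote snapshots. A minimal level-1 OFI sketch in the spirit of Cont–Kukanov–Stoikov (function and tuple layout are hypothetical):

```python
def ofi_event(prev, curr):
    """Level-1 order-flow imbalance contribution between two quote snapshots.
    prev/curr: (bid_px, bid_qty, ask_px, ask_qty)."""
    pb0, qb0, pa0, qa0 = prev
    pb1, qb1, pa1, qa1 = curr
    e = 0.0
    # bid side: price improvement adds new depth; price drop removes old depth
    e += qb1 if pb1 >= pb0 else 0.0
    e -= qb0 if pb1 <= pb0 else 0.0
    # ask side mirrors: ask improvement counts as selling pressure
    e -= qa1 if pa1 <= pa0 else 0.0
    e += qa0 if pa1 >= pa0 else 0.0
    return e

def ofi_window(snapshots):
    """Net OFI over a window of successive snapshots."""
    return sum(ofi_event(a, b) for a, b in zip(snapshots, snapshots[1:]))
```

The same event-level decomposition extends naturally to the top-10-level depth tensors the teacher consumes.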
2) Student (online, low latency)
Train a lightweight model to mimic teacher outputs + real realized outcomes:
- small GBDT / monotonic GAM / compact MLP
- feature count capped (e.g., 30–80 engineered features)
- inference target: typically sub-millisecond end-to-end, often <100–300µs model time
Student should optimize for stable serving, not leaderboard accuracy.
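To make the latency claim concrete, a GAM-style student reduces inference to a handful of binary searches and additions, which is comfortably sub-millisecond in pure Python and far cheaper when vectorized. A minimal sketch (names and shape-function encoding are hypothetical):

```python
from bisect import bisect_right

def interp1(x, knots, values):
    """Piecewise-linear shape function defined by parallel (knots, values) lists."""
    if x <= knots[0]:
        return values[0]
    if x >= knots[-1]:
        return values[-1]
    i = bisect_right(knots, x)
    t = (x - knots[i - 1]) / (knots[i] - knots[i - 1])
    return values[i - 1] + t * (values[i] - values[i - 1])

def student_predict(x, shapes, bias=0.0):
    """Compact GAM-style student: bias plus a sum of per-feature shape
    functions, one (knots, values) pair per engineered feature."""
    return bias + sum(interp1(xi, k, v) for xi, (k, v) in zip(x, shapes))
```

Monotonicity constraints (e.g. slippage non-decreasing in participation rate) become simple checks that each `values` list is sorted, which is easy to enforce at export time.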
3) Distillation loop (bridge)
- Replay historical sessions and generate teacher labels on action/state grid.
- Train student on:
- teacher targets (smooth structure)
- realized execution labels (ground truth discipline)
- Calibrate tails (isotonic/quantile recalibration) per liquidity regime.
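The per-regime tail recalibration step can be done with isotonic regression on (predicted quantile, realized outcome) pairs sorted by prediction. A minimal pool-adjacent-violators sketch:

```python
def pava(y, w=None):
    """Pool-adjacent-violators: best non-decreasing fit to y (already
    ordered by the predictor, e.g. the student's raw q95 output)."""
    w = w or [1.0] * len(y)
    blocks = []  # each block: [mean, weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge backwards while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    out = []
    for m, _, c in blocks:
        out.extend([m] * c)
    return out
```

Fitting one such map per liquidity regime keeps the student's raw scores untouched while correcting systematic tail bias.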
Label design that actually survives production
Use labels aligned to real decisions:
- Child-order IS vs arrival and decision benchmarks
- Parent-order cumulative IS
- Opportunity cost for unfilled slices
- Censoring flags (no-fill, timeout, cancel-replace)
Do not drop censored paths: deleting no-fill branches creates optimistic bias and pushes live behavior toward panic crossing.
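One way to keep censored paths in the label set is to charge unfilled slices their opportunity cost against the closing mid rather than deleting them. A minimal label sketch (field names and the opportunity-cost convention are illustrative assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChildOrderLabel:
    side: int                 # +1 buy, -1 sell
    arrival_px: float         # decision-time reference price
    fill_px: Optional[float]  # None when the slice never filled
    outcome: str              # "fill" | "no-fill" | "timeout" | "cancel-replace"
    end_mid: float            # mid price when the slice was closed out

    def is_bps(self) -> float:
        """Implementation shortfall in bps vs arrival. Censored paths are
        charged opportunity cost against the closing mid, not dropped."""
        px = self.fill_px if self.outcome == "fill" else self.end_mid
        return self.side * (px - self.arrival_px) / self.arrival_px * 1e4
```

Keeping `outcome` alongside the numeric label also lets the completion-probability head train on the same records.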
Objective stack (recommended)
Instead of a single MSE objective, use a portfolio:
- mean loss for central tendency
- pinball loss for q90/q95 tails
- binary/log-loss for completion within deadline
- optional ranking loss for tactic selection (maker-first vs taker-first)
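The tail component of the portfolio is the standard pinball (quantile) loss, which is minimized in expectation by the target quantile:

```python
def pinball(y_true, y_pred, q):
    """Mean pinball loss at quantile q: under-predictions of the q-quantile
    cost q per unit, over-predictions cost (1 - q) per unit."""
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)
```

At q = 0.95 an under-prediction is penalized 19x more than an equal over-prediction, which is exactly the asymmetry wanted for tail slippage.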
Then derive a control metric for routing:
Score(action) = E[IS] + λ_tail * Q95(IS) + λ_deadline * P(miss_deadline)
Choose action with lowest score under hard risk constraints.
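The routing rule above can be sketched directly; the λ values and the hard-constraint form here are illustrative assumptions, not recommended settings:

```python
def route_score(e_is, q95_is, p_miss, lam_tail=0.25, lam_deadline=50.0):
    """Score(action) = E[IS] + lam_tail * Q95(IS) + lam_deadline * P(miss_deadline)."""
    return e_is + lam_tail * q95_is + lam_deadline * p_miss

def pick_action(candidates, max_q95=None):
    """candidates: {action: (e_is_bps, q95_is_bps, p_miss_deadline)}.
    Apply the hard tail constraint first, then minimize the score."""
    feasible = {a: v for a, v in candidates.items()
                if max_q95 is None or v[1] <= max_q95}
    return min(feasible, key=lambda a: route_score(*feasible[a]))
```

Note the two-stage shape: hard risk constraints prune the candidate set before the soft score ranks what remains, so a low mean never buys back an unacceptable tail.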
Data contract checklist (non-negotiable)
- event-time sequencing preserved (decision, send, ack, fill, cancel)
- venue/tactic IDs immutable through pipeline
- point-in-time fees/rebates and lot rules
- explicit handling of corrections/busts and late prints
- deterministic train/eval manifests (data hash + feature hash + code hash)
If these are weak, better architecture will not save the model.
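The deterministic-manifest item reduces to hashing the sorted data digests, a canonical feature spec, and the code version together. A minimal sketch (function name and argument layout are hypothetical):

```python
import hashlib
import json

def manifest_hash(data_files, feature_spec, code_version):
    """Deterministic run manifest: digest over sorted per-file hashes, a
    canonically serialized feature spec, and the code version, so two runs
    with identical inputs produce the identical manifest."""
    h = hashlib.sha256()
    for path, digest in sorted(data_files.items()):
        h.update(f"{path}:{digest}\n".encode())
    h.update(json.dumps(feature_spec, sort_keys=True).encode())
    h.update(code_version.encode())
    return h.hexdigest()
```

Sorting the file map and serializing the spec with `sort_keys=True` is what makes the hash insensitive to dict ordering between runs.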
Validation ladder for rollout
Stage A — Offline replay
Gate on:
- q50/q90/q95 error for IS
- calibration error of tail quantiles
- completion and deadline prediction quality
- regime robustness (open, lunch lull, close, event windows)
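The tail-calibration gate in Stage A is simplest as an empirical coverage check: the fraction of realized outcomes at or below the predicted quantile should sit close to the nominal level.

```python
def quantile_coverage(y_true, y_q, q):
    """Empirical coverage of a predicted q-quantile: the fraction of
    realized values at or below the prediction. Well calibrated -> ~= q."""
    hits = sum(1 for yt, yp in zip(y_true, y_q) if yt <= yp)
    return hits / len(y_true)
```

A gate like `abs(quantile_coverage(y, q95_pred, 0.95) - 0.95) <= tol`, evaluated per regime bucket, catches global tail bias before shadow mode does.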
Stage B — Shadow mode (paper routing)
- run student live in parallel with current policy
- compare predicted vs realized slippage by tactic/venue
- monitor drift in feature population and residuals
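One common drift monitor for the feature-population check is the population stability index over fixed bin edges; a minimal sketch:

```python
import math

def psi(expected, actual, cuts):
    """Population stability index between a reference feature sample and
    the live population, over shared bin edges `cuts`."""
    def frac(xs, lo, hi):
        n = sum(1 for x in xs if lo <= x < hi)
        return max(n / len(xs), 1e-6)  # floor avoids log(0) on empty bins
    edges = [-math.inf] + list(cuts) + [math.inf]
    total = 0.0
    for lo, hi in zip(edges, edges[1:]):
        e, a = frac(expected, lo, hi), frac(actual, lo, hi)
        total += (a - e) * math.log(a / e)
    return total
```

Rule-of-thumb alert thresholds (e.g. PSI above ~0.2 meaning material shift) are conventions, not guarantees; the same statistic applied to residuals covers the second half of the monitoring bullet.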
Stage C — Canary deployment
- allocate small notional slice
- enforce hard kill-switch thresholds:
- tail slippage breach
- reject/retry storms
- completion floor breach
Promote only after stable behavior across multiple market regimes.
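The canary kill-switch reduces to a per-window check against hard thresholds; the numbers below are placeholders, not recommendations:

```python
def kill_switch(window):
    """Hard canary guards: trip on a tail slippage breach, a reject/retry
    storm, or a completion-rate floor breach. Thresholds are illustrative
    and should come from the frozen runbook."""
    trips = []
    if window["q95_is_bps"] > 25.0:
        trips.append("tail_slippage")
    if window["reject_rate"] > 0.05:
        trips.append("reject_storm")
    if window["completion_rate"] < 0.90:
        trips.append("completion_floor")
    return trips  # non-empty -> halt the canary and roll back
```

Returning the full list of tripped guards, rather than a boolean, makes the rollback alert self-explaining.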
Known failure modes
Teacher overfits a historical microstructure regime
Fix: rolling retrain windows + regime-balanced sampling.
Student too compressed, loses action ranking
Fix: distill pairwise action preferences in addition to scalar targets.
Tail underestimation after fee/latency drift
Fix: online residual monitors + periodic tail recalibration.
Feature leakage from non-causal joins
Fix: strict point-in-time feature store and time-travel tests.
Ops complexity explosion
Fix: start with one symbol cluster + one venue class, then expand.
2-week practical build plan
Days 1-3
- finalize data contract and label schema
- baseline student model from existing features
Days 4-7
- train teacher on depth+flow tensors
- produce teacher action-surface labels on replay set
Days 8-10
- distill into student
- add quantile calibration by regime
Days 11-12
- shadow deployment + drift dashboard
Days 13-14
- small canary with hard rollback rules
- freeze v1 runbook and retraining cadence
Bottom line
You do not need to serve a giant deep model directly to benefit from deep microstructure learning.
A teacher-student slippage stack gives a pragmatic path:
- rich structure discovery offline,
- robust low-latency decisions online,
- explicit tail-risk control for live capital.
For small teams, this is often the highest signal-per-operational-risk route.
References (starting points)
Cont, Kukanov, Stoikov — The Price Impact of Order Book Events (J. Financial Econometrics, 2014)
https://arxiv.org/abs/1011.6402
Taranto et al. — Linear models for the impact of order flow on prices I (2016)
https://arxiv.org/abs/1602.02735
Zhang, Zohren, Roberts — DeepLOB (IEEE TSP 2019)
https://arxiv.org/abs/1808.03668
Bodor et al. — Deep Learning Meets Queue-Reactive (2025)
https://arxiv.org/abs/2501.08822
Benzaquen, Eisler, Bouchaud — Trading Lightly: Cross-Impact and Optimal Portfolio Execution
https://ar5iv.labs.arxiv.org/html/1702.03838
Donier, Bonart et al. — A million metaorder analysis of market impact on Bitcoin
https://ar5iv.labs.arxiv.org/html/1412.4503