Transfer-Entropy Lead-Lag Toxicity Routing Slippage Playbook
Date: 2026-03-02
Category: finance / execution research
Why this model
In fragmented markets, slippage is often caused less by "not enough displayed depth" and more by information arriving unevenly across venues.
Typical router logic (best price + fee + queue estimate) misses a key dynamic:
- aggressive flow becomes toxic on Venue A,
- then that toxicity propagates to Venue B/C with a short lag,
- and your child order lands exactly in that lag window.
A lead-lag toxicity model based on transfer entropy (TE) helps detect directional information flow between venues and route away from "about-to-turn-toxic" books.
Core idea
Build a real-time graph where each directed edge (i \to j) measures how much recent flow/toxicity on venue (i) improves prediction of venue (j)'s next-state toxicity beyond (j)'s own history.
If (TE_{i\to j}) is high and recent signals on (i) deteriorate, reduce passive exposure on (j) before the shock fully transmits.
Feature schema
At 100 ms–1 s slices (symbol level), collect per venue (v):
- microprice return (\Delta m^v_t)
- imbalance change (\Delta I^v_t)
- cancel burst proxy (C^v_t)
- trade sign pressure (Q^v_t)
- short-horizon markout (M^v_{t+\tau})
- spread/depth resilience (R^v_t)
Define a venue toxicity state (T^v_t) (continuous score or discrete bins LOW/MID/HIGH):
[ T^v_t = w_1 \cdot z(M^v_{t+\tau}) + w_2 \cdot z(C^v_t) + w_3 \cdot z(|Q^v_t|) + w_4 \cdot z(\text{spread jump}) ]
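A minimal sketch of the composite score, assuming z-scores are taken against a trailing window per feature. The feature keys, the default weights, and the zero fallback for flat windows are illustrative assumptions, not calibrated values. Since (M^v_{t+\tau}) is not observable at time (t), the markout input is assumed to be the most recently completed markout.

```python
from statistics import mean, pstdev

# Hypothetical feature keys, in the order of w_1..w_4 in the formula above.
FEATURES = ("adverse_markout", "cancel_burst", "abs_sign_pressure", "spread_jump")

def zscore_latest(history, value):
    """z-score of `value` against a trailing window; 0.0 if the window is too short or flat."""
    if len(history) < 2:
        return 0.0
    sd = pstdev(history)
    if sd == 0.0:
        return 0.0
    return (value - mean(history)) / sd

def toxicity_score(window, latest, weights=(0.4, 0.2, 0.2, 0.2)):
    """Weighted sum of per-feature z-scores for one venue.

    window: dict feature -> list of trailing values
    latest: dict feature -> current value (caller passes |Q| for sign pressure)
    weights: placeholder values for w_1..w_4 (assumption)
    """
    return sum(w * zscore_latest(window[k], latest[k])
               for w, k in zip(weights, FEATURES))
```

For discrete states, the continuous score can then be cut into LOW/MID/HIGH bins at chosen quantiles.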
Transfer-entropy layer
For directed pair ((i,j)):
[ TE_{i\to j}(L) = I\left(T^i_{t-L:t-1};\, T^j_t \,\middle|\, T^j_{t-L:t-1}\right) ]
where (I(\cdot\,;\cdot \mid \cdot)) is conditional mutual information and (L) is the lag window length, in slices.
Practical estimation options:
- Discrete binning + Miller-Madow correction (fast, robust)
- kNN entropy estimators (more flexible, noisier)
- Logistic surrogate: compare predictive cross-entropy with/without source-venue history
Use rolling estimation (e.g., 15–30 min windows) with exponential decay to keep edge weights adaptive.
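The first option (discrete binning + Miller-Madow) can be sketched as follows, assuming toxicity states are already binned into hashable labels. It uses the identity I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) with a Miller-Madow correction on each entropy term. The function names are hypothetical, samples are unweighted (no exponential decay), and clamping at zero is a convenience assumption.

```python
import math
from collections import Counter

def entropy_mm(samples):
    """Plug-in entropy in nats plus the Miller-Madow bias correction (K - 1) / (2N)."""
    n = len(samples)
    counts = Counter(samples)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h + (len(counts) - 1) / (2 * n)

def transfer_entropy(src, dst, lag=1):
    """Estimate TE_{src -> dst}(lag) from two aligned sequences of discrete states."""
    if len(src) != len(dst):
        raise ValueError("src and dst must be aligned and equally long")
    # Each row: (source history, destination now, destination history).
    rows = [(tuple(src[t - lag:t]), dst[t], tuple(dst[t - lag:t]))
            for t in range(lag, len(dst))]
    if not rows:
        return 0.0
    h_xz = entropy_mm([(x, z) for x, _, z in rows])
    h_yz = entropy_mm([(y, z) for _, y, z in rows])
    h_xyz = entropy_mm(rows)
    h_z = entropy_mm([z for _, _, z in rows])
    # Corrected estimates can dip slightly below zero; clamp for use as an edge weight.
    return max(0.0, h_xz + h_yz - h_xyz - h_z)
```

With three bins and a lag of 2–3 slices, the joint state space stays small enough for a 15–30 min window; longer lags grow the number of cells exponentially and call for coarser bins or one of the other estimators.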
Lead-lag shock score
For destination venue (j), define propagated shock risk:
[ S^j_t = \sum_{i\neq j} \hat{TE}_{i\to j,t} \cdot \psi\left(T^i_t - \bar T^i_t\right) ]
- (\hat{TE}_{i\to j,t}): normalized edge weight at time (t)
- (\psi): convex activation that amplifies sudden toxicity jumps above venue (i)'s rolling baseline (\bar T^i_t)
Then combine with local conditions:
[ \text{Risk}^j_t = a\,S^j_t + b\,T^j_t + c\,\text{QueueFragility}^j_t ]
This creates a forward-looking risk score: "venue (j) is currently okay, but incoming toxicity probability is high."
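The two formulas above can be sketched directly. The squared hinge is one possible convex choice for (\psi), and the coefficients a, b, c below are placeholders; both are assumptions. Edge weights are assumed to be already normalized and keyed by (source, destination).

```python
def psi(x):
    """Squared hinge: zero for non-positive deviations, convex growth for positive jumps."""
    return max(0.0, x) ** 2

def shock_score(dest, te_weights, tox, tox_baseline):
    """S^j_t: TE-weighted sum of activated toxicity deviations on all other venues."""
    return sum(w * psi(tox[src] - tox_baseline[src])
               for (src, dst), w in te_weights.items()
               if dst == dest and src != dest)

def venue_risk(dest, te_weights, tox, tox_baseline, queue_fragility,
               a=0.5, b=0.3, c=0.2):
    """Risk^j_t = a*S + b*T + c*QueueFragility, with placeholder coefficients."""
    s = shock_score(dest, te_weights, tox, tox_baseline)
    return a * s + b * tox[dest] + c * queue_fragility[dest]
```

In this form a venue with low local toxicity can still score high when a strongly coupled source venue has just jumped above its baseline.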
Router policy mapping
Translate (\text{Risk}^j_t) into route controls:
- Passive quote size cap per venue
- Max resting time before cancel
- IOC fallback threshold
- Venue exclusion cooldown when risk exceeds hard limit
Example policy:
- Risk < 0.35: normal passive posting
- 0.35–0.60: halve passive slice, tighten cancel timeout
- 0.60–0.80: no new passive, midpoint/IOC only if urgency high
- > 0.80: temporary venue quarantine (except emergency completion lane)
Use hysteresis bands to prevent route thrashing.
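One way to sketch the ladder with hysteresis: escalate as soon as risk crosses a threshold, but de-escalate only after risk falls a margin below it. The 0.03 margin, the tier names, and sending boundary ties to the stricter tier are assumptions for illustration.

```python
# Thresholds from the example policy; tier k applies once risk reaches THRESHOLDS[k - 1].
THRESHOLDS = (0.35, 0.60, 0.80)
TIERS = ("NORMAL", "REDUCED_PASSIVE", "NO_NEW_PASSIVE", "QUARANTINE")

def next_tier(current, risk, margin=0.03):
    """Return the new tier index given the current tier index and the latest risk score."""
    tier = current
    # Escalate immediately on an upward threshold crossing.
    while tier < len(THRESHOLDS) and risk >= THRESHOLDS[tier]:
        tier += 1
    # De-escalate only when risk is clearly below the threshold that was crossed.
    while tier > 0 and risk < THRESHOLDS[tier - 1] - margin:
        tier -= 1
    return tier
```

A risk score oscillating between 0.34 and 0.36 then stays in the reduced-passive tier instead of flipping on every slice.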
Training & validation design
1) Offline research track
- Construct synchronized multi-venue tapes
- Estimate TE graph by session bucket (open/mid/close)
- Verify directional stability and lag structure
2) Counterfactual simulation
Compare:
- fee+spread baseline router
- toxicity-only router (no TE graph)
- TE-augmented router (proposed)
Metrics:
- mean slippage (bps)
- p95/p99 slippage
- adverse markout tails
- fill ratio / completion risk
- cancel-to-fill ratio (operational load)
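A small sketch of how some of these metrics could be summarized per router variant, assuming per-order slippage is already expressed in bps. The nearest-rank percentile and the field names are illustrative choices.

```python
import math
from statistics import mean

def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def slippage_summary(slippage_bps, fills, cancels):
    """Mean and tail slippage plus cancel-to-fill ratio for one router variant."""
    return {
        "mean_bps": mean(slippage_bps),
        "p95_bps": percentile(slippage_bps, 95),
        "p99_bps": percentile(slippage_bps, 99),
        "cancel_to_fill": cancels / fills if fills else float("inf"),
    }
```

Comparing the three routers on the same replayed sessions keeps differences in the summary attributable to routing rather than to market conditions.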
3) Live shadow mode
Run the TE risk engine without routing control for at least two weeks:
- log suggested actions vs realized outcomes
- evaluate calibration of risk deciles
- measure incremental value vs existing toxicity gate
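Decile calibration can be checked with a simple bucket table, sketched below. It assumes each logged record pairs a risk score with a realized outcome such as adverse markout in bps; the function name and tuple layout are hypothetical.

```python
from statistics import mean

def decile_calibration(risk_scores, outcomes, buckets=10):
    """Sort by risk, split into equal-count buckets, and report mean risk and mean outcome.

    Returns a list of (bucket_index, mean_risk, mean_outcome) tuples, lowest risk first.
    """
    pairs = sorted(zip(risk_scores, outcomes))
    n = len(pairs)
    table = []
    for b in range(buckets):
        chunk = pairs[b * n // buckets:(b + 1) * n // buckets]
        if chunk:  # skip empty buckets when there are fewer records than buckets
            table.append((b + 1,
                          mean(r for r, _ in chunk),
                          mean(o for _, o in chunk)))
    return table
```

A well-calibrated score should show realized outcomes rising roughly monotonically across buckets; a flat or inverted table suggests the TE layer adds little beyond the existing toxicity gate.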
Production guardrails
Data quality hard checks
- venue clock skew bounds
- stale book detection
- crossed/locked quote sanitization
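These checks can be expressed as per-snapshot flags, sketched below. The `BookSnapshot` fields and both limits are hypothetical, and the gap between receive and venue timestamps is only a proxy for clock skew because it also includes network latency.

```python
from dataclasses import dataclass

@dataclass
class BookSnapshot:
    venue: str
    bid: float
    ask: float
    venue_ts_ns: int  # timestamp stamped by the venue
    recv_ts_ns: int   # local receive timestamp

def quality_flags(snap, now_ns, max_skew_ns=5_000_000, max_age_ns=500_000_000):
    """Return the list of hard-check violations for one top-of-book snapshot."""
    flags = []
    if abs(snap.recv_ts_ns - snap.venue_ts_ns) > max_skew_ns:
        flags.append("clock_skew")
    if now_ns - snap.recv_ts_ns > max_age_ns:
        flags.append("stale_book")
    if snap.bid > snap.ask:
        flags.append("crossed")
    elif snap.bid == snap.ask:
        flags.append("locked")
    return flags
```

Slices with any flag can be dropped from the TE window rather than repaired, so that bad data cannot create or erase an edge.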
Estimator stability checks
- minimum sample requirement per rolling window
- edge-weight shrinkage to prior during sparse flow
- cap total incoming TE mass to avoid runaway scoring
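The three stability checks can be combined in one pass over the edge set, as sketched here. The shrinkage form n / (n + k), the zero prior, and all default values are assumptions for illustration.

```python
from collections import defaultdict

def stabilize_edges(raw_te, sample_counts, prior=0.0, min_samples=200,
                    shrink_k=500, max_incoming=1.0):
    """Apply a sample floor, shrinkage to a prior, and a per-destination mass cap.

    raw_te: dict (src, dst) -> raw TE estimate
    sample_counts: dict (src, dst) -> samples in the rolling window
    """
    shrunk = {}
    for edge, te in raw_te.items():
        n = sample_counts.get(edge, 0)
        if n < min_samples:
            shrunk[edge] = prior  # too little data: fall back to the prior
            continue
        lam = n / (n + shrink_k)  # approaches 1 as the window fills
        shrunk[edge] = lam * te + (1.0 - lam) * prior

    incoming = defaultdict(float)
    for (_, dst), w in shrunk.items():
        incoming[dst] += w

    # Rescale edges into any destination whose total incoming mass exceeds the cap.
    return {
        (src, dst): w * min(1.0, max_incoming / incoming[dst]) if incoming[dst] > 0 else w
        for (src, dst), w in shrunk.items()
    }
```

Capping by rescaling preserves the relative ranking of source venues while bounding how large the propagated shock score can get.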
Fail-safe behavior
- if TE engine unhealthy, revert to baseline toxicity router
- keep emergency completion path independent of TE layer
Common failure modes
Spurious causality from common shocks
- Mitigate with conditioning on market-wide factors and auction/event flags.
Non-stationary lag structure
- Re-estimate lag buckets by session and volatility regime.
Overreaction in thin names
- Use stronger shrinkage + coarser bins + higher action thresholds.
Operational churn (too many cancels)
- Penalize cancel intensity directly in policy objective.
Minimal implementation checklist
- Define venue toxicity state and normalization
- Implement rolling TE estimator + regularization
- Build directed venue graph service (real-time)
- Derive propagated shock score and risk map
- Integrate router action ladder with hysteresis
- Run walk-forward + stress-session replay
- Deploy shadow -> capped-control -> full-control rollout
One-line takeaway
Slippage in fragmented markets is often a propagation problem; a transfer-entropy lead-lag graph lets the router act on where toxicity is going next, not just where it already is.