Transfer-Entropy Lead-Lag Toxicity Routing Slippage Playbook

Date: 2026-03-02
Category: finance / execution research

Why this model

In fragmented markets, slippage is often caused less by "not enough displayed depth" and more by information arriving unevenly across venues.

Typical router logic (best price + fee + queue estimate) misses a key dynamic:

aggressive flow becomes toxic on Venue A,
then that toxicity propagates to Venue B/C with a short lag,
and your child order lands exactly in that lag window.

A lead-lag toxicity model based on transfer entropy (TE) helps the router detect directional information flow between venues and route away from books that are about to turn toxic.

Core idea

Build a real-time graph in which each directed edge $i \to j$ measures how much recent flow/toxicity on venue $i$ improves prediction of venue $j$'s next-state toxicity beyond $j$'s own history.

If $TE_{i\to j}$ is high and recent signals on $i$ deteriorate, reduce passive exposure on $j$ before the shock fully transmits.

Feature schema

At 100 ms–1 s slices (per symbol), collect for each venue $v$:

microprice return $\Delta m^v_t$
imbalance change $\Delta I^v_t$
cancel burst proxy $C^v_t$
trade sign pressure $Q^v_t$
short-horizon markout $M^v_{t+\tau}$
spread/depth resilience $R^v_t$

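Define a venue toxicity state $T^v_t$ (a continuous score or discrete LOW/MID/HIGH bins):

$$T^v_t = w_1 \cdot z(M^v_{t+\tau}) + w_2 \cdot z(C^v_t) + w_3 \cdot z(|Q^v_t|) + w_4 \cdot z(\text{spread jump})$$

where $z(\cdot)$ is a standardized (z-scored) value. Because the markout $M^v_{t+\tau}$ is only known $\tau$ after the fact, a live system must substitute the most recent fully realized markout.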
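A minimal sketch of the toxicity state in Python/NumPy, assuming each feature arrives as a per-slice array for one venue, markout is signed so that larger values are more adverse, and $z(\cdot)$ is a trailing z-score computed only from past slices (to avoid look-ahead). The weights, window length, bin cut points, and the helper names `rolling_z`, `toxicity_state`, and `to_bins` are hypothetical, not calibrated values.

```python
import numpy as np

def rolling_z(x, window, eps=1e-9):
    """Trailing z-score: slice t is standardized by the mean/std of slices t-window..t-1.
    Slices without a full window of history are NaN (warm-up)."""
    x = np.asarray(x, dtype=float)
    z = np.full(x.shape, np.nan)
    for t in range(window, len(x)):
        hist = x[t - window:t]
        z[t] = (x[t] - hist.mean()) / (hist.std() + eps)
    return z

def toxicity_state(markout, cancel_burst, trade_sign, spread_jump,
                   weights=(0.4, 0.2, 0.2, 0.2), window=300):
    """T_t = w1 z(M) + w2 z(C) + w3 z(|Q|) + w4 z(spread jump).
    Hypothetical weights; `markout` should hold the latest *realized* markout per slice."""
    w1, w2, w3, w4 = weights
    return (w1 * rolling_z(markout, window)
            + w2 * rolling_z(cancel_burst, window)
            + w3 * rolling_z(np.abs(trade_sign), window)
            + w4 * rolling_z(spread_jump, window))

def to_bins(T, lo=-0.5, hi=0.5):
    """Discretize to LOW=0 / MID=1 / HIGH=2; warm-up NaNs map to -1."""
    T = np.asarray(T, dtype=float)
    bins = np.digitize(np.nan_to_num(T, nan=0.0), [lo, hi])
    return np.where(np.isnan(T), -1, bins)
```

The discrete bins from `to_bins` are what a binned TE estimator would consume; the continuous score suits kNN estimators or the logistic surrogate.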

Transfer-entropy layer

For a directed pair $(i, j)$:

$$TE_{i\to j}(L) = I\left(T^i_{t-L:t-1};\, T^j_t \,\middle|\, T^j_{t-L:t-1}\right)$$

where $I(\cdot\,;\cdot \mid \cdot)$ is conditional mutual information and $L$ is the lag window.

Practical estimation options:

Discrete binning + Miller-Madow correction (fast, robust)
kNN entropy estimators (more flexible, noisier)
Logistic surrogate: compare predictive cross-entropy with/without source-venue history
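The first option can be sketched as a plug-in estimator over discrete states (for example the LOW/MID/HIGH bins), using the identity $I(X;Y\mid Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)$ with Miller–Madow-corrected entropies $\hat H_{MM} = \hat H + (m-1)/(2N)$, where $m$ is the number of occupied cells and $N$ the sample count. This is a minimal illustration under those assumptions, not a production estimator; `entropy_mm` and `transfer_entropy` are hypothetical names, and entropies are in nats.

```python
import math
from collections import Counter

import numpy as np

def entropy_mm(counts):
    """Plug-in Shannon entropy (nats) with Miller-Madow bias correction (m - 1) / (2N)."""
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values() if c > 0)
    m = sum(1 for c in counts.values() if c > 0)
    return h + (m - 1) / (2 * n)

def transfer_entropy(src, dst, lag=1):
    """TE_{src->dst}(lag) = I(src_past; dst_now | dst_past) from discrete state sequences."""
    src, dst = np.asarray(src), np.asarray(dst)
    rows = [(dst[t], tuple(dst[t - lag:t]), tuple(src[t - lag:t]))
            for t in range(lag, len(dst))]
    h_yn_yp = entropy_mm(Counter((yn, yp) for yn, yp, _ in rows))   # H(dst_now, dst_past)
    h_yp_xp = entropy_mm(Counter((yp, xp) for _, yp, xp in rows))   # H(dst_past, src_past)
    h_yp = entropy_mm(Counter(yp for _, yp, _ in rows))             # H(dst_past)
    h_all = entropy_mm(Counter(rows))                               # H(dst_now, dst_past, src_past)
    return h_yn_yp + h_yp_xp - h_yp - h_all
```

For a destination that copies the source with a one-slice delay, the forward TE approaches $\log 3$ for three equiprobable bins while the reverse direction stays near zero, which is the asymmetry the venue graph relies on.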

Use rolling estimation (e.g., 15–30 min windows) with exponential decay to keep edge weights adaptive.
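One way to make edge weights adaptive, assuming lag-1 discrete states and a half-life expressed in slices, is to keep exponentially decayed joint counts and recompute a plug-in TE from them on demand. The class `DecayedTE` and its `half_life` parameter are hypothetical; the decayed total weight also serves as an effective sample size for the minimum-sample checks listed under production guardrails.

```python
import math
from collections import defaultdict

class DecayedTE:
    """Online lag-1 TE_{src->dst} from exponentially decayed joint counts (hypothetical helper)."""

    def __init__(self, half_life):
        self.decay = 0.5 ** (1.0 / half_life)   # per-slice decay factor
        self.counts = defaultdict(float)        # (dst_now, dst_prev, src_prev) -> decayed weight
        self.prev = None                        # (src_prev, dst_prev)

    def update(self, src_state, dst_state):
        if self.prev is not None:
            for key in self.counts:             # age every cell by one slice
                self.counts[key] *= self.decay
            src_prev, dst_prev = self.prev
            self.counts[(dst_state, dst_prev, src_prev)] += 1.0
        self.prev = (src_state, dst_state)

    def effective_n(self):
        return sum(self.counts.values())

    def _entropy(self, project):
        """Plug-in entropy (nats) of the marginal obtained by mapping each key through `project`."""
        n = self.effective_n()
        marg = defaultdict(float)
        for key, w in self.counts.items():
            marg[project(key)] += w
        return -sum((w / n) * math.log(w / n) for w in marg.values() if w > 0)

    def value(self):
        if self.effective_n() == 0:
            return 0.0
        # I(src_prev; dst_now | dst_prev) = H(dst_now, dst_prev) + H(dst_prev, src_prev)
        #                                   - H(dst_prev) - H(dst_now, dst_prev, src_prev)
        return (self._entropy(lambda k: (k[0], k[1])) + self._entropy(lambda k: (k[1], k[2]))
                - self._entropy(lambda k: k[1]) - self._entropy(lambda k: k))
```

With a half-life of 500 slices the steady-state effective sample size is roughly $1/(1-0.5^{1/500}) \approx 720$; at 1 s slices that is on the order of the 15–30 minute windows suggested above.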

Lead-lag shock score

For destination venue $j$, define the propagated shock risk:

$$S^j_t = \sum_{i\neq j} \hat{TE}_{i\to j,t} \cdot \psi\left(T^i_t - \bar T^i_t\right)$$

$\hat{TE}_{i\to j,t}$: normalized edge weight at time $t$
$\bar T^i_t$: baseline toxicity of venue $i$ (e.g., a rolling mean)
$\psi$: convex activation for sudden toxicity jumps
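A minimal sketch of $S^j_t$ for all destinations at once, assuming discrete TE estimates in nats normalized by $\log k$ for $k$ toxicity bins (an upper bound on TE between $k$-state variables) and $\psi(x) = \max(x, 0)^2$ as the convex activation. Both choices, and the names `psi` and `shock_scores`, are assumptions; the normalization and the form of $\psi$ are left open in the text.

```python
import numpy as np

def psi(x):
    """Convex activation: zero at or below baseline, quadratic in the size of the jump."""
    return np.maximum(x, 0.0) ** 2

def shock_scores(te, T_now, T_bar, n_bins=3):
    """te[i, j] = estimated TE_{i->j} in nats; returns S[j] for every destination venue j."""
    te_hat = np.clip(np.asarray(te, dtype=float) / np.log(n_bins), 0.0, 1.0)
    np.fill_diagonal(te_hat, 0.0)                    # the sum runs over i != j
    jump = psi(np.asarray(T_now, dtype=float) - np.asarray(T_bar, dtype=float))
    return jump @ te_hat                             # S[j] = sum_i jump[i] * te_hat[i, j]
```

For example, if venue 0 jumps two units above its baseline and the only strong edge is $0 \to 1$, venue 1 receives a positive shock score even though its own toxicity has not moved yet.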

Then combine it with local conditions:

$$\text{Risk}^j_t = a\,S^j_t + b\,T^j_t + c\,\text{QueueFragility}^j_t$$

This yields a forward-looking risk score: venue $j$ looks fine right now, but the probability of incoming toxicity is high.

Router policy mapping

Translate $\text{Risk}^j_t$ into route controls:

Passive quote size cap per venue
Max resting time before cancel
IOC fallback threshold
Venue exclusion cooldown when risk exceeds hard limit

Example policy (assuming $\text{Risk}^j_t$ is scaled to $[0, 1]$):

Risk < 0.35: normal passive posting
0.35 ≤ Risk < 0.60: halve the passive slice, tighten the cancel timeout
0.60 ≤ Risk < 0.80: no new passive orders; midpoint/IOC only if urgency is high
Risk ≥ 0.80: temporary venue quarantine (except the emergency completion lane)
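The example ladder could be encoded as a pure function from risk to controls. In this sketch the tier names, the `RouteControls` fields, and the 0.5 cancel-timeout multiplier are hypothetical (the text only says "tighten"); the thresholds are the ones in the example policy.

```python
from dataclasses import dataclass
from enum import IntEnum

class Tier(IntEnum):
    NORMAL = 0
    REDUCED = 1
    NO_PASSIVE = 2
    QUARANTINE = 3

THRESHOLDS = (0.35, 0.60, 0.80)  # entry points for REDUCED, NO_PASSIVE, QUARANTINE

@dataclass(frozen=True)
class RouteControls:
    passive_size_mult: float     # multiplier on the normal passive slice
    cancel_timeout_mult: float   # multiplier on the normal max resting time
    allow_midpoint_ioc: bool     # midpoint / IOC participation allowed
    quarantined: bool            # venue excluded (emergency completion lane exempt)

def tier_for(risk):
    """Number of thresholds met = tier index."""
    return Tier(sum(risk >= t for t in THRESHOLDS))

def controls_for(tier, urgency_high=False):
    if tier == Tier.NORMAL:
        return RouteControls(1.0, 1.0, True, False)
    if tier == Tier.REDUCED:
        return RouteControls(0.5, 0.5, True, False)          # halve slice, tighter timeout
    if tier == Tier.NO_PASSIVE:
        # No new passive size; existing resting orders keep the tight timeout (assumption).
        return RouteControls(0.0, 0.5, urgency_high, False)
    return RouteControls(0.0, 0.0, False, True)
```

Keeping the mapping pure makes it easy to replay in the counterfactual simulator and to log suggested actions in shadow mode.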

Use hysteresis bands (enter a tier at its threshold, leave it only after risk falls a margin below) to prevent route thrashing.
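A minimal hysteresis sketch, assuming integer tiers 0–3 entered at the example-policy thresholds and a hypothetical exit band of 0.05: escalation happens as soon as risk crosses an entry threshold, while de-escalation waits until risk falls a full band below the current tier's entry level. `HysteresisTier`, `entry`, and `band` are illustrative names.

```python
class HysteresisTier:
    """Tier state machine with asymmetric enter/exit levels (hypothetical helper)."""

    def __init__(self, entry=(0.35, 0.60, 0.80), band=0.05):
        self.entry = entry   # risk level at which tiers 1, 2, 3 are entered
        self.band = band     # how far below an entry level risk must fall to leave that tier
        self.tier = 0

    def update(self, risk):
        target = sum(risk >= e for e in self.entry)
        if target > self.tier:
            self.tier = target                      # escalate immediately
        else:
            # Step down one tier at a time while risk is clearly below the current entry level.
            while self.tier > 0 and risk < self.entry[self.tier - 1] - self.band:
                self.tier -= 1
        return self.tier
```

With these defaults, a venue that enters tier 2 at a risk of 0.62 stays there until risk drops below 0.55, so a score hovering around 0.60 does not flip the route on every slice.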

Training & validation design

1) Offline research track

Construct synchronized multi-venue tapes
Estimate the TE graph per session bucket (open/mid/close)
Verify directional stability and lag structure

2) Counterfactual simulation

Compare:

fee+spread baseline router
toxicity-only router (no TE graph)
TE-augmented router (proposed)

Metrics:

mean slippage (bps)
p95/p99 slippage
adverse markout tails
fill ratio / completion risk
cancel-to-fill ratio (operational load)
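The metrics can be computed identically for all three routers. This sketch assumes per-order arrays, a side convention of +1 buy / -1 sell, slippage measured against the arrival price with positive values meaning cost, and markouts signed so that positive means adverse. The function names, the use of unweighted per-order statistics, and p95 as the markout tail statistic are assumptions.

```python
import numpy as np

def slippage_bps(side, fill_px, arrival_px):
    """Per-order slippage vs arrival price in bps; positive = cost."""
    side = np.asarray(side, dtype=float)
    fill_px = np.asarray(fill_px, dtype=float)
    arrival_px = np.asarray(arrival_px, dtype=float)
    return side * (fill_px - arrival_px) / arrival_px * 1e4

def router_metrics(slip_bps, markout_bps, filled_qty, target_qty, n_cancels, n_fills):
    """Summary statistics for one simulated router run (unweighted across orders)."""
    slip = np.asarray(slip_bps, dtype=float)
    mk = np.asarray(markout_bps, dtype=float)
    return {
        "mean_slippage_bps": slip.mean(),
        "p95_slippage_bps": np.percentile(slip, 95),
        "p99_slippage_bps": np.percentile(slip, 99),
        "adverse_markout_p95_bps": np.percentile(mk, 95),
        "fill_ratio": filled_qty / target_qty,
        "cancel_to_fill": n_cancels / max(n_fills, 1),   # guard against zero fills
    }
```

Quantity-weighted versions are a straightforward variant if large child orders should count more.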

3) Live shadow mode

Run the TE risk engine for at least two weeks without letting it control routing:

log suggested actions vs realized outcomes
evaluate calibration of risk deciles
measure incremental value vs existing toxicity gate
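Calibration by risk decile can be checked with a small table: bucket shadow-mode risk scores into deciles and compare the mean predicted risk with the realized adverse-outcome rate in each bucket. The sketch assumes one binary realized outcome per decision (for example, a markout beyond a chosen threshold); that outcome definition and the name `decile_calibration` are hypothetical.

```python
import numpy as np

def decile_calibration(risk, adverse):
    """Rows of (decile, mean predicted risk, realized adverse rate, count)."""
    risk = np.asarray(risk, dtype=float)
    adverse = np.asarray(adverse, dtype=float)
    edges = np.quantile(risk, np.linspace(0.0, 1.0, 11))
    # Decile index 0..9; the maximum value lands on the last edge and is clipped into decile 9.
    idx = np.clip(np.searchsorted(edges, risk, side="right") - 1, 0, 9)
    rows = []
    for d in range(10):
        in_d = idx == d
        if in_d.any():                      # heavy ties can leave a decile empty
            rows.append((d, risk[in_d].mean(), adverse[in_d].mean(), int(in_d.sum())))
    return rows
```

A well-calibrated engine shows realized rates that rise monotonically across deciles and sit close to the mean predicted risk in each row.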

Production guardrails

Data quality hard checks
- venue clock skew bounds
- stale book detection
- crossed/locked quote sanitization
Estimator stability checks
- minimum sample requirement per rolling window
- edge-weight shrinkage to prior during sparse flow
- cap total incoming TE mass to avoid runaway scoring
Fail-safe behavior
- if TE engine unhealthy, revert to baseline toxicity router
- keep emergency completion path independent of TE layer
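The three estimator stability checks can be applied to the raw TE matrix before it reaches the shock score. In this sketch, shrinkage is a convex blend toward a prior matrix with data weight $n/(n+k)$, edges below a minimum effective sample size fall back to the prior entirely, and each destination's incoming TE mass is rescaled to a cap. The parameter names and defaults (`n_min`, `k_shrink`, `max_incoming`) are hypothetical.

```python
import numpy as np

def stabilize_te(te_raw, n_eff, prior, n_min=500, k_shrink=1000.0, max_incoming=1.0):
    """te_raw, prior: (n, n) arrays with te[i, j] = TE_{i->j}; n_eff: effective sample size
    (scalar or per-edge array). All thresholds are hypothetical defaults."""
    te_raw = np.asarray(te_raw, dtype=float)
    prior = np.asarray(prior, dtype=float)
    n_eff = np.asarray(n_eff, dtype=float)
    lam = n_eff / (n_eff + k_shrink)              # data weight grows with sample size
    te = lam * te_raw + (1.0 - lam) * prior       # shrink toward the prior during sparse flow
    te = np.where(n_eff < n_min, prior, te)       # minimum-sample rule: prior only
    np.fill_diagonal(te, 0.0)                     # no self-edges
    incoming = te.sum(axis=0)                     # total incoming TE mass per destination j
    scale = np.where(incoming > max_incoming, max_incoming / np.maximum(incoming, 1e-12), 1.0)
    return te * scale                             # column j scaled by scale[j]
```

Capping per destination column bounds each venue's shock score regardless of how many sources light up at once.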

Common failure modes

Spurious causality from common shocks
- Mitigate with conditioning on market-wide factors and auction/event flags.
Non-stationary lag structure
- Re-estimate lag buckets by session and volatility regime.
Overreaction in thin names
- Use stronger shrinkage + coarser bins + higher action thresholds.
Operational churn (too many cancels)
- Penalize cancel intensity directly in policy objective.
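For common shocks, one simple approximation (a swapped-in technique, not the only option) is to regress each venue's toxicity on the leave-one-out cross-venue mean, fit only on slices outside auction/event windows, and feed the residuals to the TE estimator, with event slices blanked out. Adding a market factor directly to the TE conditioning set is the more principled alternative. `residualize_on_market` and its inputs are hypothetical.

```python
import numpy as np

def residualize_on_market(T, event_mask=None):
    """T: (n_slices, n_venues) toxicity matrix. Returns per-venue residuals after removing a
    leave-one-out market factor; slices flagged in `event_mask` (auctions, scheduled events)
    are excluded from the fit and set to NaN."""
    T = np.asarray(T, dtype=float)
    n_t, n_v = T.shape
    keep = np.ones(n_t, dtype=bool) if event_mask is None else ~np.asarray(event_mask, dtype=bool)
    out = np.empty_like(T)
    for v in range(n_v):
        market = np.delete(T, v, axis=1).mean(axis=1)        # leave-one-out cross-venue mean
        X = np.column_stack([np.ones(n_t), market])          # intercept + market factor
        beta, *_ = np.linalg.lstsq(X[keep], T[keep, v], rcond=None)
        out[:, v] = T[:, v] - X @ beta
    out[~keep] = np.nan
    return out
```

When every venue's toxicity is driven mostly by one shared factor, raw series are highly correlated and would produce spurious TE edges in both directions; the residuals keep only venue-specific variation.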

Minimal implementation checklist

Define venue toxicity state and normalization
Implement rolling TE estimator + regularization
Build directed venue graph service (real-time)
Derive propagated shock score and risk map
Integrate router action ladder with hysteresis
Run walk-forward + stress-session replay
Deploy shadow -> capped-control -> full-control rollout

One-line takeaway

Slippage in fragmented markets is often a propagation problem; a transfer-entropy lead-lag graph lets the router act on where toxicity is going next, not just where it already is.