Slippage CUSUM Change-Point + Burn-Rate Controller Playbook
Date: 2026-02-26
Category: Research (Execution / Slippage Modeling)
Scope: Intraday single-name and basket execution (KR-focused, portable)
Why this model exists
Most execution stacks still monitor slippage with slow averages:
- rolling 15m mean IS,
- end-of-order postmortems,
- dashboard alarms after damage is done.
That is operationally too late.
This playbook treats slippage control like SRE incident detection:
- Build a real-time expected slippage baseline,
- Track residual error,
- Run change-point detection (CUSUM/Page-Hinkley),
- Trigger a burn-rate controller on the remaining slippage budget.
Core idea: donβt just estimate cost level. Detect when the process has shifted regime right now and react before tail loss compounds.
Problem setup
Let parent order start at time (t_0), horizon (H), benchmark (B) (arrival or decision).
Observed incremental slippage (bps) at decision tick (t):
[ y_t = \text{slippage}_{t\rightarrow t+\Delta} ]
Predict expected slippage using context (x_t):
[ \hat{y}_t = f(x_t) ]
Residual:
[ r_t = y_t - \hat{y}_t ]
If (r_t) becomes persistently positive, execution is paying an abnormal tax not explained by current modeled context.
Stage 1 β Baseline model (residualization)
Use a fast model for (\hat{y}_t):
- LightGBM/CatBoost (tabular microstructure + own-exec features), or
- linear + interactions for lower latency and easier diagnostics.
Suggested features:
- spread, microprice drift, queue imbalance,
- short-horizon volatility,
- participation rate and remaining urgency,
- queue age / cancel-replace churn,
- session bucket (open/close/lunch), venue state (KRX/NXT/VI).
Goal is not perfection. Goal is to produce a stable residual stream where shifts are detectable.
Stage 2 β Online change-point detection
A) One-sided CUSUM (upward slippage shift)
Define standardized residual:
[ z_t = \frac{r_t - \mu_0}{\sigma_0 + \epsilon} ]
with (\mu_0\approx 0), (\sigma_0) estimated by robust rolling window.
CUSUM statistic:
[ S_t = \max{0, S_{t-1} + z_t - k} ]
- (k): slack/drift allowance (reference value),
- alert when (S_t > h) (decision threshold).
Interpretation:
- random noise resets near zero,
- persistent positive residuals accumulate and breach quickly.
B) Page-Hinkley variant (drift-sensitive)
For non-stationary sessions, Page-Hinkley can reduce false positives:
[ PH_t = PH_{t-1} + (r_t - \bar{r}_t - \delta) ]
Track minimum (m_t = \min_{i\le t} PH_i), alert if:
[ PH_t - m_t > \lambda ]
Use CUSUM as primary, Page-Hinkley as confirmation in noisy symbols.
Stage 3 β Slippage budget burn-rate control
Let total allowed slippage budget for parent order be (B_{tot}) (bps-weighted notional). Remaining budget at (t):
[ B_t = B_{tot} - \sum_{i=t_0}^{t} y_i w_i ]
where (w_i) is fill/notional weight.
Define projected burn rate over short horizon (\tau):
[ \rho_t = \frac{\mathbb{E}[\sum_{j=t}^{t+\tau} y_j w_j \mid x_t, S_t]}{\max(B_t, \epsilon)} ]
Controller tiers:
- GREEN: (\rho_t < 0.35)
- YELLOW: (0.35 \le \rho_t < 0.7)
- ORANGE: (0.7 \le \rho_t < 1.0)
- RED: (\rho_t \ge 1.0) or CUSUM hard breach
This is analogous to SLO burn-rate alerts: not just βare we bad,β but βhow fast are we consuming error budget.β
Controller state machine
[ \text{NORMAL} \rightarrow \text{GUARDED} \rightarrow \text{DEFENSIVE} \rightarrow \text{KILL/SAFE} ]
NORMAL
- passive-first,
- standard participation caps,
- normal quote dwell.
GUARDED (soft CUSUM alert or YELLOW burn)
- reduce passive dwell time,
- tighten stale-quote cancellation thresholds,
- lower max child order size,
- increase reprice cadence.
DEFENSIVE (CUSUM confirmed or ORANGE burn)
- cap passive exposure,
- prefer certainty over queue vanity,
- temporary venue quarantine if regime appears venue-local,
- activate tail-risk cap (q95 slippage guard).
KILL/SAFE (RED burn or repeated hard breaches)
- halt aggressive escalation,
- force controlled unwind protocol,
- notify operator with feature snapshot + detector trace.
Add hysteresis so controller does not flap:
- enter DEFENSIVE on 2 consecutive confirms,
- exit only after detector cool-down window.
Calibration process
1) Build residual reference
- Purged time-split by day/session,
- robustly estimate (\sigma_0) per symbol-time bucket,
- winsorize extreme prints for detector stability.
2) Tune detector thresholds
Optimize ((k, h)) for:
- low mean detection delay under injected shift,
- bounded false alarm rate by session bucket.
Inject synthetic shifts (e.g., +2/+4/+8 bps residual drift) into replay to map sensitivity curves.
3) Tune action policy
Backtest policy under counterfactual replay:
- objective: reduce q95/q99 IS,
- constraints: underfill penalty, completion SLA, turnover caps.
Evaluation scorecard
Primary:
- q95 implementation shortfall reduction vs baseline controller.
Secondary:
- detection delay (seconds) after true shift onset,
- false positive alerts per order/day,
- mode-switch rate (stability),
- completion rate and opportunity cost,
- tail conditional expectation (CVaR) of slippage.
Operational:
- percent alerts with actionable explanation,
- feature freshness SLO compliance,
- fallback invocation rate.
Production guardrails
- Feature staleness kill-switch: if key features delayed, detector output invalidated and fallback policy engaged.
- Session-aware thresholds: open/close windows need separate ((k,h)).
- Venue segmentation: detect whether shift is global or venue-local before routing changes.
- Minimum sample floor: suppress detector on too-sparse tick windows.
- Explainability payload per alert:
- top residual drivers,
- current CUSUM value, burn-rate tier,
- recommended action and confidence.
Pseudocode
for t in decision_ticks:
x = build_features(t)
y_hat = baseline.predict(x)
y_obs = realized_incremental_slippage(t)
r = y_obs - y_hat
z = (r - mu0_bucket) / (sigma0_bucket + 1e-6)
S = max(0.0, S + z - k)
cusum_alert = (S > h)
burn_rate = forecast_burn_rate(state, x, remaining_budget)
state = transition(state, cusum_alert, burn_rate, hysteresis=True)
action = policy(state, market_state=x, remain_qty=remain_qty)
if state == "KILL_SAFE":
trigger_safe_unwind()
send(action)
Common failure modes
Detector on raw slippage (no residualization)
β reacts to normal context changes, not true regime shifts.Single threshold for all sessions
β open/close false alarms explode.No hysteresis
β mode flapping and unnecessary cancel churn.Ignoring underfill cost
β slippage improves on paper while execution quality degrades.Alert with no prescribed action
β operators receive noise, not control.
Next experiments
- Bayesian online change-point posterior as a second detector.
- Joint detector across slippage residual + fill-hazard residual.
- Portfolio-aware correlated change-point detection (basket contagion).
- Conformal wrapper for alert confidence guarantees.
One-line takeaway
Treat slippage like an error budget: CUSUM catches hidden regime shifts early, and burn-rate control converts that warning into concrete execution actions before tail loss compounds.