Execution Kill-Switch Ladder Playbook (Production-Oriented)
Date: 2026-02-22
Category: knowledge
Purpose: A practical framework for pausing or degrading execution before slippage and adverse selection compound into a regime-level loss.
Why This Exists
Most desks either:
- react too late (manual panic stop), or
- overreact too early (turn off edge during normal noise).
A kill-switch ladder solves this by introducing staged responses:
- slow down,
- constrain,
- route-shift,
- freeze.
This preserves optionality while controlling tail execution risk.
Core Signals (Minimal Set)
Use a small, robust set first:
- Realized Slippage vs Model (bps)
slip_gap = realized_bps - expected_bps
- Markout Drift (1m / 5m)
- adverse post-trade price movement
- Fill Quality Deterioration
- lower fill ratio at same participation
- Latency / Reject Spike
- venue/API instability proxy
- Spread + Volatility Joint Spike
- microstructure stress regime indicator
Rule of thumb: avoid 20+ features. Production reliability > feature count.
State Machine (Ladder)
Level 0 — Normal
- All signals within control bands
- Standard participation and routing
Level 1 — Caution
Trigger (example):
slip_gap > p75for 2 consecutive windows, or- markout turns adverse beyond threshold
Action:
- reduce participation by 15–25%
- increase minimum quote quality filters
- shorten validity horizon on stale quotes
Level 2 — Defensive
Trigger (example):
slip_gap > p90and spread×vol spike confirmed
Action:
- participation cut 40–60%
- disable most aggressive child-order types
- prioritize deeper-liquidity venues
- enforce stricter max order size per slice
Level 3 — Freeze (Kill)
Trigger (example):
- p95 breach + venue instability + adverse markout persistence
Action:
- stop new parent orders
- allow safe unwind-only behavior (if policy permits)
- page operator with state snapshot
Recovery should require both metric normalization and minimum cool-down time.
Calibration Procedure
- Build baseline by symbol bucket (liquidity tiers).
- Estimate quantile thresholds (p75/p90/p95) on rolling windows.
- Backtest ladder transitions with historical stress windows.
- Simulate false positives cost (missed alpha) vs avoided tail loss.
- Lock initial thresholds, then revise monthly.
Do not calibrate solely on calm regimes.
Governance & Anti-Footgun Rules
- No single-metric kill in isolation (unless hard infra failure).
- Hysteresis required: harder to recover than to downgrade.
- Operator override logged with reason and timestamp.
- Post-incident review mandatory after every Level 3.
- Keep runbook text in plain language, not only dashboard logic.
Practical Dashboard Spec
Must show at a glance:
- current ladder level
- top triggering metrics
- time spent in each level today
- estimated bps saved (counterfactual model)
- opportunity cost estimate
Without opportunity-cost view, teams overfit to safety and under-trade.
Implementation Checklist
- Define symbol liquidity tiers
- Build rolling quantile calculator
- Add real-time state machine with hysteresis
- Wire order throttles to ladder levels
- Add operator alert + override path
- Run stress replay drill (at least 3 historical events)
- Document recovery criteria and on-call playbook
Closing Note
A kill-switch ladder is not about fear.
It is about keeping execution adaptive under stress so the strategy survives long enough for edge to matter.