Execution Kill-Switch Ladder Playbook (Production-Oriented)

Date: 2026-02-22
Category: knowledge
Purpose: A practical framework for pausing or degrading execution before slippage and adverse selection compound into a regime-level loss.

Why This Exists

Without staged controls, desks tend to either:

react too late (manual panic stop), or
overreact too early (switch off their edge during normal noise).

A kill-switch ladder solves this by introducing staged responses:

slow down,
constrain,
route-shift,
freeze.

This preserves optionality while controlling tail execution risk.

Core Signals (Minimal Set)

Use a small, robust set first:

Realized Slippage vs Model (bps)
- slip_gap = realized_bps - expected_bps
Markout Drift (1m / 5m)
- adverse post-trade price movement
Fill Quality Deterioration
- lower fill ratio at the same participation rate
Latency / Reject Spike
- venue/API instability proxy
Spread + Volatility Joint Spike
- microstructure stress regime indicator
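To make the first signals concrete, here is a minimal per-fill sketch. It assumes an arrival-mid benchmark for realized slippage, a +1 (buy) / -1 (sell) side convention, and markouts measured from the fill price so that negative values mean the price moved against the fill. The `Fill` record and all function names are hypothetical, not part of any specific system.

```python
from dataclasses import dataclass

# Hypothetical sketch: benchmark choice and sign conventions are assumptions.


@dataclass
class Fill:
    side: int            # +1 for buy, -1 for sell (assumed convention)
    fill_price: float
    arrival_mid: float   # mid at parent-order arrival (assumed benchmark)
    mid_after_1m: float  # mid 1 minute after the fill
    mid_after_5m: float  # mid 5 minutes after the fill


def realized_slippage_bps(f: Fill) -> float:
    """Cost vs arrival mid in bps; positive means we paid more than the benchmark."""
    return f.side * (f.fill_price - f.arrival_mid) / f.arrival_mid * 1e4


def slip_gap(f: Fill, expected_bps: float) -> float:
    """slip_gap = realized_bps - expected_bps; positive means worse than the model."""
    return realized_slippage_bps(f) - expected_bps


def markout_bps(f: Fill, mid_after: float) -> float:
    """Signed post-trade markout; negative means price moved against the fill."""
    return f.side * (mid_after - f.fill_price) / f.fill_price * 1e4


def fill_ratio(filled_qty: float, sent_qty: float) -> float:
    """Share of sent quantity that filled; compare only at similar participation."""
    return filled_qty / sent_qty if sent_qty > 0 else 0.0
```

In use, `slip_gap` and the 1m/5m markouts would be aggregated per window (for example, quantity-weighted) before being compared against thresholds.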

Rule of thumb: avoid 20+ features. In production, reliability matters more than feature count.

State Machine (Ladder)

Level 0 — Normal

All signals within control bands
Standard participation and routing

Level 1 — Caution

Trigger (example):

slip_gap > its rolling p75 for 2 consecutive windows, or
markout turns adverse beyond a set threshold

Action:

reduce participation by 15–25%
tighten minimum quote-quality filters
shorten the horizon after which quotes are treated as stale
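The "2 consecutive windows" clause is a persistence filter that keeps a single noisy window from escalating the ladder. A minimal sketch, assuming the rolling p75 threshold is supplied per window by calibration and that a negative markout means adverse; `ConsecutiveBreach` and `level1_trigger` are hypothetical names:

```python
from collections import deque

# Hypothetical sketch of the Level 1 example trigger.


class ConsecutiveBreach:
    """True only when value > threshold in each of the last n windows."""

    def __init__(self, n_windows: int = 2):
        self._recent = deque(maxlen=n_windows)

    def update(self, value: float, threshold: float) -> bool:
        # The threshold is passed per window because p75 is itself rolling.
        self._recent.append(value > threshold)
        return len(self._recent) == self._recent.maxlen and all(self._recent)


def level1_trigger(slip_persistent: bool, markout_bps: float,
                   adverse_limit_bps: float) -> bool:
    """Persistent slip_gap > p75, OR markout below a (negative) adverse limit."""
    return slip_persistent or markout_bps < adverse_limit_bps
```

Requiring persistence trades a little reaction speed for fewer false escalations; the window count is a tuning parameter, not a fixed rule.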

Level 2 — Defensive

Trigger (example):

slip_gap > p90 and a joint spread × volatility spike confirmed

Action:

cut participation by 40–60%
disable the most aggressive child-order types
prioritize deeper-liquidity venues
enforce a stricter maximum order size per slice

Level 3 — Freeze (Kill)

Trigger (example):

slip_gap > p95, plus venue instability (latency/reject spike), plus persistent adverse markout

Action:

stop new parent orders
allow safe unwind-only behavior (if policy permits)
page operator with state snapshot
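The example triggers above can be read top-down as a pure function from a per-window signal snapshot to a target level, with each level wired to throttle settings. This is a sketch under stated assumptions: the `Snapshot` fields, `target_level`, and the `ACTIONS` values (midpoints of the 15–25% and 40–60% ranges) are illustrative, not calibrated.

```python
from dataclasses import dataclass

# Hypothetical sketch: field names and throttle values are illustrative.


@dataclass
class Snapshot:
    slip_gap: float            # realized_bps - expected_bps for this window
    p90: float                 # rolling thresholds from calibration
    p95: float
    slip_p75_persistent: bool  # slip_gap > p75 for 2 consecutive windows
    markout_adverse: bool      # markout beyond the adverse threshold
    markout_persistent: bool   # adverse markout held across recent windows
    spread_vol_spike: bool     # joint spread x volatility spike confirmed
    venue_unstable: bool       # latency / reject spike


def target_level(s: Snapshot) -> int:
    """Return the highest level whose example trigger is met (checked top-down)."""
    if s.slip_gap > s.p95 and s.venue_unstable and s.markout_persistent:
        return 3
    if s.slip_gap > s.p90 and s.spread_vol_spike:
        return 2
    if s.slip_p75_persistent or s.markout_adverse:
        return 1
    return 0


# Illustrative throttles per level. Level 3 blocks new parent orders; any
# unwind-only flow is assumed to be governed by a separate policy.
ACTIONS = {
    0: {"participation_mult": 1.0, "aggressive_child_orders": True, "new_parents": True},
    1: {"participation_mult": 0.8, "aggressive_child_orders": True, "new_parents": True},
    2: {"participation_mult": 0.5, "aggressive_child_orders": False, "new_parents": True},
    3: {"participation_mult": 0.0, "aggressive_child_orders": False, "new_parents": False},
}
```

Keeping the classifier stateless makes it easy to replay historical windows through it; hysteresis and cool-down belong in a separate layer that consumes the target level.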

Recovery should require both metric normalization and a minimum cool-down time.
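One way to make recovery require both normalization and a cool-down is to escalate immediately but step down only one level at a time, and only after the target level has stayed below the current level for a full cool-down period. A minimal sketch, assuming an integer target level per window and timestamps in seconds; `Ladder` and `cooldown_s` are hypothetical:

```python
from typing import Optional

# Hypothetical sketch: the one-step-down rule and default cool-down are assumptions.


class Ladder:
    """Escalate immediately; de-escalate one level per full cool-down period."""

    def __init__(self, cooldown_s: float = 300.0):
        self.level = 0
        self.cooldown_s = cooldown_s
        self._calm_since: Optional[float] = None

    def update(self, target: int, now: float) -> int:
        if target > self.level:
            # Conditions worsened: jump straight to the indicated level.
            self.level = target
            self._calm_since = None
        elif target < self.level:
            # Metrics normalized below the current level: start or continue cool-down.
            if self._calm_since is None:
                self._calm_since = now
            elif now - self._calm_since >= self.cooldown_s:
                self.level -= 1
                # Each further step down needs a fresh cool-down.
                self._calm_since = now if target < self.level else None
        else:
            # Target matches current level: any partial cool-down is discarded.
            self._calm_since = None
        return self.level
```

Restarting the timer after each step means a full Level 3 to Level 0 recovery takes at least three cool-down periods, which is one simple way to make recovery harder than escalation.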

Calibration Procedure

Build baselines per symbol bucket (liquidity tier).
Estimate quantile thresholds (p75/p90/p95) on rolling windows.
Backtest ladder transitions over historical stress windows.
Simulate the cost of false positives (missed alpha) against avoided tail loss.
Lock initial thresholds, then revise monthly.

Do not calibrate solely on calm regimes: thresholds fit only to quiet periods are too tight and will fire on ordinary volatility.
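The first two calibration steps (tiered baselines and rolling quantiles) can be sketched as a per-tier rolling buffer of slip_gap observations with quantiles read off on demand. This assumes a sample-count window rather than a time window and uses Python's standard `statistics.quantiles`; the class name, the 500-sample window, and the 20-sample minimum are illustrative assumptions.

```python
from collections import defaultdict, deque
from statistics import quantiles

# Hypothetical sketch: window length and minimum sample count are assumptions.


class RollingThresholds:
    """Rolling p75/p90/p95 of slip_gap, kept separately per liquidity tier."""

    def __init__(self, window: int = 500, min_samples: int = 20):
        self.min_samples = min_samples
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def add(self, tier: str, slip_gap_bps: float) -> None:
        self.samples[tier].append(slip_gap_bps)

    def thresholds(self, tier: str) -> dict:
        data = self.samples[tier]
        if len(data) < self.min_samples:
            raise ValueError(f"only {len(data)} samples for tier {tier!r}")
        # n=100 yields 99 cut points; index k-1 is the k-th percentile.
        cuts = quantiles(data, n=100, method="inclusive")
        return {"p75": cuts[74], "p90": cuts[89], "p95": cuts[94]}
```

A time-based window would behave differently across tiers with very different trade counts; the sample-count choice here is a simplification.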

Governance & Anti-Footgun Rules

No single-metric kill in isolation (unless there is a hard infrastructure failure).
Hysteresis required: it must be harder to recover to a lower level than to escalate.
Every operator override is logged with a reason and timestamp.
Post-incident review is mandatory after every Level 3 event.
Keep runbook text in plain language, not only dashboard logic.
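A minimal sketch of the override rule, assuming overrides are appended as JSON lines to an append-only sink (a plain list stands in for it here). `OverrideRecord` and `log_override` are hypothetical; the point is that an override without a reason is rejected and every record carries a UTC timestamp.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical sketch: record fields and the list-based sink are assumptions.


@dataclass(frozen=True)
class OverrideRecord:
    operator: str
    from_level: int
    to_level: int
    reason: str
    timestamp_utc: str


def log_override(operator: str, from_level: int, to_level: int,
                 reason: str, sink: list) -> OverrideRecord:
    """Record a manual ladder override; refuse overrides without a reason."""
    if not reason.strip():
        raise ValueError("override reason is required")
    rec = OverrideRecord(operator, from_level, to_level, reason.strip(),
                         datetime.now(timezone.utc).isoformat())
    sink.append(json.dumps(asdict(rec)))  # one JSON line per override
    return rec
```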

Practical Dashboard Spec

Must show at a glance:

current ladder level
top triggering metrics
time spent in each level today
estimated bps saved (counterfactual model)
opportunity cost estimate

Without an opportunity-cost view, teams overfit to safety and under-trade.
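Two dashboard fields are simple to derive once ladder transitions are logged: time spent in each level, and the saved-versus-forgone ledger. A sketch assuming a time-sorted list of `(timestamp_s, new_level)` transitions starting at session start, with the bps estimates supplied by separate counterfactual and opportunity-cost models; the function names are hypothetical:

```python
from collections import defaultdict

# Hypothetical sketch: input shapes are assumptions, not a specific logging format.


def time_in_levels(transitions, end_time):
    """Seconds spent in each ladder level.

    transitions: (timestamp_s, new_level) pairs sorted by time, first entry at
    session start. end_time: timestamp at which the summary is taken.
    """
    totals = defaultdict(float)
    bounds = [t for t, _ in transitions[1:]] + [end_time]
    for (start, level), stop in zip(transitions, bounds):
        totals[level] += stop - start
    return dict(totals)


def protection_summary(est_bps_saved: float, opportunity_cost_bps: float) -> dict:
    """Show both sides of the ledger; a net figure alone can hide under-trading."""
    return {
        "est_bps_saved": est_bps_saved,
        "opportunity_cost_bps": opportunity_cost_bps,
        "net_bps": est_bps_saved - opportunity_cost_bps,
    }
```

For example, transitions `[(0, 0), (100, 1), (160, 2), (200, 1), (260, 0)]` summarized at `400` give 240 s in Level 0, 120 s in Level 1, and 40 s in Level 2.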

Implementation Checklist

Define symbol liquidity tiers
Build rolling quantile calculator
Add real-time state machine with hysteresis
Wire order throttles to ladder levels
Add operator alert + override path
Run stress replay drill (at least 3 historical events)
Document recovery criteria and on-call playbook

Closing Note

A kill-switch ladder is not about fear.
It is about keeping execution adaptive under stress so the strategy survives long enough for edge to matter.