Uncertainty-Decomposed Slippage Control Playbook (Epistemic vs Aleatoric)

2026-02-28 · finance

Category: research (quant execution / slippage modeling)

Why this playbook

Many execution stacks predict expected slippage (\mu) and maybe one tail quantile. That is useful, but it blends two very different risks:

  1. Aleatoric uncertainty — market noise you cannot remove (microstructure randomness, queue race noise).
  2. Epistemic uncertainty — model ignorance (regime novelty, sparse context, covariate shift).

Operationally, these need different actions. If you treat both as one number, you either over-hedge in ordinary noise (paying opportunity cost on flow you understand) or under-hedge in novel regimes (carrying hidden tail risk).

This playbook decomposes uncertainty and maps it to live execution controls.


Core objective

For each child-order decision at time (t), estimate:

  1. a point forecast (\hat{\mu}_t) of expected slippage,
  2. an aleatoric scale (\hat{\sigma}^{(a)}_t),
  3. an epistemic scale (\hat{\sigma}^{(e)}_t).

Then drive a controller that distinguishes noisy-but-known conditions (aleatoric-dominant) from unknown-risk conditions (epistemic-dominant).


Practical modeling architecture

Use a two-layer setup.

Layer A — Base predictive ensemble

Train (M) heterogeneous models (e.g., LightGBM, CatBoost, linear baseline, shallow MLP) on the same target:

[ S_t = \text{realized child-order slippage in bps} ]

Each model outputs a point prediction (\hat{\mu}_{m,t}) of (S_t).

Ensemble mean:

[ \hat{\mu}_t = \frac{1}{M}\sum_{m=1}^{M} \hat{\mu}_{m,t} ]

Epistemic proxy (between-model spread):

[ \hat{\sigma}^{(e)}_t = \sqrt{\frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{\mu}_{m,t}-\hat{\mu}_t\right)^2} ]
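The two Layer A formulas can be sketched in a few lines (a minimal illustration; `ensemble_decompose` and its array layout are not from a specific library):

```python
import numpy as np

def ensemble_decompose(preds):
    """Combine M per-model slippage predictions (bps) into an ensemble
    mean and a between-model epistemic spread.

    preds: array-like of shape (M, T) -- model m's prediction at time t.
    Returns (mu_hat, sigma_e), each of shape (T,).
    """
    preds = np.asarray(preds, dtype=float)
    mu_hat = preds.mean(axis=0)          # \hat{mu}_t
    sigma_e = preds.std(axis=0, ddof=1)  # \hat{sigma}^{(e)}_t, 1/(M-1) variance
    return mu_hat, sigma_e
```

When all models agree at a time step, the epistemic spread there is exactly zero, which is the behavior you want from a model-disagreement proxy.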

Layer B — Residual/noise model

Fit a second model on absolute residuals from recent production windows:

[ r_t = |S_t - \hat{\mu}_t| ]

Predict (\hat{r}_t) using the same context features plus queue/latency stress features. Treat (\hat{\sigma}^{(a)}_t \propto \hat{r}_t) as aleatoric scale.
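A minimal sketch of Layer B, substituting a linear least-squares fit for the production residual model (the linear form, the `fit_residual_scale` name, and the feature layout are assumptions for illustration):

```python
import numpy as np

def fit_residual_scale(X, s_real, mu_hat):
    """Fit a linear model on absolute residuals r_t = |S_t - mu_hat_t|
    and return a predictor of the aleatoric scale.

    X: (T, d) context features; s_real: realized slippage (bps);
    mu_hat: Layer A ensemble mean predictions."""
    r = np.abs(np.asarray(s_real) - np.asarray(mu_hat))  # absolute residuals
    A = np.hstack([X, np.ones((len(X), 1))])             # add intercept column
    w, *_ = np.linalg.lstsq(A, r, rcond=None)

    def predict(X_new):
        A_new = np.hstack([X_new, np.ones((len(X_new), 1))])
        return np.clip(A_new @ w, 0.0, None)             # scale is non-negative
    return predict
```

In production you would swap the linear fit for a GBM on the same context plus queue/latency stress features, as described above.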

This separates:

  1. irreducible noise scale (aleatoric, from residual magnitudes),
  2. model disagreement (epistemic, from ensemble spread).


Feature blocks that improve decomposition

Use feature families with explicit intent:

  1. Execution intent
    • side, urgency, participation target, child size / ADV slice, remaining horizon.
  2. Book state
    • spread, top-k depth imbalance, microprice drift, queue position percentile.
  3. Flow toxicity
    • short-horizon OFI, cancel/trade divergence, markout pressure proxy.
  4. Infra timing
    • decision→gateway→ACK latencies, jitter z-score, throttle utilization.
  5. Novelty / OOD indicators (epistemic helpers)
    • distance-to-training manifold (kNN distance, Mahalanobis, leaf-frequency rarity),
    • regime labels unseen in recent training windows,
    • feature missingness pattern drift.

OOD features are often the highest leverage for epistemic alarms.
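The simplest of the OOD helpers, kNN distance-to-training-manifold, can be sketched directly (brute-force Euclidean; a real deployment would use an index structure and standardized features):

```python
import numpy as np

def knn_ood_score(train_X, x, k=5):
    """Mean Euclidean distance from context x to its k nearest
    training rows. Larger values -> more novel context -> candidate
    epistemic alarm."""
    d = np.linalg.norm(np.asarray(train_X) - np.asarray(x), axis=1)
    return float(np.sort(d)[:k].mean())
```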


Calibration: make uncertainty actionable

Raw uncertainty numbers are rarely calibrated. Add explicit calibration.

1) Quantile calibration

On rolling validation windows, enforce empirical coverage: the p90 forecast should be exceeded roughly 10% of the time, i.e. (\Pr(S_t \le q_{0.9,t}) \approx 0.9), and analogously for other quantiles.

Use isotonic or monotone spline recalibration per liquidity regime.
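As an even simpler stand-in for isotonic/spline recalibration, a one-parameter additive conformal offset can restore coverage on a rolling window (the `conformal_offset` name and additive form are illustrative, not the playbook's prescribed method):

```python
import numpy as np

def conformal_offset(s_real, q_hat, target=0.9):
    """Pick delta so that P(S <= q_hat + delta) hits the target
    coverage on the window; apply as q_hat_calibrated = q_hat + delta."""
    scores = np.asarray(s_real) - np.asarray(q_hat)  # exceedances over raw quantile
    return float(np.quantile(scores, target))
```

Isotonic recalibration generalizes this by learning a monotone map over the full quantile range rather than a single shift.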

2) Epistemic reliability curve

Bucket (\hat{\sigma}^{(e)}) deciles, then measure future error inflation:

[ \rho_k = \frac{\mathbb{E}[|S-\hat{\mu}|\mid \hat{\sigma}^{(e)}\in B_k]}{\mathbb{E}[|S-\hat{\mu}|\mid \hat{\sigma}^{(e)}\in B_1]} ]

You want monotonic (\rho_k). If not monotonic, your epistemic proxy is noisy and needs feature/OOD work.
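The reliability curve check can be computed as follows (decile bucketing by quantile edges; the function name is illustrative):

```python
import numpy as np

def epistemic_reliability(abs_err, sigma_e, n_buckets=10):
    """Bucket sigma_e into deciles B_1..B_K and return
    rho_k = E[|S - mu| | B_k] / E[|S - mu| | B_1].
    A healthy epistemic proxy yields monotonically rising rho_k."""
    abs_err = np.asarray(abs_err, dtype=float)
    sigma_e = np.asarray(sigma_e, dtype=float)
    edges = np.quantile(sigma_e, np.linspace(0, 1, n_buckets + 1))
    idx = np.clip(np.searchsorted(edges, sigma_e, side="right") - 1,
                  0, n_buckets - 1)
    means = np.array([abs_err[idx == k].mean() for k in range(n_buckets)])
    return means / means[0]
```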

3) Drift-conditioned recalibration

Maintain separate calibrators for:

  1. calm regimes,
  2. stressed regimes,

using microstructure stress labels. One global calibrator usually under-covers in stress.


Decision policy: uncertainty-aware controller

Define normalized pressure scores:

[ U^{(a)}_t = \frac{\hat{\sigma}^{(a)}_t}{\text{median}({\hat{\sigma}^{(a)}})+\epsilon}, \quad U^{(e)}_t = \frac{\hat{\sigma}^{(e)}_t}{\text{median}({\hat{\sigma}^{(e)}})+\epsilon} ]

Tail budget burn:

[ B_t = \frac{\sum_{i\le t} \max(0, S_i - q_{0.9,i})}{\text{daily tail budget (bps)}} ]
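The pressure scores and budget burn above can be sketched as (batch arrays for clarity; live code would use rolling medians and `eps` is a small stabilizer):

```python
import numpy as np

def pressure(sigma, eps=1e-9):
    """Normalize an uncertainty stream by its median -> U score."""
    sigma = np.asarray(sigma, dtype=float)
    return sigma / (np.median(sigma) + eps)

def tail_budget_burn(s_real, q90, daily_budget_bps):
    """Cumulative exceedance over the p90 forecast, as a fraction of
    the daily tail budget (B_t)."""
    excess = np.maximum(0.0, np.asarray(s_real) - np.asarray(q90))
    return np.cumsum(excess) / daily_budget_bps
```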

State machine

  1. NORMAL

    • condition: (U^{(e)}<1.3), (B_t<0.5)
    • action: baseline tactic mix.
  2. NOISY-KNOWN (aleatoric-dominant)

    • condition: (U^{(a)}\uparrow), (U^{(e)}) moderate
    • action: smaller clips, slightly lower POV, keep participation continuity.
  3. UNKNOWN-RISK (epistemic-dominant)

    • condition: (U^{(e)}\ge 1.8) or OOD flag high
    • action: switch to conservative fallback policy (historically robust tactic set), tighten max aggression, increase passive bias unless deadline breach risk dominates.
  4. SAFE

    • condition: (B_t\ge 1.0) together with high (U^{(e)}) or severe stress
    • action: hard participation cap, venue whitelist, optional temporary pause for non-urgent flow.

Use hysteresis and minimum dwell times to avoid state flapping.
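A toy tick function for the state machine with minimum dwell; the (U^{(e)}) and burn thresholds mirror the conditions above, while the (U^{(a)}) trigger (1.5) and `min_dwell` of 5 ticks are assumptions:

```python
def next_state(state, u_a, u_e, burn, ood_high, dwell, min_dwell=5):
    """One controller tick. Returns (new_state, new_dwell).
    Dwell counts ticks spent in the current state."""
    if dwell < min_dwell:                       # enforce minimum dwell time
        return state, dwell + 1
    if burn >= 1.0 and (u_e >= 1.8 or ood_high):
        target = "SAFE"
    elif u_e >= 1.8 or ood_high:
        target = "UNKNOWN-RISK"
    elif u_a >= 1.5:                            # assumed aleatoric trigger level
        target = "NOISY-KNOWN"
    elif u_e < 1.3 and burn < 0.5:
        target = "NORMAL"
    else:
        target = state                          # hysteresis band: hold state
    return (target, 0) if target != state else (state, dwell + 1)
```

A production version would also carry per-transition cooldowns and log every transition for the dashboard.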


Counterfactual guardrail before policy promotion

Before promoting a new policy/model to production:

  1. Replay recent tape with off-policy estimators (DR/SNIPS).
  2. Evaluate by tail-first metrics:
    • p95 slippage,
    • CVaR95,
    • budget-breach frequency,
    • underfill at horizon.
  3. Require challenger to satisfy:
    • no worse p95/CVaR95 in stress,
    • no material increase in underfill (> threshold),
    • uncertainty calibration not degraded (coverage error bound).

Mean improvement alone is not sufficient.
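The tail-first promotion gate can be expressed compactly (the `promote` signature, `fill_tol`, and the empirical CVaR estimator are illustrative):

```python
import numpy as np

def cvar(x, q=0.95):
    """Empirical CVaR_q: mean slippage at or beyond the q-quantile."""
    x = np.asarray(x, dtype=float)
    cut = np.quantile(x, q)
    return float(x[x >= cut].mean())

def promote(champ_slip, chall_slip, champ_fill, chall_fill, fill_tol=0.01):
    """Challenger must not worsen p95 or CVaR95 and must not reduce
    completion ratio by more than fill_tol."""
    ok_p95 = np.quantile(chall_slip, 0.95) <= np.quantile(champ_slip, 0.95)
    ok_cvar = cvar(chall_slip) <= cvar(champ_slip)
    ok_fill = chall_fill >= champ_fill - fill_tol
    return bool(ok_p95 and ok_cvar and ok_fill)
```

A calibration-degradation check (coverage error bound) would be a third conjunct in the same gate.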


Monitoring dashboard (minimum viable)

Track these in real time and EOD:

  1. Cost metrics
    • mean/p90/p95 slippage, CVaR95.
  2. Calibration metrics
    • p90/p95 empirical coverage gap,
    • PIT histogram sanity.
  3. Uncertainty decomposition
    • median/90p (\hat{\sigma}^{(a)}), (\hat{\sigma}^{(e)}),
    • share of UNKNOWN-RISK time.
  4. Controller behavior
    • state occupancy, transitions/hour, dwell time.
  5. Execution business metrics
    • completion ratio, schedule delay, reject/cancel anomalies.

Alert examples:

  1. p90/p95 coverage gap beyond tolerance for several consecutive windows,
  2. UNKNOWN-RISK occupancy spiking above its trailing baseline,
  3. tail budget burn (B_t) crossing 1.0 intraday.


Failure modes and fixes

  1. Epistemic over-triggering in sparse symbols
    • fix: hierarchical pooling by liquidity bucket + symbol embeddings.
  2. Aleatoric model learning stale residual regimes
    • fix: shorter residual retrain cadence + stress-window oversampling.
  3. Great calibration in backtest, poor live coverage
    • fix: prequential (online) calibration checks, not static split only.
  4. Controller improves tail but kills fills
    • fix: dual objective guardrail (tail + completion), enforce minimum completion SLO.

Implementation blueprint (4-week cut)

Week 1
    • Build the base ensemble, log per-model predictions, compute (\hat{\mu}_t) and (\hat{\sigma}^{(e)}_t) offline.

Week 2
    • Fit the residual/noise model, add OOD features, stand up rolling quantile calibration.

Week 3
    • Implement the controller state machine in shadow mode; replay-validate against the off-policy guardrail.

Week 4
    • Wire the monitoring dashboard and alerts; go live on a limited symbol set.


Compact operator checklist

Before market open:
    • confirm calibrators retrained on the latest window and coverage gaps within tolerance,
    • reset the tail budget and OOD baselines.

During session:
    • watch state occupancy, transitions/hour, and (B_t),
    • review UNKNOWN-RISK episodes for genuine novelty vs feature glitches.

After close:
    • recompute empirical coverage and the reliability curve (\rho_k),
    • queue residual-model retraining with today's stress windows oversampled.


Bottom line

A single slippage forecast is not enough for live execution control. The critical upgrade is to separate:

  1. aleatoric noise you can only size and pace around, from
  2. epistemic ignorance you must actively de-risk against.

That decomposition turns uncertainty from dashboard decoration into a concrete control surface for safer, higher-confidence execution.