Order-Book Depth Truncation & Hidden-Gap Slippage Playbook

Date: 2026-04-11
Category: research
Scope: How L1/L5/L10-style partial-depth feeds understate aggressive cost and distort slippage control

Why this matters

Many production execution stacks do not have the full book available at decision time.

Common realities:

  • vendor feeds deliver only L1, L5, or L10 partial-depth snapshots,
  • historical training data stores only the top 5 or 10 levels,
  • full-depth capture, where it exists, covers only a few symbols or sessions.

That creates a specific failure mode: the model thinks it knows the local supply curve, but it only knows the front porch.

When urgency rises or displayed depth thins, the order sweeps beyond the observed ladder, hits hidden gaps, and realized cost jumps several ticks beyond the model forecast. The result is not just noisier execution — it is a systematic tail-underestimation problem.


One-line intuition

Partial depth is fine until your order needs the first level you cannot see. After that, slippage becomes a hidden-gap problem, not a spread problem.


Failure mechanism (operator timeline)

  1. Training data stores only the top 5 or top 10 levels.
  2. Cost model learns impact from visible cumulative depth and near-touch imbalance.
  3. Router sizes marketable or urgency-escalated child orders using that truncated view.
  4. In calm states, execution stays inside observed depth often enough that the model looks good on average.
  5. In thin or stressed states, the order consumes past the last visible level.
  6. True deeper-book spacing is wider than implied by the truncated ladder.
  7. Realized implementation shortfall jumps, especially in p95/p99 tails.
  8. Postmortem says “regime shift” or “sudden liquidity shock,” but part of the damage was simply book-information insufficiency.

Extend slippage decomposition with an information-level term

Let:

  • (q) = decision size,
  • (N) = number of visible levels,
  • (C_{true}(q)) = true cost of sweeping size (q) against the full book,
  • (C_{obs,N}(q)) = cost implied by the truncated (N)-level ladder.

Then write:

[ IS = IS_{market} + IS_{impact} + IS_{timing} + IS_{fees} + \underbrace{IS_{trunc}}_{\text{depth truncation / hidden-gap tax}} ]

with

[ IS_{trunc}(q, N) \approx C_{true}(q) - C_{obs,N}(q). ]

This term is near zero when the order stays inside observed depth, but becomes convex once sweep size crosses the truncation boundary.
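To make (IS_{trunc}) concrete, here is a minimal Python sketch. It assumes the naive truncated-book model prices any overflow at the last visible level, which is a common implicit assumption and exactly the failure mode described above; the ladder, level count, and sizes are hypothetical:

```python
def sweep_cost(ladder, q):
    """Cost to fill size q by walking (price, size) levels of an ask ladder."""
    cost, rem = 0.0, q
    for price, size in ladder:
        take = min(rem, size)
        cost += take * price
        rem -= take
        if rem == 0:
            break
    if rem > 0:
        raise ValueError("ladder exhausted before q filled")
    return cost

def truncated_cost(ladder, n, q):
    """Naive model cost: prices any overflow at the last visible level,
    i.e. it assumes no hidden gap beyond level n."""
    visible = ladder[:n]
    visible_depth = sum(size for _, size in visible)
    if q <= visible_depth:
        return sweep_cost(visible, q)
    return sweep_cost(visible, visible_depth) + (q - visible_depth) * visible[-1][0]

# Hypothetical ask ladder with a hidden gap after level 3 (100.3 -> 101.0).
book = [(100.0, 50), (100.1, 40), (100.3, 30), (101.0, 80)]
q = 150
is_trunc = sweep_cost(book, q) - truncated_cost(book, n=3, q=q)  # C_true - C_obs,N
```

Here the 30 overflow units are truly priced at 101.0 but modeled at 100.3, so the truncation tax is 30 × 0.7 = 21.0 in notional terms; it is exactly zero whenever (q) stays inside the visible depth.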


Core production metrics

1) Overflow Probability (OP)

Probability that decision size (q) requires liquidity beyond the last visible level (N):

[ OP_N(q) = P\big(L^*(q) > N \mid x_t\big) ]

where (L^*(q)) is the deepest level actually needed to fill size (q), and (x_t) is state (spread, depth, volatility, imbalance, event intensity, venue, time-of-day).

This is the first metric to operationalize. If you cannot estimate overflow risk, you cannot trust the truncated-book cost estimate.
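A pooled empirical estimate of (OP_N) can be built from logged deepest-level-reached labels; the labels below are hypothetical, and a production version would condition on the state (x_t) by bucketing rather than pooling:

```python
def overflow_probability(deepest_levels, n):
    """Empirical OP_N: fraction of orders whose deepest level needed, L*,
    exceeded the last visible level N. Pools all observations; condition
    on state buckets (spread, depth, urgency, ...) in production."""
    return sum(level > n for level in deepest_levels) / len(deepest_levels)

# Hypothetical deepest-level-reached labels for five child orders.
op = overflow_probability([3, 5, 12, 9, 11], n=10)  # 2 of 5 overflowed L10
```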

2) Conditional Hidden-Gap Burden (HGB)

Extra cost once overflow happens:

[ HGB_N(q) = E\big[C_{true}(q) - C_{obs,N}(q) \mid L^*(q) > N, x_t\big]. ]

Think of this as “how bad it gets when the visible book runs out.”
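An empirical (HGB_N) estimate over the orders that actually overflowed can be sketched the same way; the record layout (deepest level, true cost, truncated-model cost) is an assumption for illustration:

```python
def hidden_gap_burden(records, n):
    """HGB_N: mean of C_true - C_obs over orders that overflowed level N.
    Each record is (deepest_level, true_cost, truncated_model_cost)."""
    gaps = [c_true - c_obs for level, c_true, c_obs in records if level > n]
    return sum(gaps) / len(gaps) if gaps else 0.0

orders = [(4, 10.0, 10.0), (12, 25.0, 20.0), (11, 18.0, 15.0)]
hgb = hidden_gap_burden(orders, n=10)  # mean of the two overflow gaps, 5.0 and 3.0
```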

3) Truncation Coverage Ratio (TCR)

Observed fraction of required executable depth:

[ TCR_N(q) = \frac{D_{obs,N}(q)}{D_{req}(q) + \epsilon} ]

where:

  • (D_{obs,N}(q)) = executable depth visible within the top (N) levels,
  • (D_{req}(q)) = depth actually required to fill size (q),
  • (\epsilon) = small constant guarding against division by zero.

Low TCR means the model is making decisions with incomplete local supply information.

4) Information Sufficiency Curve (ISC)

Measure forecast quality as a function of depth level count:

[ ISC(N) = 1 - \frac{\mathrm{Loss}(N)}{\mathrm{Loss}(N_{full})} ]

for a cost or markout prediction loss of your choice.

This tells you whether L5 is “almost all the signal” or whether the step from L10 to L20 materially improves tail cost prediction.

5) Tail Overflow Loss Share (TOLS)

Fraction of p95/p99 cost attributable to truncation events:

[ TOLS = \frac{\sum_i IS_{trunc,i} \cdot \mathbf{1}\{IS_i > q_{0.95}\}}{\sum_i IS_i \cdot \mathbf{1}\{IS_i > q_{0.95}\}}. ]

If TOLS is large, your tail problem is partly a data-contract problem, not only a policy problem.
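TOLS can be computed directly from paired per-order totals and truncation terms; the nearest-rank quantile below is a deliberately crude choice for illustration:

```python
def tail_overflow_loss_share(is_total, is_trunc, q=0.95):
    """TOLS: fraction of tail implementation shortfall attributable to the
    truncation term. Tail = observations above a crude nearest-rank
    empirical q-quantile of total IS."""
    cut = sorted(is_total)[min(len(is_total) - 1, int(q * len(is_total)))]
    num = sum(t for total, t in zip(is_total, is_trunc) if total > cut)
    den = sum(total for total in is_total if total > cut)
    return num / den if den else 0.0

# Synthetic example: truncation contributes half of every order's IS,
# so it also contributes half of the tail.
is_total = list(range(100))
is_trunc = [x / 2 for x in range(100)]
tols = tail_overflow_loss_share(is_total, is_trunc)
```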


The key modeling split: two-stage truncation-aware cost model

A practical production model should separate:

  1. Will we overflow the visible book?
  2. If yes, how expensive is the hidden remainder?

Stage 1 — overflow classifier

Predict:

[ \hat{p}_{over} = P(L^*(q)>N \mid x_t). ]

Useful features:

  • decision size relative to cumulative visible depth,
  • spread, near-touch imbalance, and short-horizon volatility,
  • event intensity, venue, and time-of-day.

Stage 2 — conditional overflow severity model

Given overflow, predict extra ticks/bps:

[ \widehat{HGB}_N(q) = E\big[C_{true}(q)-C_{obs,N}(q) \mid L^*(q) > N, x_t\big]. ]

Good targets:

  • extra ticks paid beyond the last visible level,
  • extra cost in bps of notional, i.e. realized (C_{true}(q) - C_{obs,N}(q)).

Combined estimator

[ E[C_{true}(q)\mid x_t] \approx C_{obs,N}(q) + \hat{p}_{over}(x_t)\cdot \widehat{HGB}_N(q, x_t). ]

This is much more robust than pretending truncated depth is the whole book.
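The two-stage combination can be sketched end to end. Everything below is a deliberately crude stand-in: a single size-to-depth ratio feature instead of the full state (x_t), a threshold split instead of a real classifier, and a pooled mean instead of a fitted severity model:

```python
# Logged orders as (size / visible depth, overflowed?, extra cost vs model).
def fit_two_stage(records, threshold=0.8):
    """Stage 1: empirical overflow rate above/below a size-ratio threshold.
    Stage 2: mean extra cost over orders that actually overflowed.
    Returns an estimator: E[C_true] ~ C_obs + p_over * HGB_hat."""
    hi = [r for r in records if r[0] > threshold]
    lo = [r for r in records if r[0] <= threshold]
    p_hi = sum(r[1] for r in hi) / len(hi) if hi else 0.0
    p_lo = sum(r[1] for r in lo) / len(lo) if lo else 0.0
    over = [r[2] for r in records if r[1]]
    hgb = sum(over) / len(over) if over else 0.0
    return lambda ratio, c_obs: c_obs + (p_hi if ratio > threshold else p_lo) * hgb

records = [
    (0.3, False, 0.0), (0.5, False, 0.0), (0.9, True, 6.0),
    (1.2, True, 10.0), (0.7, False, 0.0), (1.0, True, 8.0),
]
model = fit_two_stage(records)
risky = model(1.1, c_obs=100.0)  # truncated cost plus expected hidden-gap tax
calm = model(0.4, c_obs=100.0)   # no adjustment when overflow risk is negligible
```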


Why average performance lies

A depth-truncated model can look good in backtests for three reasons:

  1. most child orders are small and stay inside visible depth,
  2. calm periods dominate sample count,
  3. mean loss hides overflow tails.

So a model may “win” on average while still failing exactly when:

  • displayed depth thins,
  • urgency escalates child sizing,
  • hidden gaps beyond the visible ladder are at their widest.

This is why promotion gates should be tail-first, not mean-first.
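One way to encode a tail-first promotion gate, using a nearest-rank quantile and an assumed 5% mean-degradation tolerance (both illustrative choices, not standards):

```python
def p_quantile(xs, q):
    """Crude nearest-rank empirical quantile."""
    s = sorted(xs)
    return s[min(len(s) - 1, int(q * len(s)))]

def tail_first_gate(baseline_is, candidate_is, q=0.99, mean_slack=1.05):
    """Promote only if the candidate's IS tail is no worse than baseline;
    the mean is allowed to degrade by at most `mean_slack` (assumed budget)."""
    tail_ok = p_quantile(candidate_is, q) <= p_quantile(baseline_is, q)
    mean_ok = (sum(candidate_is) / len(candidate_is)
               <= mean_slack * sum(baseline_is) / len(baseline_is))
    return tail_ok and mean_ok
```

A candidate that improves the mean but fattens the p99 tail fails this gate, which is the point of tail-first promotion.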


Control states for a live router

GREEN — VISIBLE_BOOK_SUFFICIENT

Actions:

YELLOW — OVERFLOW_RISK_RISING

Actions:

ORANGE — HIDDEN_GAP_EXPOSED

Actions:

RED — INFORMATION-INSUFFICIENT EXECUTION

Actions:

Use hysteresis and minimum dwell time; otherwise the controller will flap around thin-book transitions.


Engineering patterns that help in the real world

1) Keep a shadow full-depth sample even if live decisions use truncated depth

You do not need full depth for every symbol at every microsecond to estimate truncation damage.

Even one of these helps:

  • full-depth capture for a rotating subset of symbols,
  • periodic full-depth snapshots alongside the truncated live feed,
  • realized sweep outcomes recorded as ground truth for past decisions.

Without any shadow truth, truncation tax becomes invisible.

2) Version the information level in your feature store

depth_levels=1, 5, 10, full should be explicit metadata.

Otherwise you silently mix:

That is a hidden train/serve mismatch.
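This contract can be enforced mechanically. The metadata shape below (a dict with a depth_levels key) is an assumed convention for illustration, not a standard feature-store schema:

```python
def assert_depth_contract(train_meta, serve_meta):
    """Fail fast when the information level differs between training
    and serving, instead of silently corrupting the model."""
    t = train_meta.get("depth_levels")
    s = serve_meta.get("depth_levels")
    if t != s:
        raise ValueError(f"train/serve depth mismatch: trained on L{t}, serving L{s}")
```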

3) Make child sizing conditional on information sufficiency

Do not use one global “safe marketable clip” if the visible depth level count changes by venue or symbol.

Safer rule:

[ q_{max}^{safe}(x_t) = \max\big\{\, q : OP_N(q \mid x_t) \le \alpha \,\big\} ]

for an overflow-risk budget (\alpha).
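The safe-clip rule amounts to a search over candidate sizes; a sketch, with a toy monotone overflow curve standing in for a fitted (OP_N) estimator:

```python
def safe_clip(op_fn, alpha, q_grid):
    """Largest candidate size whose predicted overflow probability stays
    within the budget alpha; None if even the smallest candidate breaches
    it. `op_fn` maps a size to OP_N(q | x_t) for the current state."""
    safe = [q for q in q_grid if op_fn(q) <= alpha]
    return max(safe) if safe else None

# Toy monotone overflow curve (assumption, for illustration only).
op_curve = lambda q: min(1.0, q / 1000.0)
clip = safe_clip(op_curve, alpha=0.2, q_grid=range(50, 501, 50))  # largest q with OP <= 0.2
```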

4) Track deepest-level-reached as a first-class execution label

If your logs only keep fill price and notional, you miss the easiest proxy for truncation damage.

Track at least:

  • deepest level reached per child order,
  • whether the fill consumed past the last visible level,
  • realized cost versus the truncated-model estimate.

5) Separate “book is thin” from “book is unknown” in controls

Those are not the same problem. A thin book is observed scarcity; a truncated book is missing information. Only the second is a data-contract problem.


Validation protocol

  1. Build paired datasets using:
    • truncated-depth features at level (N), and
    • deeper-book or realized sweep truth.
  2. Compare baseline cost model vs truncation-aware two-stage model.
  3. Segment by:
    • symbol liquidity tier,
    • venue,
    • urgency bucket,
    • spread regime,
    • time-of-day.
  4. Report:
    • mean IS,
    • p95/p99 IS,
    • overflow calibration error,
    • completion reliability,
    • false-safe rate (predicted safe but overflowed badly).
  5. Promote only if tail metrics improve without unacceptable completion degradation.
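The false-safe rate from the report list can be computed directly from aligned per-order flags; a minimal sketch:

```python
def false_safe_rate(predicted_safe, overflowed_badly):
    """Share of orders flagged safe pre-trade that then overflowed badly.
    Both inputs are per-order booleans, aligned by index."""
    safe_idx = [i for i, s in enumerate(predicted_safe) if s]
    if not safe_idx:
        return 0.0
    return sum(overflowed_badly[i] for i in safe_idx) / len(safe_idx)
```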

A useful canary question:

When the order exceeded visible depth, did the new model know that before the order was sent?


Observability checklist

  • OP, HGB, TCR, ISC, and TOLS dashboards, segmented by symbol liquidity tier, venue, and urgency bucket,
  • deepest-level-reached distributions per venue,
  • overflow-classifier calibration error over time,
  • alerts on sustained TCR decline or OP spikes.


Common mistakes

  1. Treating cumulative visible depth as executable depth.
    It is only executable depth if you never need the next unseen level.

  2. Optimizing mean cost only.
    Overflow damage is a tail phenomenon.

  3. Ignoring information-level train/serve mismatch.
    L10 training and L5 live inference is silent model corruption.

  4. Assuming deeper-book ignorance is random noise.
    It is state-dependent and often worst exactly when urgency is highest.

  5. Using larger slices because average slippage looked stable.
    Mean stability can coexist with catastrophic overflow tails.


Minimal implementation checklist

  • log deepest-level-reached for every child order,
  • record depth_levels metadata in the feature store,
  • maintain a shadow full-depth or realized-sweep truth sample,
  • estimate OP and HGB before trusting truncated cost forecasts,
  • gate marketable clip size on an overflow-risk budget.


Practical takeaway

If your execution model sees only the first few levels, it should not pretend to forecast full sweep cost directly. First ask whether you are about to run out of visible book; then price the hidden remainder explicitly.

