Order-Book Depth Truncation & Hidden-Gap Slippage Playbook

Date: 2026-04-11
Category: research
Scope: How L1/L5/L10-style partial-depth feeds understate aggressive cost and distort slippage control

Why this matters

Many production execution stacks do not have the full book available at decision time.

Common realities:

  • vendor feeds deliver only L1, L5, or L10 partial-depth snapshots,
  • historical training data stores only the top 5 or 10 levels,
  • full-depth capture, where it exists, covers only a few symbols or sessions.

That creates a specific failure mode: the model thinks it knows the local supply curve, but it only knows the front porch.

When urgency rises or displayed depth thins, the order sweeps beyond the observed ladder, hits hidden gaps, and realized cost jumps several ticks beyond the model forecast. The result is not just noisier execution — it is a systematic tail-underestimation problem.


One-line intuition

Partial depth is fine until your order needs the first level you cannot see. After that, slippage becomes a hidden-gap problem, not a spread problem.


Failure mechanism (operator timeline)

  1. Training data stores only the top 5 or top 10 levels.
  2. Cost model learns impact from visible cumulative depth and near-touch imbalance.
  3. Router sizes marketable or urgency-escalated child orders using that truncated view.
  4. In calm states, execution stays inside observed depth often enough that the model looks good on average.
  5. In thin or stressed states, the order consumes past the last visible level.
  6. True deeper-book spacing is wider than implied by the truncated ladder.
  7. Realized implementation shortfall jumps, especially in p95/p99 tails.
  8. Postmortem says “regime shift” or “sudden liquidity shock,” but part of the damage was simply book-information insufficiency.

Extend slippage decomposition with an information-level term

Let:

  • (q) = decision size,
  • (N) = number of visible levels,
  • (C_{true}(q)) = true cost of sweeping size (q) against the full book,
  • (C_{obs,N}(q)) = cost implied by the truncated (N)-level ladder.

Then write:

[ IS = IS_{market} + IS_{impact} + IS_{timing} + IS_{fees} + \underbrace{IS_{trunc}}_{\text{depth truncation / hidden-gap tax}} ]

with

[ IS_{trunc}(q, N) \approx C_{true}(q) - C_{obs,N}(q). ]

This term is near zero when the order stays inside observed depth, but becomes convex once sweep size crosses the truncation boundary.
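To make (IS_{trunc}) concrete, here is a minimal Python sketch. It assumes the naive truncated-book model prices any overflow at the last visible level, which is a common implicit assumption and exactly the failure mode described above; the ladder, level count, and sizes are hypothetical:

```python
def sweep_cost(ladder, q):
    """Cost to fill size q by walking (price, size) levels of an ask ladder."""
    cost, rem = 0.0, q
    for price, size in ladder:
        take = min(rem, size)
        cost += take * price
        rem -= take
        if rem == 0:
            break
    if rem > 0:
        raise ValueError("ladder exhausted before q filled")
    return cost

def truncated_cost(ladder, n, q):
    """Naive model cost: prices any overflow at the last visible level,
    i.e. it assumes no hidden gap beyond level n."""
    visible = ladder[:n]
    visible_depth = sum(size for _, size in visible)
    if q <= visible_depth:
        return sweep_cost(visible, q)
    return sweep_cost(visible, visible_depth) + (q - visible_depth) * visible[-1][0]

# Hypothetical ask ladder with a hidden gap after level 3 (100.3 -> 101.0).
book = [(100.0, 50), (100.1, 40), (100.3, 30), (101.0, 80)]
q = 150
is_trunc = sweep_cost(book, q) - truncated_cost(book, n=3, q=q)  # C_true - C_obs,N
```

Here the 30 overflow units are truly priced at 101.0 but modeled at 100.3, so the truncation tax is 30 × 0.7 = 21.0 in notional terms; it is exactly zero whenever (q) stays inside the visible depth.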


Core production metrics

1) Overflow Probability (OP)

Probability that decision size (q) requires liquidity beyond the last visible level (N):

[ OP_N(q) = P\big(L^*(q) > N \mid x_t\big) ]

where (L^*(q)) is the deepest level actually needed to fill size (q), and (x_t) is state (spread, depth, volatility, imbalance, event intensity, venue, time-of-day).

This is the first metric to operationalize. If you cannot estimate overflow risk, you cannot trust the truncated-book cost estimate.
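A pooled empirical estimate of (OP_N) can be built from logged deepest-level-reached labels; the labels below are hypothetical, and a production version would condition on the state (x_t) by bucketing rather than pooling:

```python
def overflow_probability(deepest_levels, n):
    """Empirical OP_N: fraction of orders whose deepest level needed, L*,
    exceeded the last visible level N. Pools all observations; condition
    on state buckets (spread, depth, urgency, ...) in production."""
    return sum(level > n for level in deepest_levels) / len(deepest_levels)

# Hypothetical deepest-level-reached labels for five child orders.
op = overflow_probability([3, 5, 12, 9, 11], n=10)  # 2 of 5 overflowed L10
```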

2) Conditional Hidden-Gap Burden (HGB)

Extra cost once overflow happens:

[ HGB_N(q) = E\big[C_{true}(q) - C_{obs,N}(q) \mid L^*(q) > N, x_t\big]. ]

Think of this as “how bad it gets when the visible book runs out.”
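An empirical (HGB_N) estimate over the orders that actually overflowed can be sketched the same way; the record layout (deepest level, true cost, truncated-model cost) is an assumption for illustration:

```python
def hidden_gap_burden(records, n):
    """HGB_N: mean of C_true - C_obs over orders that overflowed level N.
    Each record is (deepest_level, true_cost, truncated_model_cost)."""
    gaps = [c_true - c_obs for level, c_true, c_obs in records if level > n]
    return sum(gaps) / len(gaps) if gaps else 0.0

orders = [(4, 10.0, 10.0), (12, 25.0, 20.0), (11, 18.0, 15.0)]
hgb = hidden_gap_burden(orders, n=10)  # mean of the two overflow gaps, 5.0 and 3.0
```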

3) Truncation Coverage Ratio (TCR)

Observed fraction of required executable depth:

[ TCR_N(q) = \frac{D_{obs,N}(q)}{D_{req}(q) + \epsilon} ]

where:

  • (D_{obs,N}(q)) = executable depth visible within the top (N) levels,
  • (D_{req}(q)) = depth actually required to fill size (q),
  • (\epsilon) = small constant guarding against division by zero.

Low TCR means the model is making decisions with incomplete local supply information.

4) Information Sufficiency Curve (ISC)

Measure forecast quality as a function of depth level count:

[ ISC(N) = 1 - \frac{\mathrm{Loss}(N)}{\mathrm{Loss}(N_{full})} ]

for a cost or markout prediction loss of your choice.

This tells you whether L5 is “almost all the signal” or whether the step from L10 to L20 materially improves tail cost prediction.

5) Tail Overflow Loss Share (TOLS)

Fraction of p95/p99 cost attributable to truncation events:

[ TOLS = \frac{\sum_i IS_{trunc,i} \cdot \mathbf{1}\{IS_i > q_{0.95}\}}{\sum_i IS_i \cdot \mathbf{1}\{IS_i > q_{0.95}\}}. ]

If TOLS is large, your tail problem is partly a data-contract problem, not only a policy problem.
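TOLS can be computed directly from paired per-order totals and truncation terms; the nearest-rank quantile below is a deliberately crude choice for illustration:

```python
def tail_overflow_loss_share(is_total, is_trunc, q=0.95):
    """TOLS: fraction of tail implementation shortfall attributable to the
    truncation term. Tail = observations above a crude nearest-rank
    empirical q-quantile of total IS."""
    cut = sorted(is_total)[min(len(is_total) - 1, int(q * len(is_total)))]
    num = sum(t for total, t in zip(is_total, is_trunc) if total > cut)
    den = sum(total for total in is_total if total > cut)
    return num / den if den else 0.0

# Synthetic example: truncation contributes half of every order's IS,
# so it also contributes half of the tail.
is_total = list(range(100))
is_trunc = [x / 2 for x in range(100)]
tols = tail_overflow_loss_share(is_total, is_trunc)
```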


The key modeling split: two-stage truncation-aware cost model

A practical production model should separate:

  1. Will we overflow the visible book?
  2. If yes, how expensive is the hidden remainder?

Stage 1 — overflow classifier

Predict:

[ \hat{p}_{over} = P(L^*(q)>N \mid x_t). ]

Useful features:

  • decision size relative to cumulative visible depth,
  • spread, near-touch imbalance, and short-horizon volatility,
  • event intensity, venue, and time-of-day.

Stage 2 — conditional overflow severity model

Given overflow, predict extra ticks/bps:

[ \widehat{HGB}_N(q) = E\big[C_{true}(q)-C_{obs,N}(q) \mid L^*(q) > N, x_t\big]. ]

Good targets:

  • extra ticks paid beyond the last visible level,
  • extra cost in bps of notional, i.e. realized (C_{true}(q) - C_{obs,N}(q)).

Combined estimator

[ E[C_{true}(q)\mid x_t] \approx C_{obs,N}(q) + \hat{p}_{over}(x_t)\cdot \widehat{HGB}_N(q, x_t). ]

This is much more robust than pretending truncated depth is the whole book.
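The two-stage combination can be sketched end to end. Everything below is a deliberately crude stand-in: a single size-to-depth ratio feature instead of the full state (x_t), a threshold split instead of a real classifier, and a pooled mean instead of a fitted severity model:

```python
# Logged orders as (size / visible depth, overflowed?, extra cost vs model).
def fit_two_stage(records, threshold=0.8):
    """Stage 1: empirical overflow rate above/below a size-ratio threshold.
    Stage 2: mean extra cost over orders that actually overflowed.
    Returns an estimator: E[C_true] ~ C_obs + p_over * HGB_hat."""
    hi = [r for r in records if r[0] > threshold]
    lo = [r for r in records if r[0] <= threshold]
    p_hi = sum(r[1] for r in hi) / len(hi) if hi else 0.0
    p_lo = sum(r[1] for r in lo) / len(lo) if lo else 0.0
    over = [r[2] for r in records if r[1]]
    hgb = sum(over) / len(over) if over else 0.0
    return lambda ratio, c_obs: c_obs + (p_hi if ratio > threshold else p_lo) * hgb

records = [
    (0.3, False, 0.0), (0.5, False, 0.0), (0.9, True, 6.0),
    (1.2, True, 10.0), (0.7, False, 0.0), (1.0, True, 8.0),
]
model = fit_two_stage(records)
risky = model(1.1, c_obs=100.0)  # truncated cost plus expected hidden-gap tax
calm = model(0.4, c_obs=100.0)   # no adjustment when overflow risk is negligible
```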


Why average performance lies

A depth-truncated model can look good in backtests for three reasons:

  1. most child orders are small and stay inside visible depth,
  2. calm periods dominate sample count,
  3. mean loss hides overflow tails.

So a model may “win” on average while still failing exactly when:

  • displayed depth thins,
  • urgency escalates child sizing,
  • hidden gaps beyond the visible ladder are at their widest.

This is why promotion gates should be tail-first, not mean-first.
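One way to encode a tail-first promotion gate, using a nearest-rank quantile and an assumed 5% mean-degradation tolerance (both illustrative choices, not standards):

```python
def p_quantile(xs, q):
    """Crude nearest-rank empirical quantile."""
    s = sorted(xs)
    return s[min(len(s) - 1, int(q * len(s)))]

def tail_first_gate(baseline_is, candidate_is, q=0.99, mean_slack=1.05):
    """Promote only if the candidate's IS tail is no worse than baseline;
    the mean is allowed to degrade by at most `mean_slack` (assumed budget)."""
    tail_ok = p_quantile(candidate_is, q) <= p_quantile(baseline_is, q)
    mean_ok = (sum(candidate_is) / len(candidate_is)
               <= mean_slack * sum(baseline_is) / len(baseline_is))
    return tail_ok and mean_ok
```

A candidate that improves the mean but fattens the p99 tail fails this gate, which is the point of tail-first promotion.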


Control states for a live router

GREEN — VISIBLE_BOOK_SUFFICIENT

Actions:

YELLOW — OVERFLOW_RISK_RISING

Actions:

ORANGE — HIDDEN_GAP_EXPOSED

Actions:

RED — INFORMATION-INSUFFICIENT EXECUTION

Actions:

Use hysteresis and minimum dwell time; otherwise the controller will flap around thin-book transitions.


Engineering patterns that help in the real world

1) Keep a shadow full-depth sample even if live decisions use truncated depth

You do not need full depth for every symbol at every microsecond to estimate truncation damage.

Even one of these helps:

  • full-depth capture for a rotating subset of symbols,
  • periodic full-depth snapshots alongside the truncated live feed,
  • realized sweep outcomes recorded as ground truth for past decisions.

Without any shadow truth, truncation tax becomes invisible.

2) Version the information level in your feature store

depth_levels=1, 5, 10, full should be explicit metadata.

Otherwise you silently mix:

That is a hidden train/serve mismatch.
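This contract can be enforced mechanically. The metadata shape below (a dict with a depth_levels key) is an assumed convention for illustration, not a standard feature-store schema:

```python
def assert_depth_contract(train_meta, serve_meta):
    """Fail fast when the information level differs between training
    and serving, instead of silently corrupting the model."""
    t = train_meta.get("depth_levels")
    s = serve_meta.get("depth_levels")
    if t != s:
        raise ValueError(f"train/serve depth mismatch: trained on L{t}, serving L{s}")
```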

3) Make child sizing conditional on information sufficiency

Do not use one global “safe marketable clip” if the visible depth level count changes by venue or symbol.

Safer rule:

[ q_{max}^{safe}(x_t) = \max\big\{\, q : OP_N(q \mid x_t) \le \alpha \,\big\} ]

for an overflow-risk budget (\alpha).
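The safe-clip rule amounts to a search over candidate sizes; a sketch, with a toy monotone overflow curve standing in for a fitted (OP_N) estimator:

```python
def safe_clip(op_fn, alpha, q_grid):
    """Largest candidate size whose predicted overflow probability stays
    within the budget alpha; None if even the smallest candidate breaches
    it. `op_fn` maps a size to OP_N(q | x_t) for the current state."""
    safe = [q for q in q_grid if op_fn(q) <= alpha]
    return max(safe) if safe else None

# Toy monotone overflow curve (assumption, for illustration only).
op_curve = lambda q: min(1.0, q / 1000.0)
clip = safe_clip(op_curve, alpha=0.2, q_grid=range(50, 501, 50))  # largest q with OP <= 0.2
```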

4) Track deepest-level-reached as a first-class execution label

If your logs only keep fill price and notional, you miss the easiest proxy for truncation damage.

Track at least:

  • deepest level reached per child order,
  • whether the fill consumed past the last visible level,
  • realized cost versus the truncated-model estimate.

5) Separate “book is thin” from “book is unknown” in controls

Those are not the same problem. A thin book is observed scarcity; a truncated book is missing information. Only the second is a data-contract problem.


Validation protocol

  1. Build paired datasets using:
    • truncated-depth features at level (N), and
    • deeper-book or realized sweep truth.
  2. Compare baseline cost model vs truncation-aware two-stage model.
  3. Segment by:
    • symbol liquidity tier,
    • venue,
    • urgency bucket,
    • spread regime,
    • time-of-day.
  4. Report:
    • mean IS,
    • p95/p99 IS,
    • overflow calibration error,
    • completion reliability,
    • false-safe rate (predicted safe but overflowed badly).
  5. Promote only if tail metrics improve without unacceptable completion degradation.
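The false-safe rate from the report list can be computed directly from aligned per-order flags; a minimal sketch:

```python
def false_safe_rate(predicted_safe, overflowed_badly):
    """Share of orders flagged safe pre-trade that then overflowed badly.
    Both inputs are per-order booleans, aligned by index."""
    safe_idx = [i for i, s in enumerate(predicted_safe) if s]
    if not safe_idx:
        return 0.0
    return sum(overflowed_badly[i] for i in safe_idx) / len(safe_idx)
```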

A useful canary question:

When the order exceeded visible depth, did the new model know that before the order was sent?


Observability checklist

  • OP, HGB, TCR, ISC, and TOLS dashboards, segmented by symbol liquidity tier, venue, and urgency bucket,
  • deepest-level-reached distributions per venue,
  • overflow-classifier calibration error over time,
  • alerts on sustained TCR decline or OP spikes.


Common mistakes

  1. Treating cumulative visible depth as executable depth.
    It is only executable depth if you never need the next unseen level.

  2. Optimizing mean cost only.
    Overflow damage is a tail phenomenon.

  3. Ignoring information-level train/serve mismatch.
    L10 training and L5 live inference is silent model corruption.

  4. Assuming deeper-book ignorance is random noise.
    It is state-dependent and often worst exactly when urgency is highest.

  5. Using larger slices because average slippage looked stable.
    Mean stability can coexist with catastrophic overflow tails.


Minimal implementation checklist

  • log deepest-level-reached for every child order,
  • record depth_levels metadata in the feature store,
  • maintain a shadow full-depth or realized-sweep truth sample,
  • estimate OP and HGB before trusting truncated cost forecasts,
  • gate marketable clip size on an overflow-risk budget.


Practical takeaway

If your execution model sees only the first few levels, it should not pretend to forecast full sweep cost directly. First ask whether you are about to run out of visible book; then price the hidden remainder explicitly.

