Futures Implied-Liquidity Fragility & Outright-Depth Overstatement Slippage Playbook

2026-04-12 · finance

Category: research (execution / slippage modeling)

Why this playbook exists

A lot of futures execution models quietly treat displayed order-book depth as if it were one thing.

It is not.

In implied-enabled futures markets, visible liquidity can come from at least two very different sources:

  1. direct outright orders resting in the instrument's own book, and
  2. implied orders synthesized by the matching engine from orders in related books, such as spreads and their legs.

Those two kinds of depth can look similar on-screen and in normalized market-data snapshots.

But they do not behave the same way under stress.

Implied depth is conditional on the continued existence, compatibility, and priority of source orders in related books. It can vanish when:

  1. a source order in a related book trades or is cancelled,
  2. a source leg or spread reprices so the combination is no longer price-eligible, or
  3. the matching state changes and implication is suspended or unavailable.

That creates a specific family of slippage bugs:

  1. passive fill probability is overstated because the displayed queue is less durable than it looks,
  2. sweep-cost models understate impact because some top-level size is conditionally synthetic,
  3. markout labels blame or praise the strategy for moves caused by cross-book source collapse rather than local instrument pressure,
  4. backtests learn from static displayed depth while production trades against graph-dependent liquidity.

This note turns that mismatch into a practical modeling framework.


Public market-structure facts that make this real

This is not just a philosophical distinction between “real” and “synthetic” liquidity. Public exchange documentation spells out that implied liquidity has different mechanics from outright liquidity.

A few examples from CME Globex public materials:

  1. implied quantity in futures markets does not carry time priority, and direct outrights are prioritized ahead of implied sources in the relevant matching rules;
  2. some implied quantity is calculated and tradable even when it is not disseminated in market data;
  3. implied functionality is unavailable in some non-matching states, such as pre-open, and may be suspended when implication would produce out-of-limit trades.

The modeling implication is straightforward:

displayed size, direct queue position, and resilient execution opportunity are separate state variables.

If your slippage model collapses them into one number called “depth,” it is already lying.


The core failure mode

Suppose you are trading an outright futures contract.

At the best ask you observe 120 lots. Your model treats that as 120 lots of local supply.

But the actual composition may be something like:

  1. 25 lots of direct outright orders resting in this instrument's own book, and
  2. 95 lots of implied size sourced from spread and leg combinations in related books.

Now imagine one related spread is lifted and the nearby leg reprices one tick.

The instrument you are trading may not have printed yet. But 95 lots of the displayed offer can disappear instantly because the source graph changed.

If the strategy sized a sweep to those 120 lots, or passively relied on that queue to absorb contra flow,

then the model was trading against a fictional local book.

That is the bug.

I call it outright-depth overstatement:

treating cross-book conditional liquidity as if it were fully local, fully durable, and queue-equivalent to direct resting size.


Mechanism map

1. Passive fill overestimation

When you join a displayed queue in an implied-enabled outright, the queue ahead of you may contain a large implied component.

That sounds good at first.

But if that implied size is fragile, two bad things happen:

  1. expected passive fill rates are overstated, so completion quietly falls behind schedule; and
  2. when the implied component evaporates, the strategy must cross later at worse prices.

Naive passive-fill models see “thick touch, low urgency.” Production reality is often “thin direct book wearing an implied costume.”

2. Sweep-cost understatement

Aggressive models often estimate immediate cost from displayed size at top levels.

If half of the top-of-book is implied and sourced from volatile related markets, the book can gap faster than a same-sized direct queue.

So the expected shortfall of a sweep is not just a function of visible size. It is a function of visible size weighted by source durability.

3. Queue-priority mirage

CME explicitly states that implied quantity in futures markets does not have time priority, and direct outrights are prioritized ahead of implied sources in relevant matching rules.

That means two equal-looking price levels can have very different execution meaning:

  1. a level dominated by direct resting orders, where joining earns genuine time priority, versus
  2. a level dominated by implied size, which carries no time priority and sits behind direct interest in the match.

A model that assumes all displayed lots at a price level are queue-equivalent will overestimate both passive opportunity and sweep resistance.

4. Hidden-implied paradox

CME also states that some implied quantity is calculated and tradable even when it is not disseminated.

So the market can be wrong in both directions:

  1. displayed depth can overstate durable liquidity, because disseminated implied size is fragile; and
  2. displayed depth can understate tradable liquidity, because some implied quantity is tradable without being disseminated.

This sounds contradictory, but it is the correct operational picture.

A book viewer sees one number. The matching engine sees a state-dependent liquidity graph.

5. Cross-book regime shock

Implied liquidity depends on related books, ratios, and eligible prices. So leg volatility, spread activity, or rounding changes in another market can abruptly reprice or erase liquidity in the instrument you are trading.

Then a strategy experiences slippage that looks local in the child-order log but was actually caused by remote source instability.

6. Matching-state cliffs

CME states implied quantity is unavailable in some non-matching states, such as pre-open, and may be suspended when implication would produce out-of-limit trades.

That means an instrument can move between:

  1. states where implication is active and implied depth is calculated and tradable, and
  2. states where implication is suspended or unavailable and only the direct book matters.

A backtest that uses only normalized depth snapshots and ignores these state transitions will badly mis-estimate opening, reopening, and stressed-period slippage.


A better abstraction: displayed depth vs resilience-adjusted depth

For an instrument (x) at price level (p) and time (t), define:

  1. (D^{out}_{x,p}(t)) — direct outright depth resting in the instrument's own book,
  2. (D^{imp,disp}_{x,p}(t)) — disseminated implied depth displayed at that level,
  3. (\mathcal{S}_{x,p}(t)) — the set of implied source paths feeding the level,
  4. (Q_s(t)) — the quantity contributed by source path (s),
  5. (w_s(\Delta)) — the probability that path (s) survives the execution horizon (\Delta).

Then naive observed displayed depth is:

[ D^{obs}_{x,p}(t) = D^{out}_{x,p}(t) + D^{imp,disp}_{x,p}(t) ]

But a more honest near-horizon execution quantity is:

[ D^{res}_{x,p}(t;\Delta) = D^{out}_{x,p}(t) + \sum_{s \in \mathcal{S}_{x,p}(t)} w_s(\Delta)\, Q_s(t) ]

Interpretation: implied size is not counted at face value. Each source path's contribution (Q_s(t)) is discounted by its survival probability (w_s(\Delta) \in [0,1]) over the execution horizon, so fragile paths add little to (D^{res}).

The important quantity is the resilience gap:

[ G_{x,p}(t;\Delta) = D^{obs}_{x,p}(t) - D^{res}_{x,p}(t;\Delta) ]

If (G) is large, the book is visually deep but execution-fragile.

You can extend this to multiple price levels and define resilience-adjusted sweep cost using (D^{res}) instead of raw displayed size.
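As a concrete sketch of these quantities, assuming a hypothetical `SourcePath` record stands in for whatever source-graph attribution your feed allows, with survival weights supplied by your own fragility model:

```python
from dataclasses import dataclass

@dataclass
class SourcePath:
    """One implied source path feeding a price level (hypothetical schema)."""
    qty: float            # Q_s(t): implied lots contributed by this path
    survival_prob: float  # w_s(delta): P(path still intact at the horizon)

def resilience_adjusted_depth(direct_qty: float, paths: list[SourcePath]) -> float:
    """D_res: direct depth plus implied size weighted by source survival."""
    return direct_qty + sum(p.qty * p.survival_prob for p in paths)

def resilience_gap(direct_qty: float, paths: list[SourcePath]) -> float:
    """G: displayed depth minus resilience-adjusted depth."""
    displayed = direct_qty + sum(p.qty for p in paths)
    return displayed - resilience_adjusted_depth(direct_qty, paths)

# The 120-lot touch from earlier: 25 direct lots, 95 implied lots
# concentrated on a single fragile path with 30% survival odds.
paths = [SourcePath(qty=95.0, survival_prob=0.3)]
print(resilience_adjusted_depth(25.0, paths))  # ~53.5 of 120 displayed
print(resilience_gap(25.0, paths))             # ~66.5 lots of visual-only depth
```

The same helpers extend naturally across price levels: feed each level's direct size and source paths through `resilience_adjusted_depth` before estimating sweep cost.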


The hidden state is graph-based, not iid

Implied-liquidity behavior is not well described by an iid cancellation process.

An implied quote in one market may depend on:

  1. the continued existence and size of specific source orders in legs and spreads,
  2. leg ratios and price-eligibility rules,
  3. the implication generation (first- versus higher-generation implieds),
  4. the matching state of every book along the path.

So a useful hidden state is not just “implied or not.” It is something like: (direct size, disseminated implied size, source-path set, per-path survival probability, matching state).

And the state transition depends on a source graph, not merely on local order aging.

That matters because the same displayed 100 lots can mean very different things:

  1. 100 direct lots with genuine queue standing,
  2. 100 implied lots hanging off a single fragile source path, or
  3. any mix in between.

A model that sees only “100 lots at best ask” throws away the economically important part.


Mechanically, where slippage enters

1. Opportunity-cost slippage

The strategy waits because displayed contra depth looks abundant. By the time it crosses, the implied portion is gone. Now the trade pays more than if it had recognized fragility earlier.

2. Completion-risk slippage

A passive schedule relies on displayed depth for expected fill rates. The direct book is actually thin, so completion falls behind schedule and later urgency rises.

3. Benchmark contamination

TCA compares fills against displayed top-of-book depth and imbalance snapshots. But the “book state” used as benchmark was dominated by implied size that had low survival probability. Measured slippage is then partly benchmark error.

4. Label drift in research data

Historical depth snapshots often preserve displayed quantities but not a faithful replay of source-path durability. If training labels assume displayed size was uniformly durable, the model learns an unrealistically forgiving market.

5. False liquidity regime inference

A symbol can look deep during calm periods because related books are stable, then suddenly behave thin during cross-book turbulence. A model without implied-source features mislabels this as unexplained regime drift.


Metrics worth instrumenting

1. ISAT — Implied Share at Touch

[ ISAT(t) = \frac{D^{imp,disp}_{best}(t)}{D^{obs}_{best}(t)} ]

Track separately for bid and ask. If ISAT is high, touch depth is more conditional than it appears.

2. DDC — Direct Depth Coverage

[ DDC(t) = \frac{D^{out}_{best}(t)}{D^{obs}_{best}(t)} ]

This is the flip side of ISAT. High DDC means the touch is mostly local and durable. Low DDC means the touch is more cross-book dependent.

3. RAG — Resilience-Adjusted Gap

[ RAG(t;\Delta) = D^{obs}_{best}(t) - D^{res}_{best}(t;\Delta) ]

This is the main quantity for slippage modeling. It measures how much the book overstates near-horizon executable resilience.
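Once book composition is observable, ISAT, DDC, and RAG are one-liners. A minimal sketch, reusing the 25-direct / 95-implied touch from the earlier example:

```python
def isat(implied_disp: float, displayed: float) -> float:
    """Implied Share at Touch: disseminated implied size over displayed size."""
    return implied_disp / displayed if displayed else 0.0

def ddc(direct: float, displayed: float) -> float:
    """Direct Depth Coverage: direct outright size over displayed size."""
    return direct / displayed if displayed else 0.0

def rag(displayed: float, resilience_adjusted: float) -> float:
    """Resilience-Adjusted Gap: how much the book overstates durable depth."""
    return displayed - resilience_adjusted

displayed = 25.0 + 95.0                       # direct + disseminated implied
print(round(isat(95.0, displayed), 3))        # touch is ~79% conditional
print(round(ddc(25.0, displayed), 3))         # only ~21% direct coverage
print(round(rag(displayed, 25.0 + 95.0 * 0.3), 1))  # the visual-only remainder
```

In practice you would track these per side and per level; by construction ISAT and DDC sum to one when every displayed lot is either direct or disseminated implied.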

4. SOC — Source Overlap Concentration

How concentrated implied size is across unique source paths.

If 80 implied lots come from one source path, fragility is much higher than if 80 lots come from twenty loosely related paths.
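A Herfindahl-style index is one simple way to score that concentration (the normalization here is an assumption, not a standard):

```python
def source_overlap_concentration(path_qtys: list[float]) -> float:
    """Herfindahl-style concentration of implied size across source paths.
    1.0 means all implied size rides on one path; an even split across
    n paths gives 1/n."""
    total = sum(path_qtys)
    if total == 0:
        return 0.0
    return sum((q / total) ** 2 for q in path_qtys)

print(source_overlap_concentration([80.0]))        # 1.0 -> maximally fragile
print(source_overlap_concentration([4.0] * 20))    # ~0.05 -> well diversified
```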

5. ICHR — Implied Cliff Hazard Rate

Probability that best-level displayed depth loses more than (k)% within horizon (\Delta) without a local same-instrument trade printing first.

This helps detect remote-source collapse.
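One empirical proxy for ICHR, assuming you can window the tape into (depth_start, depth_end, local_trade_printed) observations per horizon (\Delta):

```python
def implied_cliff_hazard(windows, k: float = 0.5) -> float:
    """Share of horizon windows in which best-level displayed depth fell by
    more than fraction k with no same-instrument trade printing first.
    `windows` holds (depth_start, depth_end, local_trade_printed) tuples."""
    cliffs = total = 0
    for depth_start, depth_end, local_trade in windows:
        if depth_start <= 0:
            continue  # ignore windows with no starting depth
        total += 1
        remote_cliff = (depth_start - depth_end) / depth_start > k
        if remote_cliff and not local_trade:
            cliffs += 1
    return cliffs / total if total else 0.0

obs = [(120, 25, False),   # depth collapsed, nothing printed locally: a cliff
       (120, 118, False),  # normal churn
       (100, 30, True),    # big drop, but a local trade explains it
       (110, 100, False)]
print(implied_cliff_hazard(obs, k=0.5))  # 1 remote cliff in 4 windows -> 0.25
```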

6. QPP — Queue Priority Penalty

Expected execution handicap from interacting with implied-dominated price levels where direct outrights or lower-generation sources receive priority.

7. HIG — Hidden Implied Gap

Estimate of tradable but non-disseminated implied liquidity relative to disseminated implied liquidity.

This is hard to measure perfectly, but even a proxy matters because it tells you when displayed depth is understating contingency.

8. MSR — Matching-State Risk

Share of time the product spends in states where implication is unavailable, suspended, or degraded.

9. LCV — Leg-Churn Volatility

A short-horizon feature measuring how violently source legs and related spreads are repricing. High LCV usually reduces implied-source survival.

10. ODO — Outright-Depth Overstatement

For a child order benchmarked at decision time:

[ ODO = \text{naive displayed executable size} - \text{resilience-adjusted executable size} ]

This is the quantity you want to attribute before calling the outcome “impact.”


Feature set for slippage models

A. Book composition features

  1. ISAT and DDC at the touch and at deeper levels,
  2. implied share per price level over recent windows.

B. Source-graph features

  1. number of unique source paths per level,
  2. SOC and estimated per-path survival probabilities.

C. Priority / queue features

  1. estimated direct-ahead versus implied-ahead quantity,
  2. QPP given documented outright and generation priority rules.

D. Cross-book dynamics

  1. LCV: short-horizon repricing intensity in legs and related spreads,
  2. recent counts of remote-source collapse (ICHR-style) events.

E. Matching-state / rules features

  1. current implication state (active, suspended, unavailable),
  2. MSR: recent share of time in degraded implication states.

F. Label-integrity features

  1. flags for snapshots where composition could not be reconstructed,
  2. HIG proxy for hidden, non-disseminated implied liquidity.

Important rule:

implied depth should not enter the model only as extra size. It should enter as extra size plus extra fragility, extra dependency, and different queue semantics.


Labeling blueprint

For every child-order decision or fill, capture at least four book views.

View 1 — displayed book

What the normalized market-data view showed: price levels and total displayed size, with no composition breakdown.

View 2 — direct-outright book

What depth remains if you strip out disseminated implied size.

View 3 — resilience-adjusted book

What depth remains after discounting implied size by estimated source survival.

View 4 — hidden-contingency proxy

A best-effort estimate of non-disseminated but tradable implied depth.

Then define multiple labels.

Label 1 — naive displayed-depth slippage

Measured relative to raw displayed depth. Useful mostly as a baseline diagnostic.

Label 2 — direct-book slippage

Measured against the outright-only book. Useful for understanding local liquidity conditions.

Label 3 — resilience-adjusted slippage

Measured against the book after implied durability discounting. This is usually the economically honest label for execution control.

Label 4 — overstatement gap

Difference between naive displayed-depth expectations and resilience-adjusted expectations.

This isolates how much forecast error came from treating implied depth as too real.

Label 5 — remote-source shock tag

Binary or graded label indicating that the pre-trade book changed primarily due to related-leg or spread activity rather than same-instrument order flow.

Without this tag, models will attribute the wrong cause to a lot of “unexplained” misses.


Policy rules for execution stacks

Rule 1: maintain separate direct and implied book states

Do not store only bestBidQty / bestAskQty if the venue gives composition or if you can reconstruct it. You need at least:

  1. direct outright depth per level,
  2. disseminated implied depth per level,
  3. whatever source-path attribution you can reconstruct for the implied component.

Rule 2: passive scheduling should key off direct depth first

If passive-fill urgency is computed from full displayed depth, it will often wait too long. A more robust rule is: compute urgency from direct depth first, and count implied depth only at its survival-discounted value.
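One hypothetical way to encode such a rule; the functional form and the 0.5 time weighting are illustrative assumptions, not a standard:

```python
def passive_urgency(direct_qty: float, implied_qty: float,
                    survival_prob: float, remaining_qty: float,
                    time_frac: float) -> float:
    """Urgency keyed to direct depth, with implied depth counted only at
    its survival-discounted value (illustrative policy sketch)."""
    effective_contra = direct_qty + implied_qty * survival_prob
    # Scarce effective contra depth relative to the residual order means
    # higher urgency; elapsed schedule time (0..1) scales it further.
    scarcity = remaining_qty / max(effective_contra, 1.0)
    return min(1.0, scarcity * (0.5 + time_frac))

# Same displayed book (25 direct + 95 implied), two fragility views:
fragility_aware = passive_urgency(25, 95, 0.3, remaining_qty=40, time_frac=0.5)
naive_displayed = passive_urgency(25, 95, 1.0, remaining_qty=40, time_frac=0.5)
# The fragility-aware schedule turns urgent well before the naive one.
```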

Rule 3: sweep models should price fragility, not just size

Replace raw depth with resilience-adjusted depth in short-horizon impact models. The same visible 100 lots should cost more when 80 of them are fragile implied size.
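A sketch of that substitution in a level-walking cost model, under the assumption that each level's implied portion survives with some estimated probability:

```python
def sweep_cost(levels, qty: float):
    """Expected cost of an immediate sweep using resilience-adjusted size.
    levels: (price, direct_qty, implied_qty, survival_prob), best first.
    Returns (avg_fill_price, unfilled_qty)."""
    cost = 0.0
    filled = 0.0
    for price, direct, implied, surv in levels:
        available = direct + implied * surv   # resilience-adjusted size
        take = min(available, qty - filled)
        cost += take * price
        filled += take
        if filled >= qty:
            break
    avg = cost / filled if filled else float("nan")
    return avg, qty - filled

book = [(100.00, 25, 95, 0.3),   # thick-looking touch, mostly fragile implied
        (100.25, 60, 40, 0.5)]
avg_px, unfilled = sweep_cost(book, qty=100)
# A naive model prices all 100 lots at 100.00; the adjusted walk spills
# ~46.5 lots to the next level and averages above the touch.
```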

Rule 4: detect remote-source collapse explicitly

If a best level disappears without a local trade and coincides with spread or leg churn, classify it as source-graph collapse rather than ordinary cancellation noise.
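A heuristic tagger along those lines; the thresholds (half the level gone, a 2-sigma churn score) are illustrative assumptions:

```python
def classify_depth_loss(depth_drop_frac: float, local_trade_printed: bool,
                        leg_churn_z: float,
                        drop_thresh: float = 0.5,
                        churn_thresh: float = 2.0) -> str:
    """Tag a best-level depth loss as local flow, source-graph collapse,
    or ordinary cancellation (illustrative thresholds)."""
    if depth_drop_frac <= drop_thresh:
        return "normal"                      # not a cliff at all
    if local_trade_printed:
        return "local_flow"                  # explained by same-instrument prints
    if leg_churn_z >= churn_thresh:
        return "source_graph_collapse"       # remote legs/spreads were repricing
    return "cancellation"                    # large loss, quiet graph: plain pulls

print(classify_depth_loss(0.8, False, 3.1))  # source_graph_collapse
print(classify_depth_loss(0.8, True, 3.1))   # local_flow
```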

Rule 5: benchmark queue position against direct priority

When exchanges document outright or lower-generation priority, passive-fill models must reflect that queue semantics are asymmetric. Equal displayed size does not mean equal queue rank.

Rule 6: backtests must include implication-state transitions

Pre-open, reopen, implication suspension, and source unavailability should be replayed as liquidity-state changes, not treated as stationary book evolution.

Rule 7: separate visible scarcity from actionable scarcity

A market can look thin but still have hidden contingent implied liquidity. A market can look thick but have low resilience. You need both dimensions.

Rule 8: do not blame all misses on impact

Before declaring “impact rose,” ask:

  1. did ISAT rise, so more of the touch became conditional?
  2. did RAG widen, so displayed depth overstated resilience more than before?
  3. did source concentration (SOC) or leg churn (LCV) increase?

A lot of supposed impact drift is really a composition drift problem.


Common anti-patterns

  1. Collapsing direct and implied size into a single “depth” scalar.
  2. Benchmarking TCA against displayed top-of-book snapshots dominated by fragile implied size.
  3. Backtesting on normalized depth snapshots with no replay of implication-state transitions.
  4. Attributing every shortfall miss to “impact” without checking book composition first.

30-day rollout plan

Week 1 — make book composition observable

  1. capture direct versus disseminated implied depth per level,
  2. start logging ISAT and DDC for every traded instrument.

Week 2 — build source-fragility features

  1. reconstruct source paths where possible; compute SOC and LCV,
  2. estimate per-path survival and stand up RAG and ICHR.

Week 3 — retrain slippage and fill models

  1. relabel history with the direct-book and resilience-adjusted views,
  2. retrain passive-fill and sweep-cost models on the new labels.

Week 4 — harden production controls

  1. wire the policy rules into scheduling and sweep logic,
  2. add remote-source collapse detection and matching-state replay to monitoring.


What good looks like

A production-grade futures slippage stack should be able to answer:

  1. How much of the displayed depth was direct outright size?
  2. How much was disseminated implied size?
  3. How concentrated were the implied source paths?
  4. Which exchange priority rules disadvantaged that implied liquidity?
  5. What portion of the displayed book was likely to survive the decision horizon?
  6. Did the book change because of local flow or remote source collapse?
  7. Would this child have been sized differently if the model had used resilience-adjusted depth?
  8. How much of measured slippage came from outright-depth overstatement rather than actual impact?

If you cannot answer those questions, your book model is probably too visual and not nearly structural enough.


Bottom line

Displayed futures depth is not a single substance.

Some of it is direct, local, and queue-stable. Some of it is implied, conditional, and graph-dependent. Some of it is hidden from dissemination but still matters to execution.

The expensive mistake is to turn all of that into one scalar called “size at the touch.”

Execution models get much better when they stop asking only:

how much size was visible?

and start asking:

how much of that size was direct, how much was implied, and how much was likely to still exist when I actually needed it?

That sounds like microstructure pedantry.

It is really slippage control.