Futures Implied-Liquidity Fragility & Outright-Depth Overstatement Slippage Playbook

2026-04-12 · finance

Category: research (execution / slippage modeling)

Why this playbook exists

A lot of futures execution models quietly treat displayed order-book depth as if it were one thing.

It is not.

In implied-enabled futures markets, visible liquidity can come from at least two very different sources:

  1. direct outright orders resting in the instrument's own book, and
  2. implied orders synthesized by the matching engine from orders in related books, such as spreads and their legs.

Those two kinds of depth can look similar on-screen and in normalized market-data snapshots.

But they do not behave the same way under stress.

Implied depth is conditional on the continued existence, compatibility, and priority of source orders in related books. It can vanish when:

  1. a source order in a related book trades or is cancelled,
  2. a source leg or spread reprices so the combination is no longer price-eligible, or
  3. the matching state changes and implication is suspended or unavailable.

That creates a specific family of slippage bugs:

  1. passive fill probability is overstated because the displayed queue is less durable than it looks,
  2. sweep-cost models understate impact because some top-level size is conditionally synthetic,
  3. markout labels blame or praise the strategy for moves caused by cross-book source collapse rather than local instrument pressure,
  4. backtests learn from static displayed depth while production trades against graph-dependent liquidity.

This note turns that mismatch into a practical modeling framework.


Public market-structure facts that make this real

This is not just a philosophical distinction between “real” and “synthetic” liquidity. Public exchange documentation spells out that implied liquidity has different mechanics from outright liquidity.

A few examples from CME Globex public materials:

  1. implied quantity in futures markets does not carry time priority, and direct outrights are prioritized ahead of implied sources in the relevant matching rules;
  2. some implied quantity is calculated and tradable even when it is not disseminated in market data;
  3. implied functionality is unavailable in some non-matching states, such as pre-open, and may be suspended when implication would produce out-of-limit trades.

The modeling implication is straightforward:

displayed size, direct queue position, and resilient execution opportunity are separate state variables.

If your slippage model collapses them into one number called “depth,” it is already lying.


The core failure mode

Suppose you are trading an outright futures contract.

At the best ask you observe 120 lots. Your model treats that as 120 lots of local supply.

But the actual composition may be something like:

  1. 25 lots of direct outright orders resting in this instrument's own book, and
  2. 95 lots of implied size sourced from spread and leg combinations in related books.

Now imagine one related spread is lifted and the nearby leg reprices one tick.

The instrument you are trading may not have printed yet. But 95 lots of the displayed offer can disappear instantly because the source graph changed.

If the strategy sized a sweep to those 120 lots, or passively relied on that queue to absorb contra flow,

then the model was trading against a fictional local book.

That is the bug.

I call it outright-depth overstatement:

treating cross-book conditional liquidity as if it were fully local, fully durable, and queue-equivalent to direct resting size.


Mechanism map

1. Passive fill overestimation

When you join a displayed queue in an implied-enabled outright, the queue ahead of you may contain a large implied component.

That sounds good at first.

But if that implied size is fragile, two bad things happen:

  1. expected passive fill rates are overstated, so completion quietly falls behind schedule; and
  2. when the implied component evaporates, the strategy must cross later at worse prices.

Naive passive-fill models see “thick touch, low urgency.” Production reality is often “thin direct book wearing an implied costume.”

2. Sweep-cost understatement

Aggressive models often estimate immediate cost from displayed size at top levels.

If half of the top-of-book is implied and sourced from volatile related markets, the book can gap faster than a same-sized direct queue.

So the expected shortfall of a sweep is not just a function of visible size. It is a function of visible size weighted by source durability.

3. Queue-priority mirage

CME explicitly states that implied quantity in futures markets does not have time priority, and direct outrights are prioritized ahead of implied sources in relevant matching rules.

That means two equal-looking price levels can have very different execution meaning:

  1. a level dominated by direct resting orders, where joining earns genuine time priority, versus
  2. a level dominated by implied size, which carries no time priority and sits behind direct interest in the match.

A model that assumes all displayed lots at a price level are queue-equivalent will overestimate both passive opportunity and sweep resistance.

4. Hidden-implied paradox

CME also states that some implied quantity is calculated and tradable even when it is not disseminated.

So the market can be wrong in both directions:

  1. displayed depth can overstate durable liquidity, because disseminated implied size is fragile; and
  2. displayed depth can understate tradable liquidity, because some implied quantity is tradable without being disseminated.

This sounds contradictory, but it is the correct operational picture.

A book viewer sees one number. The matching engine sees a state-dependent liquidity graph.

5. Cross-book regime shock

Implied liquidity depends on related books, ratios, and eligible prices. So leg volatility, spread activity, or rounding changes in another market can abruptly reprice or erase liquidity in the instrument you are trading.

Then a strategy experiences slippage that looks local in the child-order log but was actually caused by remote source instability.

6. Matching-state cliffs

CME states implied quantity is unavailable in some non-matching states, such as pre-open, and may be suspended when implication would produce out-of-limit trades.

That means an instrument can move between:

  1. states where implication is active and implied depth is calculated and tradable, and
  2. states where implication is suspended or unavailable and only the direct book matters.

A backtest that uses only normalized depth snapshots and ignores these state transitions will badly mis-estimate opening, reopening, and stressed-period slippage.


A better abstraction: displayed depth vs resilience-adjusted depth

For an instrument (x) at price level (p) and time (t), define:

  1. (D^{out}_{x,p}(t)) — direct outright depth resting in the instrument's own book,
  2. (D^{imp,disp}_{x,p}(t)) — disseminated implied depth displayed at that level,
  3. (\mathcal{S}_{x,p}(t)) — the set of implied source paths feeding the level,
  4. (Q_s(t)) — the quantity contributed by source path (s),
  5. (w_s(\Delta)) — the probability that path (s) survives the execution horizon (\Delta).

Then naive observed displayed depth is:

[ D^{obs}_{x,p}(t) = D^{out}_{x,p}(t) + D^{imp,disp}_{x,p}(t) ]

But a more honest near-horizon execution quantity is:

[ D^{res}_{x,p}(t;\Delta) = D^{out}_{x,p}(t) + \sum_{s \in \mathcal{S}_{x,p}(t)} w_s(\Delta)\, Q_s(t) ]

Interpretation: implied size is not counted at face value. Each source path's contribution (Q_s(t)) is discounted by its survival probability (w_s(\Delta) \in [0,1]) over the execution horizon, so fragile paths add little to (D^{res}).

The important quantity is the resilience gap:

[ G_{x,p}(t;\Delta) = D^{obs}_{x,p}(t) - D^{res}_{x,p}(t;\Delta) ]

If (G) is large, the book is visually deep but execution-fragile.

You can extend this to multiple price levels and define resilience-adjusted sweep cost using (D^{res}) instead of raw displayed size.
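As a concrete sketch of these quantities, assuming a hypothetical `SourcePath` record stands in for whatever source-graph attribution your feed allows, with survival weights supplied by your own fragility model:

```python
from dataclasses import dataclass

@dataclass
class SourcePath:
    """One implied source path feeding a price level (hypothetical schema)."""
    qty: float            # Q_s(t): implied lots contributed by this path
    survival_prob: float  # w_s(delta): P(path still intact at the horizon)

def resilience_adjusted_depth(direct_qty: float, paths: list[SourcePath]) -> float:
    """D_res: direct depth plus implied size weighted by source survival."""
    return direct_qty + sum(p.qty * p.survival_prob for p in paths)

def resilience_gap(direct_qty: float, paths: list[SourcePath]) -> float:
    """G: displayed depth minus resilience-adjusted depth."""
    displayed = direct_qty + sum(p.qty for p in paths)
    return displayed - resilience_adjusted_depth(direct_qty, paths)

# The 120-lot touch from earlier: 25 direct lots, 95 implied lots
# concentrated on a single fragile path with 30% survival odds.
paths = [SourcePath(qty=95.0, survival_prob=0.3)]
print(resilience_adjusted_depth(25.0, paths))  # ~53.5 of 120 displayed
print(resilience_gap(25.0, paths))             # ~66.5 lots of visual-only depth
```

The same helpers extend naturally across price levels: feed each level's direct size and source paths through `resilience_adjusted_depth` before estimating sweep cost.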


The hidden state is graph-based, not iid

Implied-liquidity behavior is not well described by an iid cancellation process.

An implied quote in one market may depend on:

  1. the continued existence and size of specific source orders in legs and spreads,
  2. leg ratios and price-eligibility rules,
  3. the implication generation (first- versus higher-generation implieds),
  4. the matching state of every book along the path.

So a useful hidden state is not just “implied or not.” It is something like: (direct size, disseminated implied size, source-path set, per-path survival probability, matching state).

And the state transition depends on a source graph, not merely on local order aging.

That matters because the same displayed 100 lots can mean very different things:

  1. 100 direct lots with genuine queue standing,
  2. 100 implied lots hanging off a single fragile source path, or
  3. any mix in between.

A model that sees only “100 lots at best ask” throws away the economically important part.


Mechanically, where slippage enters

1. Opportunity-cost slippage

The strategy waits because displayed contra depth looks abundant. By the time it crosses, the implied portion is gone. Now the trade pays more than if it had recognized fragility earlier.

2. Completion-risk slippage

A passive schedule relies on displayed depth for expected fill rates. The direct book is actually thin, so completion falls behind schedule and later urgency rises.

3. Benchmark contamination

TCA compares fills against displayed top-of-book depth and imbalance snapshots. But the “book state” used as benchmark was dominated by implied size that had low survival probability. Measured slippage is then partly benchmark error.

4. Label drift in research data

Historical depth snapshots often preserve displayed quantities but not a faithful replay of source-path durability. If training labels assume displayed size was uniformly durable, the model learns an unrealistically forgiving market.

5. False liquidity regime inference

A symbol can look deep during calm periods because related books are stable, then suddenly behave thin during cross-book turbulence. A model without implied-source features mislabels this as unexplained regime drift.


Metrics worth instrumenting

1. ISAT — Implied Share at Touch

[ ISAT(t) = \frac{D^{imp,disp}_{best}(t)}{D^{obs}_{best}(t)} ]

Track separately for bid and ask. If ISAT is high, touch depth is more conditional than it appears.

2. DDC — Direct Depth Coverage

[ DDC(t) = \frac{D^{out}_{best}(t)}{D^{obs}_{best}(t)} ]

This is the flip side of ISAT. High DDC means the touch is mostly local and durable. Low DDC means the touch is more cross-book dependent.

3. RAG — Resilience-Adjusted Gap

[ RAG(t;\Delta) = D^{obs}_{best}(t) - D^{res}_{best}(t;\Delta) ]

This is the main quantity for slippage modeling. It measures how much the book overstates near-horizon executable resilience.
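Once book composition is observable, ISAT, DDC, and RAG are one-liners. A minimal sketch, reusing the 25-direct / 95-implied touch from the earlier example:

```python
def isat(implied_disp: float, displayed: float) -> float:
    """Implied Share at Touch: disseminated implied size over displayed size."""
    return implied_disp / displayed if displayed else 0.0

def ddc(direct: float, displayed: float) -> float:
    """Direct Depth Coverage: direct outright size over displayed size."""
    return direct / displayed if displayed else 0.0

def rag(displayed: float, resilience_adjusted: float) -> float:
    """Resilience-Adjusted Gap: how much the book overstates durable depth."""
    return displayed - resilience_adjusted

displayed = 25.0 + 95.0                       # direct + disseminated implied
print(round(isat(95.0, displayed), 3))        # touch is ~79% conditional
print(round(ddc(25.0, displayed), 3))         # only ~21% direct coverage
print(round(rag(displayed, 25.0 + 95.0 * 0.3), 1))  # the visual-only remainder
```

In practice you would track these per side and per level; by construction ISAT and DDC sum to one when every displayed lot is either direct or disseminated implied.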

4. SOC — Source Overlap Concentration

How concentrated implied size is across unique source paths.

If 80 implied lots come from one source path, fragility is much higher than if 80 lots come from twenty loosely related paths.
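A Herfindahl-style index is one simple way to score that concentration (the normalization here is an assumption, not a standard):

```python
def source_overlap_concentration(path_qtys: list[float]) -> float:
    """Herfindahl-style concentration of implied size across source paths.
    1.0 means all implied size rides on one path; an even split across
    n paths gives 1/n."""
    total = sum(path_qtys)
    if total == 0:
        return 0.0
    return sum((q / total) ** 2 for q in path_qtys)

print(source_overlap_concentration([80.0]))        # 1.0 -> maximally fragile
print(source_overlap_concentration([4.0] * 20))    # ~0.05 -> well diversified
```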

5. ICHR — Implied Cliff Hazard Rate

Probability that best-level displayed depth loses more than (k)% within horizon (\Delta) without a local same-instrument trade printing first.

This helps detect remote-source collapse.
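One empirical proxy for ICHR, assuming you can window the tape into (depth_start, depth_end, local_trade_printed) observations per horizon (\Delta):

```python
def implied_cliff_hazard(windows, k: float = 0.5) -> float:
    """Share of horizon windows in which best-level displayed depth fell by
    more than fraction k with no same-instrument trade printing first.
    `windows` holds (depth_start, depth_end, local_trade_printed) tuples."""
    cliffs = total = 0
    for depth_start, depth_end, local_trade in windows:
        if depth_start <= 0:
            continue  # ignore windows with no starting depth
        total += 1
        remote_cliff = (depth_start - depth_end) / depth_start > k
        if remote_cliff and not local_trade:
            cliffs += 1
    return cliffs / total if total else 0.0

obs = [(120, 25, False),   # depth collapsed, nothing printed locally: a cliff
       (120, 118, False),  # normal churn
       (100, 30, True),    # big drop, but a local trade explains it
       (110, 100, False)]
print(implied_cliff_hazard(obs, k=0.5))  # 1 remote cliff in 4 windows -> 0.25
```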

6. QPP — Queue Priority Penalty

Expected execution handicap from interacting with implied-dominated price levels where direct outrights or lower-generation sources receive priority.

7. HIG — Hidden Implied Gap

Estimate of tradable but non-disseminated implied liquidity relative to disseminated implied liquidity.

This is hard to measure perfectly, but even a proxy matters because it tells you when displayed depth is understating contingency.

8. MSR — Matching-State Risk

Share of time the product spends in states where implication is unavailable, suspended, or degraded.

9. LCV — Leg-Churn Volatility

A short-horizon feature measuring how violently source legs and related spreads are repricing. High LCV usually reduces implied-source survival.

10. ODO — Outright-Depth Overstatement

For a child order benchmarked at decision time:

[ ODO = \text{naive displayed executable size} - \text{resilience-adjusted executable size} ]

This is the quantity you want to attribute before calling the outcome “impact.”


Feature set for slippage models

A. Book composition features

  1. ISAT and DDC at the touch and at deeper levels,
  2. implied share per price level over recent windows.

B. Source-graph features

  1. number of unique source paths per level,
  2. SOC and estimated per-path survival probabilities.

C. Priority / queue features

  1. estimated direct-ahead versus implied-ahead quantity,
  2. QPP given documented outright and generation priority rules.

D. Cross-book dynamics

  1. LCV: short-horizon repricing intensity in legs and related spreads,
  2. recent counts of remote-source collapse (ICHR-style) events.

E. Matching-state / rules features

  1. current implication state (active, suspended, unavailable),
  2. MSR: recent share of time in degraded implication states.

F. Label-integrity features

  1. flags for snapshots where composition could not be reconstructed,
  2. HIG proxy for hidden, non-disseminated implied liquidity.

Important rule:

implied depth should not enter the model only as extra size. It should enter as extra size plus extra fragility, extra dependency, and different queue semantics.


Labeling blueprint

For every child-order decision or fill, capture at least four book views.

View 1 — displayed book

What the normalized market-data view showed: price levels and total displayed size, with no composition breakdown.

View 2 — direct-outright book

What depth remains if you strip out disseminated implied size.

View 3 — resilience-adjusted book

What depth remains after discounting implied size by estimated source survival.

View 4 — hidden-contingency proxy

A best-effort estimate of non-disseminated but tradable implied depth.

Then define multiple labels.

Label 1 — naive displayed-depth slippage

Measured relative to raw displayed depth. Useful mostly as a baseline diagnostic.

Label 2 — direct-book slippage

Measured against the outright-only book. Useful for understanding local liquidity conditions.

Label 3 — resilience-adjusted slippage

Measured against the book after implied durability discounting. This is usually the economically honest label for execution control.

Label 4 — overstatement gap

Difference between naive displayed-depth expectations and resilience-adjusted expectations.

This isolates how much forecast error came from treating implied depth as too real.

Label 5 — remote-source shock tag

Binary or graded label indicating that the pre-trade book changed primarily due to related-leg or spread activity rather than same-instrument order flow.

Without this tag, models will attribute the wrong cause to a lot of “unexplained” misses.


Policy rules for execution stacks

Rule 1: maintain separate direct and implied book states

Do not store only bestBidQty / bestAskQty if the venue gives composition or if you can reconstruct it. You need at least:

  1. direct outright depth per level,
  2. disseminated implied depth per level,
  3. whatever source-path attribution you can reconstruct for the implied component.

Rule 2: passive scheduling should key off direct depth first

If passive-fill urgency is computed from full displayed depth, it will often wait too long. A more robust rule is: compute urgency from direct depth first, and count implied depth only at its survival-discounted value.
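One hypothetical way to encode such a rule; the functional form and the 0.5 time weighting are illustrative assumptions, not a standard:

```python
def passive_urgency(direct_qty: float, implied_qty: float,
                    survival_prob: float, remaining_qty: float,
                    time_frac: float) -> float:
    """Urgency keyed to direct depth, with implied depth counted only at
    its survival-discounted value (illustrative policy sketch)."""
    effective_contra = direct_qty + implied_qty * survival_prob
    # Scarce effective contra depth relative to the residual order means
    # higher urgency; elapsed schedule time (0..1) scales it further.
    scarcity = remaining_qty / max(effective_contra, 1.0)
    return min(1.0, scarcity * (0.5 + time_frac))

# Same displayed book (25 direct + 95 implied), two fragility views:
fragility_aware = passive_urgency(25, 95, 0.3, remaining_qty=40, time_frac=0.5)
naive_displayed = passive_urgency(25, 95, 1.0, remaining_qty=40, time_frac=0.5)
# The fragility-aware schedule turns urgent well before the naive one.
```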

Rule 3: sweep models should price fragility, not just size

Replace raw depth with resilience-adjusted depth in short-horizon impact models. The same visible 100 lots should cost more when 80 of them are fragile implied size.
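A sketch of that substitution in a level-walking cost model, under the assumption that each level's implied portion survives with some estimated probability:

```python
def sweep_cost(levels, qty: float):
    """Expected cost of an immediate sweep using resilience-adjusted size.
    levels: (price, direct_qty, implied_qty, survival_prob), best first.
    Returns (avg_fill_price, unfilled_qty)."""
    cost = 0.0
    filled = 0.0
    for price, direct, implied, surv in levels:
        available = direct + implied * surv   # resilience-adjusted size
        take = min(available, qty - filled)
        cost += take * price
        filled += take
        if filled >= qty:
            break
    avg = cost / filled if filled else float("nan")
    return avg, qty - filled

book = [(100.00, 25, 95, 0.3),   # thick-looking touch, mostly fragile implied
        (100.25, 60, 40, 0.5)]
avg_px, unfilled = sweep_cost(book, qty=100)
# A naive model prices all 100 lots at 100.00; the adjusted walk spills
# ~46.5 lots to the next level and averages above the touch.
```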

Rule 4: detect remote-source collapse explicitly

If a best level disappears without a local trade and coincides with spread or leg churn, classify it as source-graph collapse rather than ordinary cancellation noise.
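A heuristic tagger along those lines; the thresholds (half the level gone, a 2-sigma churn score) are illustrative assumptions:

```python
def classify_depth_loss(depth_drop_frac: float, local_trade_printed: bool,
                        leg_churn_z: float,
                        drop_thresh: float = 0.5,
                        churn_thresh: float = 2.0) -> str:
    """Tag a best-level depth loss as local flow, source-graph collapse,
    or ordinary cancellation (illustrative thresholds)."""
    if depth_drop_frac <= drop_thresh:
        return "normal"                      # not a cliff at all
    if local_trade_printed:
        return "local_flow"                  # explained by same-instrument prints
    if leg_churn_z >= churn_thresh:
        return "source_graph_collapse"       # remote legs/spreads were repricing
    return "cancellation"                    # large loss, quiet graph: plain pulls

print(classify_depth_loss(0.8, False, 3.1))  # source_graph_collapse
print(classify_depth_loss(0.8, True, 3.1))   # local_flow
```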

Rule 5: benchmark queue position against direct priority

When exchanges document outright or lower-generation priority, passive-fill models must reflect that queue semantics are asymmetric. Equal displayed size does not mean equal queue rank.

Rule 6: backtests must include implication-state transitions

Pre-open, reopen, implication suspension, and source unavailability should be replayed as liquidity-state changes, not treated as stationary book evolution.

Rule 7: separate visible scarcity from actionable scarcity

A market can look thin but still have hidden contingent implied liquidity. A market can look thick but have low resilience. You need both dimensions.

Rule 8: do not blame all misses on impact

Before declaring “impact rose,” ask:

  1. did ISAT rise, so more of the touch became conditional?
  2. did RAG widen, so displayed depth overstated resilience more than before?
  3. did source concentration (SOC) or leg churn (LCV) increase?

A lot of supposed impact drift is really a composition drift problem.


Common anti-patterns

  1. Collapsing direct and implied size into a single “depth” scalar.
  2. Benchmarking TCA against displayed top-of-book snapshots dominated by fragile implied size.
  3. Backtesting on normalized depth snapshots with no replay of implication-state transitions.
  4. Attributing every shortfall miss to “impact” without checking book composition first.

30-day rollout plan

Week 1 — make book composition observable

  1. capture direct versus disseminated implied depth per level,
  2. start logging ISAT and DDC for every traded instrument.

Week 2 — build source-fragility features

  1. reconstruct source paths where possible; compute SOC and LCV,
  2. estimate per-path survival and stand up RAG and ICHR.

Week 3 — retrain slippage and fill models

  1. relabel history with the direct-book and resilience-adjusted views,
  2. retrain passive-fill and sweep-cost models on the new labels.

Week 4 — harden production controls

  1. wire the policy rules into scheduling and sweep logic,
  2. add remote-source collapse detection and matching-state replay to monitoring.


What good looks like

A production-grade futures slippage stack should be able to answer:

  1. How much of the displayed depth was direct outright size?
  2. How much was disseminated implied size?
  3. How concentrated were the implied source paths?
  4. Which exchange priority rules disadvantaged that implied liquidity?
  5. What portion of the displayed book was likely to survive the decision horizon?
  6. Did the book change because of local flow or remote source collapse?
  7. Would this child have been sized differently if the model had used resilience-adjusted depth?
  8. How much of measured slippage came from outright-depth overstatement rather than actual impact?

If you cannot answer those questions, your book model is probably too visual and not nearly structural enough.


Bottom line

Displayed futures depth is not a single substance.

Some of it is direct, local, and queue-stable. Some of it is implied, conditional, and graph-dependent. Some of it is hidden from dissemination but still matters to execution.

The expensive mistake is to turn all of that into one scalar called “size at the touch.”

Execution models get much better when they stop asking only:

how much size was visible?

and start asking:

how much of that size was direct, how much was implied, and how much was likely to still exist when I actually needed it?

That sounds like microstructure pedantry.

It is really slippage control.