Sale-Condition Filtering & Benchmark-Contamination Slippage Playbook

Date: 2026-04-12
Category: finance · research (execution / slippage modeling)

Why this playbook exists

A lot of execution analytics quietly assume that every reported trade is equally useful as a market reference.

That is false.

In real market data, some prints are good anchors for intraday execution modeling, and some are not. A reported trade can be a regular continuous-market execution, a late or out-of-sequence report, an auction or other special-condition print, a derivatively priced or contingent trade, or a subsequent correction or cancellation of an earlier report.

If a slippage stack treats those prints as one homogeneous stream, three bad things happen fast:

  1. benchmarks drift away from the actually tradeable market,
  2. toxicity / markout labels become contaminated,
  3. TCA starts praising or blaming the router for moves it never truly caused or could have reacted to.

This is one of those bugs that hides in plain sight because the tape still looks like trading happened.

But in execution modeling, “a trade happened” is not enough.

The real question is:

what kind of trade was it, and was it eligible to represent the current continuous market for the label I am building?

This note turns that question into a practical framework for feature engineering, benchmark construction, labeling, and controls.


Public market-structure facts that make this real

This is not just a data-cleaning preference. Public rulebooks and feed documentation explicitly distinguish among trade types.

A few examples:

  1. Consolidated-feed specifications (e.g., the CTA and UTP SIP feeds) publish sale-condition code tables stating whether a given print updates last sale, high/low, and consolidated volume.
  2. FINRA trade-reporting rules define modifiers for late, out-of-sequence, and otherwise non-regular-way reports.
  3. Exchange rulebooks define auction and special-settlement prints separately from continuous-session executions.

The modeling implication is simple:

the tape already contains a taxonomy of “how much this print should mean.”

If your model ignores that taxonomy, it is manufacturing a cleaner and more continuous market than the one you actually observed.


The core failure mode

Suppose you are grading a child order against a last-trade benchmark, an arrival mid, or a short-horizon markout.

Now imagine the most recent tape print before or after the child is one of these:

  1. a late or out-of-sequence report,
  2. an auction or other special-condition print,
  3. a derivatively priced or contingent trade,
  4. a correction or cancellation of an earlier report.

If the model uses that print as though it were an ordinary continuous execution, it can:

  1. assign fake market movement to the live book,
  2. misstate short-term impact and reversion,
  3. sign trades against stale or economically irrelevant price moves,
  4. inflate or crush realized volume denominators,
  5. create spurious “alpha decay” or “routing regression.”

This is not a small hygiene issue.

For some labels, a single misclassified print can be more damaging than a few milliseconds of timestamp noise.


A useful abstraction: trade eligibility depends on the downstream use

Let each trade report at time (t) have a price (p_t), a size (q_t), sale-condition modifiers (m_t), a correction status (c_t), an execution timestamp (\tau_t^{exec}), and a publication timestamp (\tau_t^{pub}).

Define a task-specific eligibility function:

[ E^{(k)}_t = f_k(m_t, c_t, \tau_t^{exec}, \tau_t^{pub}) \in \{0,1\} ]

where (k) is the downstream use case.

Examples:

  1. (E^{(bench)}): may this print anchor a continuous-price benchmark?
  2. (E^{(volume)}): does this print count toward participation / pacing volume?
  3. (E^{(markout)}): may this print serve as the reference for a short-horizon markout?

The important point:

there is no single universal “good trade print” flag.

A trade can be valid for volume accounting but invalid for short-horizon continuous-price labeling.

That is exactly why condition codes exist.
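The eligibility function above can be sketched in a few lines. The sale-condition codes here are purely illustrative stand-ins (real code tables come from your feed handler's documentation), and the eligibility sets are assumptions chosen to show the shape of the idea, not a recommended policy:

```python
# Illustrative sale-condition codes (assumptions, not a real feed's table):
# "@" regular, "T" extended hours, "Z" out of sequence,
# "4" derivatively priced, "6" closing auction.

# Which conditions may anchor a continuous-price benchmark (assumption).
BENCH_ELIGIBLE = {"@"}
# Which conditions still count toward reported volume (assumption).
VOLUME_ELIGIBLE = {"@", "T", "Z", "4", "6"}

def eligible(condition: str, use_case: str) -> bool:
    """E^(k)_t: the same print gets a different answer per downstream use."""
    if use_case == "bench":
        return condition in BENCH_ELIGIBLE
    if use_case == "volume":
        return condition in VOLUME_ELIGIBLE
    raise ValueError(f"unknown use case: {use_case}")

# A derivatively priced print counts for volume but not for the benchmark.
assert eligible("4", "volume") and not eligible("4", "bench")
assert eligible("@", "bench")
```

The point of routing everything through one `eligible(condition, use_case)` gate is that the taxonomy stays explicit: adding a new downstream use case forces a deliberate decision rather than inheriting someone else's flag.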


Mechanism map

1. Continuous-book benchmark contamination

A child order arrives when the lit market is around $100.00 / $100.01.

Seconds later, a derivatively priced or contingent print appears at $99.94 for a large reported size.

If your TCA stack marks the child against “latest trade” without sale-condition filtering, the child suddenly looks like it overpaid by 7 cents, even though the displayed continuous market never really moved there.

That is not slippage.

That is benchmark contamination.

2. Markout distortion

Suppose a router buys at the offer, then the next tape print is a late report or auction-related special print far away from the touch.

A naive 1-second or next-print markout says the fill immediately went against you.

But the print was not a reliable proxy for the live marginal market.

Now the model learns that a healthy fill was toxic.

Repeat this enough times and the strategy starts avoiding good flow.

3. Trade-sign / toxicity label damage

Many toxicity pipelines infer aggressor side or price impact from sequences of trade-to-trade moves.

When non-standard prints enter the same return path as continuous prints, you get:

  1. flipped or spurious aggressor-side inferences,
  2. phantom price impact that no live order actually caused,
  3. distorted reversion estimates around those prints.

In other words, sale-condition filtering is not just for TCA.

It is a label-integrity problem.

4. Reported-volume pacing illusion

Some special-condition trades still update volume even when they should not drive the same price benchmark logic.

That means a POV / participation engine can be simultaneously:

  1. behind its participation schedule as measured against reported volume, and
  2. correctly paced against the volume that was actually tradeable in the continuous market.

If the system cannot separate those two facts, it confuses reporting semantics with executable price discovery.

5. Late-report time-travel

A late or out-of-sequence report can enter the stream after the router decision but represent an earlier execution time.

If a replay joins labels to report arrival time instead of market-observable time and condition eligibility, the model can accidentally “discover” price moves that the live controller could not have known about.

That is classic point-in-time leakage wearing a tape costume.


A better decomposition: observed tape vs continuous benchmark stream

Construct two trade streams instead of one:

Stream A — full reported trade stream

Contains all reported trades. It matters for:

  1. total reported volume and pacing denominators,
  2. end-of-day reconciliation against official consolidated volume,
  3. surveillance, audit, and reporting-semantics diagnostics.

Stream B — continuous-benchmark eligible stream

Contains only trades eligible for the specific benchmark or label you are building.

For a benchmark use case, define:

[ P^{bench}(t) = g\big(\{p_i : i \le t,\ E^{(bench)}_i = 1\}\big) ]

For volume use cases:

[ V^{pacing}(t) = \sum_{i \le t} q_i \cdot E^{(volume)}_i ]

The key operational rule is this:

the same trade may be included in (V^{pacing}) and excluded from (P^{bench}).

That is not inconsistency.

That is correct modeling.
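A minimal sketch of the two-stream split, assuming trades arrive as dicts with `price`, `size`, and `cond` fields (field names and condition codes are illustrative assumptions): every print updates the raw last sale, but only condition-eligible prints update (P^{bench}) and (V^{pacing}).

```python
def build_streams(trades, bench_ok, volume_ok):
    """trades: list of dicts with 'price', 'size', 'cond', in arrival order.

    Returns one row per print carrying the raw last sale, the
    condition-filtered benchmark price, and the pacing volume so far.
    """
    p_bench = None   # last benchmark-eligible price (P^bench)
    v_pacing = 0     # cumulative volume-eligible size (V^pacing)
    out = []
    for t in trades:
        if t["cond"] in bench_ok:
            p_bench = t["price"]
        if t["cond"] in volume_ok:
            v_pacing += t["size"]
        out.append({"raw_last": t["price"], "p_bench": p_bench,
                    "v_pacing": v_pacing, **t})
    return out

tape = [
    {"price": 100.01, "size": 100,  "cond": "@"},  # regular continuous print
    {"price": 99.94,  "size": 5000, "cond": "4"},  # derivatively priced print
]
rows = build_streams(tape, bench_ok={"@"}, volume_ok={"@", "4"})
# The large off-market print moves raw_last and v_pacing, not p_bench.
assert rows[-1]["raw_last"] == 99.94
assert rows[-1]["p_bench"] == 100.01
assert rows[-1]["v_pacing"] == 5100
```

Note that the same `"4"` print is counted in the pacing volume and excluded from the benchmark price, which is exactly the "not inconsistency, correct modeling" rule stated above.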


Metrics worth instrumenting

1. BCG — Benchmark Contamination Gap

[ BCG(t) = P^{raw\_last}(t) - P^{bench}(t) ]

How far your naive last-trade benchmark deviates from the condition-filtered continuous benchmark.

Track by symbol, venue mix, time of day, and sale-condition bucket.
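The gap itself is a one-liner; the work is in carrying both prices through your pipeline. A sketch, reusing the running $100.01 / $99.94 example from the mechanism map:

```python
def bcg(raw_last, p_bench):
    """BCG(t) = P^raw_last(t) - P^bench(t); undefined until both exist."""
    if raw_last is None or p_bench is None:
        return None
    return raw_last - p_bench

# Off-market print at 99.94 after a benchmark-eligible print at 100.01:
# the naive last-trade benchmark sits 7 cents below the filtered one.
assert abs(bcg(99.94, 100.01) + 0.07) < 1e-9
assert bcg(None, 100.01) is None
```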

2. NSPR — Non-Standard Print Ratio

Share of reported trades or volume with sale conditions outside your continuous-benchmark-eligible set.

Compute it separately by trade count and by traded volume, then cut by symbol, venue, and time of day.
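A sketch of the ratio by count and by volume, with illustrative condition codes and field names (assumptions, not a standard schema):

```python
def nspr(trades, bench_ok):
    """Non-Standard Print Ratio: share of prints (and of volume) whose
    sale condition falls outside the benchmark-eligible set."""
    n_bad = sum(1 for t in trades if t["cond"] not in bench_ok)
    v_bad = sum(t["size"] for t in trades if t["cond"] not in bench_ok)
    v_all = sum(t["size"] for t in trades)
    return n_bad / len(trades), v_bad / v_all

tape = [{"cond": "@", "size": 100},   # regular print
        {"cond": "4", "size": 900}]   # derivatively priced print
by_count, by_volume = nspr(tape, bench_ok={"@"})
assert by_count == 0.5      # half the prints are non-standard...
assert by_volume == 0.9     # ...but 90% of the volume is
```

The count/volume split matters: a tape can look clean by count while most of its volume is benchmark-ineligible.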

3. LRTL — Late-Report Time-Leak

Fraction of labels that would differ if you use report-publication time instead of strict as-of observable time.

4. MDI — Markout Distortion Index

Difference between markouts computed from:

  1. the raw reported trade stream, and
  2. the benchmark-eligible continuous stream.

When MDI spikes, your “toxicity” is often just tape taxonomy bleeding into the label.
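One way to sketch the index for a single buy fill: compute the same next-print markout twice, once against the raw tape and once against the condition-filtered stream, and take the difference. Timestamps, field names, and condition codes are illustrative assumptions:

```python
def next_price(trades, after_ts, bench_ok=None):
    """First print after after_ts; optionally restricted to eligible conditions.
    trades is assumed sorted by timestamp."""
    for t in trades:
        if t["ts"] > after_ts and (bench_ok is None or t["cond"] in bench_ok):
            return t["price"]
    return None

def markout_distortion(fill_px, fill_ts, trades, bench_ok):
    """MDI for one fill: raw next-print markout minus filtered markout."""
    raw = next_price(trades, fill_ts)
    flt = next_price(trades, fill_ts, bench_ok)
    if raw is None or flt is None:
        return None
    return (raw - fill_px) - (flt - fill_px)   # fill price cancels: raw - flt

tape = [{"ts": 1.0, "price": 99.80,  "cond": "Z"},   # late report
        {"ts": 1.2, "price": 100.02, "cond": "@"}]   # regular print
mdi = markout_distortion(fill_px=100.01, fill_ts=0.0, tape=tape,
                         bench_ok={"@"}) if False else \
      markout_distortion(100.01, 0.0, tape, bench_ok={"@"})
# The raw markout looks 22 cents worse purely because of the late report.
assert abs(mdi + 0.22) < 1e-9
```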

5. VPI — Volume/Price Inconsistency

Fraction of intervals where a trade contributes to volume pacing but is excluded from price benchmarking.

This should not be forced to zero.

You want to measure it because it tells you how often your system is operating in a mixed reporting regime.

6. CCR — Correction Contamination Rate

How often corrected or canceled prints change historical labels relative to the live-as-of version the controller actually saw.

7. SCS — Sale-Condition Shock

Short-window burst intensity of benchmark-ineligible trade reports.

Useful during auctions, reopenings, or venue/reporting anomalies.


Feature set for slippage models

A. Trade-report semantics

  1. sale-condition / modifier codes, raw and bucketed,
  2. per-use-case eligibility flags (benchmark, volume, markout),
  3. correction / cancellation status.

B. Time-quality features

  1. publication-minus-execution lag,
  2. out-of-sequence indicator,
  3. whether the print was observable as-of the decision timestamp.

C. Stream-divergence features

  1. rolling BCG level and volatility,
  2. rolling NSPR by count and by volume.

D. Quote context

  1. distance of the print from the prevailing bid, offer, and mid,
  2. whether the print sits inside or outside the touch.

E. Regime features

  1. open / close / halt / reopening windows,
  2. recent SCS (sale-condition shock) intensity.

Important rule:

sale conditions should enter the model as first-class features and filtering gates, not just as downstream BI metadata.


Labeling blueprint

For every child order or decision timestamp, store both raw and filtered references.

At minimum capture:

  1. latest raw reported trade,
  2. latest benchmark-eligible trade,
  3. latest quote-based mid,
  4. latest volume-eligible trade and cumulative volume,
  5. modifier / sale-condition details for any intervening trades,
  6. as-of observable timestamps,
  7. correction status.

Then build separate labels.

Label 1 — raw-tape slippage

[ S_{raw} = p_{fill} - P^{raw\_last}(t) ]

Useful mostly as a diagnostic, not as your truth.

Label 2 — filtered continuous-benchmark slippage

[ S_{flt} = p_{fill} - P^{bench}(t) ]

This is usually the honest execution benchmark.

Label 3 — quote-anchored slippage

[ S_{mid} = p_{fill} - mid(t) ]

Useful when trade prints are sparse or noisy.

Label 4 — contamination gap

[ C_{gap} = S_{raw} - S_{flt} ]

This is the cleanest way to quantify how much performance measurement was altered by trade-condition handling rather than routing quality.

Label 5 — live-vs-repaired gap

[ L_{gap} = S_{flt}^{live\_asof} - S_{flt}^{hindsight\_repaired} ]

This isolates how much hindsight correction or cancellation would rewrite the label.

That matters if you retrain models on cleaned history while live routing had to trade on dirtier truth.
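The five labels are arithmetic once the snapshot references exist. A sketch for a single buy fill, reusing the running example where the raw last trade (99.94) disagrees with the filtered benchmark (100.01); the function name and argument names are assumptions:

```python
def slippage_labels(p_fill, raw_last, p_bench, mid, p_bench_repaired=None):
    """Labels 1-5; positive values mean the fill paid above the reference."""
    s_raw = p_fill - raw_last              # Label 1: raw-tape slippage
    s_flt = p_fill - p_bench               # Label 2: filtered-benchmark slippage
    s_mid = p_fill - mid                   # Label 3: quote-anchored slippage
    c_gap = s_raw - s_flt                  # Label 4: contamination gap
    l_gap = None                           # Label 5 needs a repaired benchmark
    if p_bench_repaired is not None:
        l_gap = s_flt - (p_fill - p_bench_repaired)
    return {"s_raw": s_raw, "s_flt": s_flt, "s_mid": s_mid,
            "c_gap": c_gap, "l_gap": l_gap}

lab = slippage_labels(p_fill=100.01, raw_last=99.94,
                      p_bench=100.01, mid=100.005)
# The raw tape says the fill overpaid by 7 cents; the filtered
# benchmark says it did not. The entire 7 cents is contamination.
assert abs(lab["s_raw"] - 0.07) < 1e-9
assert abs(lab["s_flt"]) < 1e-9
assert abs(lab["c_gap"] - 0.07) < 1e-9
```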


Policy rules for execution stacks

Rule 1: maintain multiple benchmark streams on purpose

You probably need at least three:

  1. the raw reported trade stream,
  2. the continuous-benchmark eligible stream,
  3. the volume / pacing eligible stream.

One stream cannot serve all jobs honestly.

Rule 2: filtering must be use-case specific

Do not build one global valid_trade_flag and call it done.

A print can be:

  1. volume-eligible but benchmark-ineligible,
  2. benchmark-eligible but excluded from a short-horizon markout,
  3. relevant only for reconciliation and audit.

Rule 3: use publication-time as-of logic for live realism

If the controller could not observe a report yet, the label should not pretend it could.

Late-report handling and sale-condition handling belong together.
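The as-of rule reduces to a lookup keyed on publication time, not execution time. A sketch assuming reports arrive sorted by publication timestamp (tuple layout is an illustrative assumption):

```python
import bisect

def last_observable(reports, decision_ts):
    """Latest report the controller could have seen at decision_ts.

    reports: list of (pub_ts, exec_ts, price), sorted by pub_ts.
    """
    pubs = [r[0] for r in reports]
    i = bisect.bisect_right(pubs, decision_ts)
    return reports[i - 1] if i else None

reports = [
    (1.0, 1.0, 100.00),
    (3.0, 1.5, 99.90),   # late report: executed at 1.5, published at 3.0
]
# At decision time 2.0, only the first report was observable, even though
# a hindsight join on execution time would surface the 99.90 print.
assert last_observable(reports, 2.0) == (1.0, 1.0, 100.00)
assert last_observable(reports, 3.5) == (3.0, 1.5, 99.90)
```

Joining on `exec_ts` here is exactly the late-report time-travel failure from the mechanism map: the 99.90 print would leak into a label computed for decision time 2.0.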

Rule 4: corrections should version labels, not silently replace them

Keep both:

  1. the live-as-of label the controller could actually have computed, and
  2. the hindsight-repaired label after corrections and cancels.

Otherwise your research will quietly train on a better tape than production ever had.

Rule 5: auction and special-print windows deserve their own regime tags

Open, close, halts, and special reporting windows should not be shoved into “normal microstructure” buckets.

Rule 6: do not let benchmark contamination masquerade as impact

If raw last and filtered benchmark diverge materially, attribution must split:

  1. genuine market movement and impact, measured on the filtered stream, from
  2. benchmark contamination, measured as the BCG.

Otherwise operations teams will chase the wrong problem.


Common anti-patterns

  1. Marking every child against the latest raw print with no condition filtering.
  2. One global valid_trade_flag reused for every downstream job.
  3. Joining labels on report arrival time without as-of observability checks.
  4. Silently overwriting history with corrected prints before retraining.
  5. Treating auction, halt, and reopening windows as ordinary continuous microstructure.

30-day rollout plan

Week 1 — make the trade taxonomy observable

  1. Log sale conditions, correction status, and publication timestamps end to end.
  2. Start computing NSPR and SCS per symbol and session.

Week 2 — split benchmark streams

  1. Build the raw and benchmark-eligible streams side by side.
  2. Instrument BCG and VPI on live flow.

Week 3 — retrain labels with explicit condition handling

  1. Recompute (S_{raw}), (S_{flt}), (S_{mid}), and (C_{gap}) over history.
  2. Measure LRTL and MDI on the retrained labels.

Week 4 — harden production controls

  1. Alert on BCG and SCS spikes.
  2. Version labels across corrections instead of overwriting them.


What good looks like

A production-grade slippage stack should be able to answer:

  1. What was the last reported trade?
  2. What was the last benchmark-eligible continuous trade?
  3. Which intervening trades were excluded and why?
  4. Did a special-condition print affect volume, price benchmark, both, or neither?
  5. Would the label differ under live-as-of vs hindsight-repaired data?
  6. How much of measured “slippage” is actually benchmark contamination?

If you cannot answer those questions, your model is probably measuring execution against a tape that is too literal and not nearly semantic enough.


Bottom line

Not every print should have equal voting rights in your slippage model.

Some trades are excellent references for the live continuous market. Others are real and important, but meaningful in a different way: they carry volume and reporting semantics, or describe an earlier execution, without representing the currently tradeable price.

The expensive mistake is collapsing all of them into one “last sale” reality.

Execution analytics gets cleaner when you stop asking only:

what was the last print?

and start asking:

what was the last print that was eligible to mean what I am about to claim it means?

That question sounds pedantic.

It saves real money.