Trade-Sign Classification Playbook

2026-04-08 · finance

Category: knowledge
Domain: finance / market microstructure / data engineering / execution analytics

Why this matters

A surprising amount of execution research quietly depends on one deceptively simple question:

Was this trade buyer-initiated or seller-initiated?

If you get that wrong, a lot of downstream analytics becomes noisy or outright misleading:

The trap is that many teams treat trade signing as solved by one canned rule. It is not.

What works tolerably well on older quote/trade feeds can break in modern electronic markets because of:

The practical goal is not academic purity. The goal is:

pick the simplest trade-signing method that matches your data quality, use case, and market structure, then validate it against something closer to ground truth before trusting any downstream model.


1) Fast mental model

There are really four different problems people blur together:

  1. True aggressor-side recovery
    Who crossed the spread in reality?

  2. Trade-level direction approximation
    Given only prints and quotes, what side is the best guess for this specific trade?

  3. Bar-level signed-volume estimation
    Over a time/volume bar, roughly how much volume was buyer- vs seller-initiated?

  4. Execution-analytics robustness
    How much sign error can your downstream signal tolerate?

That distinction matters because different tools solve different problems:

If you only remember one thing, remember this:

Do not use a bar-level signing method when you actually need per-trade labels, and do not pretend a trade-level heuristic is ground truth when feed timing is messy.


2) The hierarchy of evidence

From best to weakest practical evidence:

  1. Exchange-native aggressor flag
  2. Market-by-order / order-add-delete-match reconstruction with venue semantics
  3. Trade matched to prevailing bid/ask or midpoint using high-quality quote timing
  4. Tick-test fallback
  5. Bar-level inference like BVC

That hierarchy is brutal but useful. A lot of trouble comes from pretending level 4 or 5 is equivalent to level 1.


3) The quote rule

In one sentence

Compare the trade price to the prevailing bid-ask midpoint: above midpoint implies buyer-initiated, below midpoint implies seller-initiated.

Why it works

If a market order lifts the ask, prints tend to happen near the ask side. If a market order hits the bid, prints tend to happen near the bid side. The midpoint is the simplest separator.

Basic logic

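Let P be the trade price, B and A the prevailing bid and ask, and M = (A + B) / 2 the midpoint.

Then the trade sign s is:

$$
s = \begin{cases}
+1 \ (\text{buyer-initiated}) & P > M \\
-1 \ (\text{seller-initiated}) & P < M \\
0 \ (\text{unresolved}) & P = M
\end{cases}
$$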
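A minimal sketch of the rule in Python, assuming the bid and ask passed in are already the correctly aligned prevailing quote (picking that quote is the hard part and is not handled here). The function name `quote_rule_sign` and the `tol` parameter are illustrative choices, not a standard API.

```python
def quote_rule_sign(price: float, bid: float, ask: float, tol: float = 1e-9) -> int:
    """Midpoint (quote) rule: +1 = buy, -1 = sell, 0 = unresolved.

    Assumes bid/ask are the prevailing quote already aligned to the trade;
    choosing and aligning that quote is a separate (and harder) problem.
    """
    # Missing (non-positive bid) or crossed quotes are not credible: do not guess.
    if bid <= 0 or ask < bid:
        return 0
    mid = (bid + ask) / 2.0
    if price > mid + tol:
        return 1
    if price < mid - tol:
        return -1
    # Midpoint tie (within tol): leave it for a fallback such as the tick test.
    return 0
```

For example, `quote_rule_sign(10.04, 10.00, 10.05)` returns 1. The tolerance only absorbs floating-point noise in the computed midpoint; if prices are stored as integer ticks, `tol=0` is the cleaner choice. Returning 0 rather than forcing a side keeps “unknown” as an explicit state.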

Where it shines

Where it breaks

Practical warning

The quote rule is often the best simple default, but only if the quote you compare against is actually contemporaneous. That timing assumption is the whole game.


4) The tick test

In one sentence

If the trade price is above the previous distinct trade price, call it a buy; if below, call it a sell.

Basic logic

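Compare the trade price to the most recent previous different trade price:

  1. uptick (price above the previous trade): buy,
  2. downtick (price below the previous trade): sell,
  3. zero tick (same price as the previous trade): inherit the sign of the last nonzero price change, so a zero-uptick is a buy and a zero-downtick is a sell,
  4. no earlier price change available: unresolved.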
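A small Python sketch of that logic, assuming prices arrive in time order and are exact tick values from a single source (so equality comparisons are meaningful). The function name is illustrative.

```python
from typing import Iterable, List, Optional


def tick_test_signs(prices: Iterable[float]) -> List[int]:
    """Tick test over a time-ordered sequence of trade prices.

    +1 for an uptick or zero-uptick, -1 for a downtick or zero-downtick,
    0 while no price change has been observed yet.
    """
    signs: List[int] = []
    prev_price: Optional[float] = None
    last_change = 0  # sign of the most recent nonzero price change
    for price in prices:
        if prev_price is not None and price != prev_price:
            last_change = 1 if price > prev_price else -1
        # A zero tick inherits last_change, which is equivalent to comparing
        # against the most recent *different* earlier price.
        signs.append(last_change)
        prev_price = price
    return signs
```

For example, `tick_test_signs([10.00, 10.01, 10.01, 10.00, 10.00, 10.02])` returns `[0, 1, 1, -1, -1, 1]`: the first print is unresolved, and the repeated prints inherit the direction of the last price change.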

Why people still use it

Because it needs only trade prints. That makes it convenient for:

Where it helps

Where it breaks badly

In modern electronic trading, the tick test is often less a primary method and more an emergency fallback.


5) Lee-Ready

In one sentence

Lee-Ready combines the quote rule with a tick-test fallback, historically using a lagged quote to compensate for trade/quote timestamp mismatch.

This is the classic workhorse. It became famous because it was simple, practical, and far better than a naive “at the ask means buy, at the bid means sell” rule that ignores midpoint ties and timing mismatch.

The classic intuition

Older data often recorded trades with a delay relative to quote updates. So the quote that looks “current” at a trade’s timestamp could already reflect that trade’s own impact, making it newer than the quote traders actually saw when the trade happened. Lee-Ready’s famous fix was to look at a slightly earlier quote rather than the naively contemporaneous one.

Then:

  1. compare trade price to the bid-ask midpoint from the selected quote,
  2. classify above/below midpoint as buy/sell,
  3. if the trade occurs at the midpoint, use the tick test.
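The three steps can be sketched in Python roughly as follows. This is a Lee-Ready-style illustration, not a reference implementation: the lag is a parameter (default 0) rather than a hardcoded historical value, the tick fallback also covers trades with no usable quote, and the function name, tuple layouts, and method labels are assumptions.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

Trade = Tuple[float, float]         # (timestamp_s, price), time-ordered
Quote = Tuple[float, float, float]  # (timestamp_s, bid, ask), sorted by time


def lee_ready_style(
    trades: Sequence[Trade],
    quotes: Sequence[Quote],
    lag_s: float = 0.0,
    tol: float = 1e-9,
) -> List[Tuple[int, str]]:
    """Quote rule against the quote prevailing at (trade time - lag_s),
    tick test for midpoint ties or missing quotes. Returns (sign, method)."""
    quote_times = [q[0] for q in quotes]
    out: List[Tuple[int, str]] = []
    prev_price: Optional[float] = None
    last_change = 0  # tick-test state, updated on every trade in sequence
    for ts, price in trades:
        if prev_price is not None and price != prev_price:
            last_change = 1 if price > prev_price else -1
        prev_price = price

        sign = 0
        # Index just past the last quote stamped at or before ts - lag_s.
        i = bisect_right(quote_times, ts - lag_s)
        if i > 0:
            _, bid, ask = quotes[i - 1]
            if bid > 0 and ask >= bid:
                mid = (bid + ask) / 2.0
                if price > mid + tol:
                    sign = 1
                elif price < mid - tol:
                    sign = -1

        if sign != 0:
            out.append((sign, "quote"))
        elif last_change != 0:
            out.append((last_change, "tick"))
        else:
            out.append((0, "unknown"))
    return out
```

Returning the method alongside the sign makes it easy to measure how often the fallback fires, which matters later in validation.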

Why it still matters

Because the structure is still useful:

What people get wrong

They copy the old lag mechanically.

The dangerous anti-pattern is:

“Lee-Ready means use a 5-second lag.”

No. That lag was a historical fix for specific data conditions. In modern feeds:

In some environments, blindly applying the historical lag can be worse than using a calibrated contemporaneous quote.

Practical rule

Use Lee-Ready as a framework, not as a frozen parameter choice.


6) Bulk Volume Classification (BVC)

In one sentence

BVC estimates the buy/sell split of volume over short intervals from aggregate price movement, instead of assigning an exact sign to each individual trade.

This is a very different tool. It is not trying to recover the precise aggressor side for trade number 184,723. It is trying to infer whether a short interval’s volume was mostly buyer- or seller-driven.
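A compact sketch of the commonly described standard-normal form of BVC: each bar’s volume is split using the normal CDF of the bar’s price change divided by the standard deviation of price changes. Two simplifying assumptions are labeled in the code: sigma is estimated from the same sample (a trailing window would avoid look-ahead), and the normal CDF is used where some variants use a Student-t. The function name is illustrative.

```python
from statistics import NormalDist, pstdev
from typing import List, Sequence, Tuple


def bulk_volume_classify(
    closes: Sequence[float], volumes: Sequence[float]
) -> List[Tuple[float, float]]:
    """BVC sketch: split each bar's volume with the standard normal CDF
    of the standardized bar price change.

    closes[k] and volumes[k] belong to bar k; returns (buy_vol, sell_vol)
    for bars 1..n-1 (bar 0 has no previous close). Assumption: sigma is the
    population std dev of price changes in this same sample; a trailing-window
    estimate would avoid look-ahead in research use.
    """
    if len(closes) != len(volumes) or len(closes) < 2:
        raise ValueError("need matching closes/volumes with at least 2 bars")
    changes = [b - a for a, b in zip(closes, closes[1:])]
    sigma = pstdev(changes)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    out: List[Tuple[float, float]] = []
    for dp, vol in zip(changes, volumes[1:]):
        # Flat price series: no directional information, split evenly.
        buy_share = 0.5 if sigma == 0 else std_normal.cdf(dp / sigma)
        out.append((vol * buy_share, vol * (1.0 - buy_share)))
    return out
```

Note what the output is: a fractional split of each bar’s volume, never a label for any individual trade. That is exactly why it should not feed trade-level models.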

Why it exists

Sometimes you do not have the data quality needed for reliable trade-level classification. But you still want:

BVC is attractive because it can be data-efficient and useful when trade-level signing is messy or unnecessary.

Where it shines

Where it is the wrong tool

The key mental model

BVC is an interval estimator, not an aggressor-side oracle.

If you use it as if it were trade-level truth, you will quietly poison your downstream research.


7) Modern market structure breaks the naive versions

Trade classification got harder because markets got faster and less literal.

A) Timestamp alignment is not a footnote

This is the biggest operational issue. Your trades and quotes may be stamped by:

Even if both timestamps look precise, they may not be causally aligned.

What this breaks:

B) Midpoint and hidden executions

A midpoint print is not clean evidence of either side from price alone. That means midpoint-heavy venues or hidden-liquidity interactions create many unresolved or weakly resolved cases.

C) Odd lots distort displayed touch intuition

Displayed NBBO logic may not fully reflect the real tradeable state when odd lots or venue-specific display rules matter. A trade can look “inside” or “off touch” without being economically bizarre.

D) Fragmented markets mean multiple plausible quotes

Which quote was “the quote”? The SIP quote? The direct-feed quote? The venue-local quote? A vendor-normalized synthetic quote?

These are not equivalent under latency and fragmentation.

E) Auctions and special prints are different animals

Opening/closing crosses, halts, reopen auctions, off-book prints, derivatively priced prints, and correction/cancel conditions should not be shoved through the same classifier as ordinary continuous lit trading.

A robust pipeline often starts by excluding or separately labeling special trade conditions.
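One way to make that separation explicit, sketched in Python. The condition labels below are hypothetical placeholders: real condition codes are feed- and venue-specific and have to be mapped from the feed’s own documentation. The `TradePrint` type and `split_special_prints` function are likewise illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical condition labels; map real feed codes onto these explicitly.
SPECIAL_CONDITIONS: FrozenSet[str] = frozenset({
    "OPEN_AUCTION", "CLOSE_AUCTION", "REOPEN_AUCTION",
    "OFF_BOOK", "DERIVATIVELY_PRICED", "CORRECTED", "CANCELLED",
})


@dataclass(frozen=True)
class TradePrint:
    trade_id: str
    price: float
    size: int
    conditions: FrozenSet[str] = frozenset()


def split_special_prints(
    prints: Iterable[TradePrint],
) -> Tuple[List[TradePrint], List[TradePrint]]:
    """Route prints into (regular, special). Only 'regular' prints should go
    to the continuous-trading classifier; 'special' ones are kept, not lost."""
    regular: List[TradePrint] = []
    special: List[TradePrint] = []
    for p in prints:
        (special if p.conditions & SPECIAL_CONDITIONS else regular).append(p)
    return regular, special
```

Keeping the special prints in their own list, rather than silently dropping them, leaves room to label them separately later.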


8) Decision matrix: what to use when

Use native aggressor-side flags when

This is the first choice whenever available. Do not “simplify” away better information.

Use quote rule + fallback when

For many practical research stacks, this is the default baseline.

Use Lee-Ready-style logic when

This is often the best pragmatic trade-level heuristic family.

Use tick test alone only when

It is a fallback, not a badge of toughness.

Use BVC when


9) A sane production pipeline

A robust trade-signing pipeline usually looks more like this than a single rule:

Step 1: Filter or isolate non-standard prints

Separate or drop:

Step 2: Choose the quote source deliberately

Define whether you use:

Do not leave this implicit.

Step 3: Calibrate quote alignment

Test a range of quote offsets/lags and measure classification quality on a sample with better ground truth.
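A minimal calibration sketch in Python, assuming you already have a labeled subset (for example, trades carrying a native aggressor flag) with true signs of +1 or -1. It sweeps candidate lags and reports, for each, the quote rule’s accuracy on the trades it could resolve and its coverage. The function and type names are illustrative.

```python
from bisect import bisect_right
from typing import Dict, Iterable, Sequence, Tuple

LabeledTrade = Tuple[float, float, int]  # (timestamp_s, price, true_sign +1/-1)
Quote = Tuple[float, float, float]       # (timestamp_s, bid, ask), sorted by time


def quote_rule_accuracy_by_lag(
    trades: Sequence[LabeledTrade],
    quotes: Sequence[Quote],
    candidate_lags_s: Iterable[float],
    tol: float = 1e-9,
) -> Dict[float, Tuple[float, float]]:
    """For each lag, return (accuracy on resolved trades, coverage).

    Coverage is the share of trades the quote rule could sign at all
    (a quote existed and the print was not a midpoint tie).
    """
    quote_times = [q[0] for q in quotes]
    results: Dict[float, Tuple[float, float]] = {}
    for lag in candidate_lags_s:
        correct = resolved = 0
        for ts, price, truth in trades:
            i = bisect_right(quote_times, ts - lag)
            if i == 0:
                continue  # no quote yet: unresolved
            _, bid, ask = quotes[i - 1]
            mid = (bid + ask) / 2.0
            if abs(price - mid) <= tol:
                continue  # midpoint tie: unresolved by the quote rule
            predicted = 1 if price > mid else -1
            resolved += 1
            correct += int(predicted == truth)
        accuracy = correct / resolved if resolved else float("nan")
        coverage = resolved / len(trades) if trades else 0.0
        results[lag] = (accuracy, coverage)
    return results
```

In a toy case where a buy at 10.02 lifts the ask and the quote update stamped at the same instant already shows the new 10.02/10.04 market, lag 0 mislabels the trade as a sell while a 0.5 s lag recovers the buy. On clean modern feeds the best lag may well be zero; the point is to measure it, and to look at the whole accuracy-versus-lag curve, not just its peak.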

Step 4: Apply midpoint/quote rule first

Use quote information when it is credible.

Step 5: Use tick test only for unresolved ties or quote-missing cases

Do not let the fallback quietly become the dominant method unless you intended that.

Step 6: Emit confidence / method metadata

For every signed trade, store at least:

This is incredibly useful later when a model behaves strangely.
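A sketch of what such a provenance record might look like in Python. The field names and method labels are hypothetical; the design choice illustrated is that the sign never travels without the method, quote source, lag, and filter/classifier versions that produced it, and that “unknown” is a first-class value.

```python
from dataclasses import asdict, dataclass
from typing import Literal, Optional

# Per-trade methods only; BVC is bar-level and belongs in a separate table.
Method = Literal["native_flag", "quote", "tick", "unknown"]


@dataclass(frozen=True)
class SignedTradeRecord:
    trade_id: str
    sign: int                     # +1 buy, -1 sell, 0 unknown
    method: Method
    quote_source: Optional[str]   # hypothetical label, e.g. "SIP"; None if unused
    quote_lag_s: Optional[float]  # lag applied when looking up the quote
    filter_version: str           # which special-print filter ran
    classifier_version: str

    def __post_init__(self) -> None:
        if self.sign not in (-1, 0, 1):
            raise ValueError(f"sign must be -1, 0 or 1, got {self.sign}")
        # Keep "unknown" honest: a zero sign must say so, and vice versa.
        if (self.method == "unknown") != (self.sign == 0):
            raise ValueError("sign 0 must be paired with method 'unknown'")
```

`asdict(record)` turns a record into a plain dict for whatever storage layer you use.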


10) The most useful validation questions

Before trusting signed-flow analytics, ask:

A) What is the ground truth subset?

Examples:

Without some benchmark subset, you are grading your classifier with vibes.

B) How sensitive is performance to quote lag?

If accuracy changes a lot when the lag moves slightly, your alignment problem is not solved.

C) Where does the classifier fail?

Break results out by:

D) How much fallback is happening?

If most trades are being signed by the tick test, your quote path is weaker than you think.
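A quick Python check for this, assuming each signed trade carries a segment key (venue, liquidity bucket, or time-of-day bucket) and a method label like “quote”, “tick”, or “unknown”. The function name and labels are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def weak_method_share_by_segment(rows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """rows: (segment, method). Returns, per segment, the share of trades
    signed by the tick fallback or left 'unknown'."""
    totals: Counter = Counter()
    weak: Counter = Counter()
    for segment, method in rows:
        totals[segment] += 1
        if method in ("tick", "unknown"):
            weak[segment] += 1
    return {segment: weak[segment] / n for segment, n in totals.items()}
```

A segment where this share is high is a segment where the quote path is failing, and its signed-flow features deserve less trust.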

E) What downstream metrics are fragile to sign noise?

Some signals survive moderate misclassification. Others collapse. You need to know which kind of signal you are building.
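A back-of-the-envelope bound helps here. Assume, optimistically, that each true sign s_i is flipped independently with probability p, giving an observed sign ŝ_i, and that v_i is the trade size. Then:

$$
E[\hat{s}_i] = (1-p)\,s_i + p\,(-s_i) = (1-2p)\,s_i
\quad\Longrightarrow\quad
E\Big[\sum_i \hat{s}_i v_i\Big] = (1-2p)\sum_i s_i v_i
$$

At p = 0.2, only 60% of the true net signed volume survives in expectation; at p = 0.5 the expected signal is zero. Real classifier errors are rarely independent (they cluster by venue, volatility, and time of day), so in practice sign noise can bias a feature, not just shrink it.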


11) Common failure modes

Failure mode 1: Hardcoding the historical Lee-Ready lag

This is probably the most common unforced error. What fixed one old dataset can badly misalign another.

Failure mode 2: Mixing venues without venue-aware quote logic

If you pool prints across venues but use one synthetic quote state blindly, you can manufacture sign noise.

Failure mode 3: Treating auction prints like normal continuous prints

This contaminates order-flow metrics around the open, close, halts, and rebalance windows.

Failure mode 4: Using BVC for trade-level models

BVC is powerful in the right place and quietly destructive in the wrong one.

Failure mode 5: Ignoring unresolved/unknown cases

Unknown is a valid label. Forcing a weak guess can be worse than carrying uncertainty explicitly.

Failure mode 6: Not storing the classifier provenance

Months later, someone asks why the toxicity feature shifted. If you did not store method, lag, and filters, the answer becomes archaeology.


12) A practical default for most research teams

If you have decent top-of-book quote data but no native aggressor flag, a good default is:

  1. filter out non-standard trade conditions,
  2. choose a deliberate quote source,
  3. calibrate a small set of candidate quote lags,
  4. apply midpoint/quote classification,
  5. use tick test only for midpoint ties or quote-missing cases,
  6. keep an unknown state when confidence is weak,
  7. validate by venue/liquidity/time-of-day,
  8. and only use BVC for bar-level features where trade-level precision is not required.

That is not the fanciest possible pipeline. It is just the one least likely to fool you early.


13) When to invest in something more advanced

Move beyond basic heuristics when:

At that point, the right answer is often:


14) Bottom line

Trade signing is one of those market-microstructure chores that looks boring right up until it ruins a model.

The practical summary is:

If your signed-flow feature feels mysteriously unstable, the first suspect should often be the classifier, not the alpha.


Pointers for deeper reading

Classic and commonly cited references to revisit:

Those papers are worth reading not because they give one eternal answer, but because they teach the right habit:

treat trade classification as a data-quality problem first, and an algorithm choice second.