Drop Copy + Post-Trade Reconciliation Break-Management Playbook
Date: 2026-02-26 (KST)
TL;DR
Execution quality is not only about slippage at fill time. A desk can trade “well” intraday and still lose money (or create compliance risk) if post-trade records drift across OMS, broker drop copy, and clearing confirmations.
Use a three-layer reconciliation loop:
- Real-time sanity (seconds): catch obvious mismatches early
- Intraday break queue (minutes): classify and triage unresolved differences
- End-of-day hard close (hours): force economic truth + audit trail before settlement windows
Treat breaks as an operational risk budget, not back-office noise.
1) Why this matters more in a tighter settlement world
As settlement cycles shorten (e.g., US securities moved from T+2 to T+1 in May 2024, and other markets are planning similar moves), the time available for manual repair shrinks.
Common failure pattern:
- Front office assumes fills are final after ACK
- Middle/back office discovers quantity/fee/counterparty mismatches late
- Team burns time in manual chase during narrow cutoff windows
- Economic exposure and regulatory reporting quality degrade
Practical implication: reconciliation latency is now a trading-system metric.
2) Canonical event model (single source of operational truth)
Build a normalized internal schema before matching anything. Do not reconcile directly across raw vendor payloads.
2.1 Required fields (minimum)
- trade_date
- symbol
- side
- exec_qty
- exec_price
- exec_id (venue/broker execution identifier)
- order_id / cl_ord_id lineage
- broker
- venue
- currency
- fees_commission
- gross_amount / net_amount
- event_ts_exchange
- event_ts_gateway
- event_ts_ingest
- correction_seq (for cancel/correct chains)
2.2 Key normalization rules
- Normalize side representation (BUY/SELL only)
- Normalize quantity units (shares/contracts/lots)
- Normalize price precision and currency scales
- Convert all timestamps to UTC internally, keep original timezone metadata
- Preserve raw payload hash for audit replay
If you skip normalization, your “break rate” mostly measures schema inconsistency.
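The rules above can be sketched as a single normalization step. This is a minimal illustration, not a reference implementation: the payload keys (symbol, side, qty, price, ts), the SIDE_MAP codes, and the NormalizedExec type are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical vendor side codes; extend per integration.
SIDE_MAP = {"B": "BUY", "BUY": "BUY", "S": "SELL", "SELL": "SELL"}


@dataclass(frozen=True)
class NormalizedExec:
    symbol: str
    side: str                  # BUY or SELL only
    exec_qty: Decimal
    exec_price: Decimal
    event_ts_exchange: datetime  # always UTC internally
    source_utc_offset_s: int     # original offset kept as metadata
    raw_payload_hash: str        # SHA-256 of the untouched payload bytes


def normalize(payload: bytes) -> NormalizedExec:
    raw = json.loads(payload)
    ts = datetime.fromisoformat(raw["ts"])
    if ts.tzinfo is None:
        # Refuse naive timestamps instead of guessing a timezone.
        raise ValueError("timestamp must carry an explicit UTC offset")
    return NormalizedExec(
        symbol=raw["symbol"].strip().upper(),
        side=SIDE_MAP[str(raw["side"]).strip().upper()],
        exec_qty=Decimal(str(raw["qty"])),
        exec_price=Decimal(str(raw["price"])),
        event_ts_exchange=ts.astimezone(timezone.utc),
        source_utc_offset_s=int(ts.utcoffset().total_seconds()),
        raw_payload_hash=hashlib.sha256(payload).hexdigest(),
    )


payload = (
    b'{"symbol": "005930", "side": "b", "qty": "100", '
    b'"price": "71200.0", "ts": "2026-02-26T09:00:01+09:00"}'
)
ev = normalize(payload)
```

Hashing the original bytes rather than the parsed dictionary keeps the hash usable for audit replay even if the parser changes later.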
3) Matching strategy: deterministic first, fuzzy second
3.1 Tier-1 deterministic match
Primary key (example priority, highest first):
- broker + exec_id
- venue_trade_id
- order_id + fill_seq
If exact key match exists, reconcile economics (qty/price/fees/notional) with tolerance rules.
3.2 Tier-2 constrained fuzzy match
When deterministic key is missing (common in partial legacy integrations), match on:
- same symbol + side
- quantity within expected lot granularity
- timestamp window (e.g., ±3s to ±30s, venue-dependent)
- price within tick-based tolerance
Every fuzzy match must carry a confidence score, and any fuzzy match that exceeds a configured risk threshold must also be flagged for human review.
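A constrained fuzzy matcher along these lines might look like the sketch below. The Fill type, the scoring formula, the default window and tick, and the REVIEW_BELOW cutoff are illustrative assumptions. The sketch also requires equal quantities, which is a stricter stand-in for a lot-granularity check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from typing import Optional

REVIEW_BELOW = 0.9  # hypothetical cutoff: lower scores go to human review


@dataclass(frozen=True)
class Fill:
    symbol: str
    side: str
    qty: Decimal
    price: Decimal
    ts: datetime  # UTC


def fuzzy_confidence(
    a: Fill,
    b: Fill,
    *,
    window: timedelta = timedelta(seconds=3),
    tick: Decimal = Decimal("0.01"),
    max_ticks: int = 1,
) -> Optional[float]:
    """Return a confidence in (0, 1], or None if a hard constraint fails."""
    if (a.symbol, a.side) != (b.symbol, b.side) or a.qty != b.qty:
        return None
    dt = abs(a.ts - b.ts)
    ticks_off = abs(a.price - b.price) / tick
    if dt > window or ticks_off > max_ticks:
        return None
    # Each score falls linearly from 1.0 (identical) to 0.5 (at the limit).
    time_score = 1.0 - 0.5 * (dt / window)
    price_score = 1.0 - 0.5 * float(ticks_off / max_ticks)
    return round(time_score * price_score, 4)


t0 = datetime(2026, 2, 26, 0, 0, 1, tzinfo=timezone.utc)
oms = Fill("005930", "BUY", Decimal("100"), Decimal("71200"), t0)
drop = Fill("005930", "BUY", Decimal("100"), Decimal("71200"),
            t0 + timedelta(seconds=1.5))
conf = fuzzy_confidence(oms, drop)
needs_review = conf is not None and conf < REVIEW_BELOW
```

Returning None for failed constraints keeps "no candidate" distinct from "low-confidence candidate", so only real candidates reach the review queue.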
3.3 Tier-3 unresolved breaks
Anything unresolved enters a break queue with explicit owner and SLA clock. No “silent pending” state.
4) Break taxonomy (so fixes are actionable)
Tag each break by type. One break can have multiple tags.
Economic breaks
- qty mismatch
- price mismatch
- fee/tax mismatch
- net amount mismatch
Lifecycle breaks
- missing cancel/correct chain
- duplicate execution
- out-of-order event application
Reference-data breaks
- symbol mapping/corporate-action drift
- account/book mismatch
- venue code mismatch
Timing breaks
- late drop copy arrival
- clock skew causing window miss
- settlement-date derivation mismatch
The taxonomy should map directly to routing: trading, middle office, reference data, or infra.
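One way to make that mapping explicit is a small routing table. The tag names and the tag-to-team assignments below are placeholders for illustration; each desk will draw these lines differently.

```python
# Hypothetical tag -> owning team table; adjust to the desk's actual org.
ROUTING = {
    "qty_mismatch": "middle_office",
    "price_mismatch": "trading",
    "fee_tax_mismatch": "middle_office",
    "duplicate_execution": "trading",
    "symbol_mapping_drift": "reference_data",
    "venue_code_mismatch": "reference_data",
    "late_drop_copy": "infra",
    "clock_skew": "infra",
}


def route(tags):
    """Return the sorted set of teams that must see a break with these tags."""
    unknown = [t for t in tags if t not in ROUTING]
    if unknown:
        # An unmapped tag is a taxonomy gap; fail loudly rather than drop it.
        raise KeyError(f"unmapped break tags: {unknown}")
    return sorted({ROUTING[t] for t in tags})


teams = route(["price_mismatch", "clock_skew"])
```

Because one break can carry several tags, the function returns every owning team instead of picking a single one.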
5) Reconciliation state machine
Use an explicit state machine instead of ad-hoc status text.
NEW -> MATCHED_EXACT | MATCHED_FUZZY | BREAK_OPEN -> BREAK_ACKED -> RESOLVED | ESCALATED -> CLOSED
Recommended behavior:
- MATCHED_FUZZY always requires periodic sampling review
- BREAK_OPEN auto-assigns owner/team by taxonomy
- ESCALATED triggers notification if SLA breach threshold is hit
- CLOSED requires resolution reason code (DATA_FIX, COUNTERPARTY_CONFIRM, WAIVER_APPROVED, etc.)
No state transition without timestamp + actor + reason.
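A minimal sketch of such a state machine follows. The ALLOWED table is one possible reading of the flow above, not the only one: it assumes MATCHED_EXACT is terminal, that a fuzzy match can be reopened as a break after sampling review, and that escalated breaks return through RESOLVED. ReconItem and its fields are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# One interpretation of the transition diagram (assumption, see lead-in).
ALLOWED = {
    "NEW": {"MATCHED_EXACT", "MATCHED_FUZZY", "BREAK_OPEN"},
    "MATCHED_EXACT": set(),
    "MATCHED_FUZZY": {"BREAK_OPEN"},
    "BREAK_OPEN": {"BREAK_ACKED"},
    "BREAK_ACKED": {"RESOLVED", "ESCALATED"},
    "ESCALATED": {"RESOLVED"},
    "RESOLVED": {"CLOSED"},
    "CLOSED": set(),
}
REASON_CODES = {"DATA_FIX", "COUNTERPARTY_CONFIRM", "WAIVER_APPROVED"}


@dataclass
class ReconItem:
    item_id: str
    state: str = "NEW"
    history: list = field(default_factory=list)

    def transition(self, new_state: str, actor: str, reason: str) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        if not actor or not reason:
            raise ValueError("actor and reason are required")
        if new_state == "CLOSED" and reason not in REASON_CODES:
            raise ValueError("CLOSED requires a resolution reason code")
        # Append-only audit record: timestamp + from + to + actor + reason.
        self.history.append(
            (datetime.now(timezone.utc), self.state, new_state, actor, reason)
        )
        self.state = new_state


item = ReconItem("brk-001")
item.transition("BREAK_OPEN", "recon-engine", "qty mismatch")
item.transition("BREAK_ACKED", "ops.kim", "investigating")
```

Keeping the allowed transitions in data rather than scattered if-statements makes the policy reviewable and easy to version.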
6) Tolerance policy (hard vs soft)
Define tolerances before incidents happen.
6.1 Hard-fail examples
- side mismatch
- symbol mismatch
- duplicate exec_id with different economics
- opposite-sign net amount
6.2 Soft-fail examples (review queue)
- tiny fee rounding differences
- micro price precision drift under configured tick tolerance
- timestamp jitter without economic mismatch
Keep soft-fail thresholds versioned and change-controlled. If thresholds move, historical break-rate comparability must be preserved.
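The hard/soft split can be expressed as a small classifier that returns the threshold version alongside the verdict, so historical results stay comparable when thresholds change. The threshold values, the field names, and the choice to treat any difference beyond the soft tolerance as HARD are assumptions for this sketch.

```python
from decimal import Decimal

# Hypothetical versioned thresholds; real values belong under change control.
TOLERANCE_V1 = {
    "version": "v1",
    "fee_abs": Decimal("0.01"),
    "tick": Decimal("0.01"),
    "price_ticks": 1,
}


def classify(oms: dict, drop: dict, tol: dict = TOLERANCE_V1):
    """Return (verdict, tolerance_version); verdict is HARD, SOFT, or OK."""
    version = tol["version"]
    if oms["symbol"] != drop["symbol"] or oms["side"] != drop["side"]:
        return "HARD", version
    if oms["net_amount"] * drop["net_amount"] < 0:  # opposite signs
        return "HARD", version
    price_diff = abs(oms["price"] - drop["price"])
    fee_diff = abs(oms["fees"] - drop["fees"])
    # Assumption: anything beyond the soft tolerance is treated as HARD.
    if price_diff > tol["price_ticks"] * tol["tick"] or fee_diff > tol["fee_abs"]:
        return "HARD", version
    if price_diff > 0 or fee_diff > 0:
        return "SOFT", version
    return "OK", version


oms_rec = {"symbol": "ABC", "side": "BUY", "price": Decimal("10.00"),
           "fees": Decimal("1.250"), "net_amount": Decimal("-1001.25")}
drop_rec = dict(oms_rec, fees=Decimal("1.255"))
verdict = classify(oms_rec, drop_rec)
```

Storing the returned version on each break record is one way to keep break rates comparable across threshold changes.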
7) Operational SLOs (starter set)
Example desk-level SLOs:
- Real-time unreconciled notional (p95, 5-min window) below internal risk limit
- EOD unresolved break ratio < 0.10% of executions
- Economic break median resolution time < 20 minutes
- Hard-break false-positive rate < 5%
Also track by symbol-liquidity bucket, venue, and broker. Aggregate-only dashboards hide concentrated fragility.
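To illustrate why aggregate-only numbers mislead, the sketch below computes the EOD unresolved break ratio per broker and in aggregate. The record shape (broker, resolved_minutes) and the sample figures are invented for the example.

```python
from collections import Counter
from statistics import median

EOD_UNRESOLVED_LIMIT = 0.001  # 0.10% of executions, per the starter SLO


def unresolved_ratios(breaks, executions):
    """Unresolved-break ratio per broker, plus an 'ALL' aggregate."""
    open_counts = Counter(
        b["broker"] for b in breaks if b["resolved_minutes"] is None
    )
    ratios = {
        broker: open_counts[broker] / n
        for broker, n in executions.items() if n > 0
    }
    total = sum(executions.values())
    ratios["ALL"] = sum(open_counts.values()) / total if total else 0.0
    return ratios


def median_resolution_minutes(breaks):
    done = [b["resolved_minutes"] for b in breaks
            if b["resolved_minutes"] is not None]
    return median(done) if done else None


# Made-up sample data: resolved_minutes is None while a break is still open.
executions = {"BRK_A": 10000, "BRK_B": 500}
breaks = [
    {"broker": "BRK_A", "resolved_minutes": 12},
    {"broker": "BRK_A", "resolved_minutes": 30},
    {"broker": "BRK_A", "resolved_minutes": None},
    {"broker": "BRK_B", "resolved_minutes": None},
    {"broker": "BRK_B", "resolved_minutes": None},
]
ratios = unresolved_ratios(breaks, executions)
breaching = sorted(k for k, v in ratios.items() if v >= EOD_UNRESOLVED_LIMIT)
```

In this sample the aggregate ratio is within the limit while BRK_B alone breaches it, which is the concentrated fragility an aggregate dashboard would hide.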
8) Data architecture pattern (practical)
Use append-only ledgers plus materialized views:
- raw_events (immutable ingestion)
- normalized_exec_events (canonical fields)
- recon_links (match candidates, confidence, rule id)
- recon_breaks (open/resolved lifecycle)
- recon_snapshot_eod (frozen daily close evidence)
Never overwrite the raw event history. Corrections should be additional events with lineage pointers.
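The append-only idea can be shown with an in-memory ledger where the "current" state is a derived view, never a mutation. ExecEvent, Ledger, and the supersedes pointer are hypothetical names for this sketch; a real system would back this with the tables listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExecEvent:
    event_id: str
    exec_id: str
    correction_seq: int        # 0 for the original, then 1, 2, ...
    kind: str                  # "NEW", "CORRECT", or "CANCEL"
    qty: int
    supersedes: Optional[str] = None  # lineage pointer to the prior event_id


class Ledger:
    def __init__(self):
        self._events = []      # append-only; nothing is updated in place

    def append(self, ev: ExecEvent) -> None:
        self._events.append(ev)

    def history(self, exec_id: str):
        chain = [e for e in self._events if e.exec_id == exec_id]
        return sorted(chain, key=lambda e: e.correction_seq)

    def current_view(self):
        """Latest non-cancelled state per exec_id, derived from the ledger."""
        latest = {}
        for ev in sorted(self._events, key=lambda e: e.correction_seq):
            latest[ev.exec_id] = ev  # higher correction_seq wins
        return {k: v for k, v in latest.items() if v.kind != "CANCEL"}


ledger = Ledger()
# The correction arrives before the original; ordering uses correction_seq.
ledger.append(ExecEvent("ev-2", "E1", 1, "CORRECT", 90, supersedes="ev-1"))
ledger.append(ExecEvent("ev-1", "E1", 0, "NEW", 100))
ledger.append(ExecEvent("ev-3", "E2", 0, "NEW", 50))
ledger.append(ExecEvent("ev-4", "E2", 1, "CANCEL", 0, supersedes="ev-3"))
view = ledger.current_view()
```

Ordering by correction_seq rather than arrival order is what protects the view against out-of-order event application.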
9) SQL-style skeleton
-- 1) Candidate exact matches
insert into recon_links (left_id, right_id, match_type, confidence, rule_id)
select a.id, b.id, 'EXACT', 1.0, 'broker_exec_id'
from normalized_exec_events a
join normalized_exec_events b
  on a.source = 'OMS'
 and b.source = 'DROP_COPY'
 and a.broker = b.broker
 and a.exec_id = b.exec_id
where a.trade_date = :trade_date
  and b.trade_date = :trade_date;
-- 2) Open breaks from unmatched records
insert into recon_breaks (...)
select ...
from unmatched_view
where trade_date = :trade_date;
Keep matching rules explicit and versioned (rule_id + config hash).
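A config hash can be derived from a canonical serialization of the rule parameters, so the same settings always produce the same hash regardless of key order. The rule fields shown and the 12-character truncation are illustrative choices.

```python
import hashlib
import json


def config_hash(rule_config: dict) -> str:
    """Stable short hash of a rule's parameters (key order does not matter)."""
    canonical = json.dumps(rule_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical rule parameters for the exact-match rule above.
rule = {"rule_id": "broker_exec_id", "keys": ["broker", "exec_id"], "window_s": 3}
rule_version = (rule["rule_id"], config_hash(rule))
```

Storing the (rule_id, config hash) pair on each recon_links row makes it possible to tell later which exact rule settings produced a given match.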
10) 30-minute incident runbook (when break count spikes)
Contain
- pause noncritical parameter changes
- snapshot current break queue and ingestion lag
Classify quickly
- is this mostly timing, reference-data, or economics?
- identify top broker/venue/symbol concentration
Stabilize feed health
- check ingest lag, parser error rate, clock drift, schema version rollouts
Apply temporary guardrails
- widen only timing window if economically safe
- do not relax hard economic consistency checks without explicit approval
Communicate
- send concise status: blast radius, ETA, current risk posture
Post-incident hardening
- add a regression test from captured payloads
- backfill and re-run reconciliation for affected window
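For the "classify quickly" step, a short concentration check over the snapshotted break queue is often enough to see where the spike is coming from. The record fields and sample data below are assumed for illustration.

```python
from collections import Counter


def top_concentration(open_breaks, dims=("broker", "venue", "symbol"), n=1):
    """For each dimension, return the top-n values with their share of breaks."""
    total = len(open_breaks)
    result = {}
    for dim in dims:
        counts = Counter(b[dim] for b in open_breaks)
        result[dim] = [(value, c / total) for value, c in counts.most_common(n)]
    return result


# Made-up snapshot of an open break queue.
snapshot = [
    {"broker": "BRK_A", "venue": "XKRX", "symbol": "005930"},
    {"broker": "BRK_A", "venue": "XKRX", "symbol": "000660"},
    {"broker": "BRK_A", "venue": "XKRX", "symbol": "005930"},
    {"broker": "BRK_B", "venue": "XNAS", "symbol": "005930"},
]
hot = top_concentration(snapshot)
```

If one broker or venue accounts for most of the queue, the problem is more likely a single feed or mapping than a general matching failure.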
11) Common anti-patterns
- “Fixing” breaks by mutating source-of-truth tables directly
- Treating duplicate execs as harmless noise
- Ignoring correction/cancel lineage and only storing latest state
- Mixing local time and UTC in matching logic
- No owner/SLA per break (queue becomes archaeology)
12) Implementation checklist (first 2 weeks)
Week 1:
- Define canonical schema + field-level contracts
- Implement deterministic matching on strongest IDs
- Stand up break taxonomy + state machine
- Build minimal dashboard (break count, unresolved age, top tags)
Week 2:
- Add constrained fuzzy match with confidence scoring
- Add SLA-based escalation notifications
- Add daily immutable reconciliation snapshot export
- Run one game-day: inject duplicate, late, and correction events
If the game-day fails, do not claim reconciliation is production-ready.
13) Bottom line
A desk without strong post-trade reconciliation is running invisible leverage. Execution alpha can be real and still be erased by operational drift.
Make reconciliation fast, explicit, and auditable:
- canonical schema
- deterministic-first matching
- break taxonomy + SLA ownership
- append-only evidence trail
That turns reconciliation from cleanup work into a real risk-control system.