Trading Clock Synchronization Playbook (NTP/PTP, Compliance, and Incident Hygiene)


Date: 2026-02-25 (KST)

TL;DR

Bad timestamps are not just an audit problem — they are a trading control problem.

If your clocks drift, you get:

  • cross-host event ordering you cannot trust (fills appear before the orders that caused them)
  • latency metrics that attribute delay to the wrong component
  • audit trails that can fail regulatory divergence tolerances

This playbook gives a practical way to run time as production infrastructure: regulatory target → timing budget → measurement loop → incident runbook.


1) Why this matters in live execution

In a modern execution stack (signal → router → broker/API → venue), even small clock errors can break analysis:

  • send/receive pairs get misordered, so per-hop latency turns negative or inflated
  • slippage and queue-position attribution become unreliable
  • cross-host correlation of signals, orders, and fills silently degrades

Treat time sync as a first-class SRE + risk-control domain, not an OS default.


2) Regulatory baseline you should design against

2.1 EU (MiFID II / RTS 25)

Commission Delegated Regulation (EU) 2017/574 (RTS 25) sets explicit UTC traceability and timestamp accuracy/granularity requirements.

Key thresholds from the annex tables:

  • high-frequency algorithmic trading: max divergence from UTC of 100 microseconds, timestamp granularity of 1 microsecond or better
  • other electronic trading activity: max divergence of 1 millisecond, granularity of 1 millisecond or better
  • voice/manual and comparable flows: max divergence of 1 second, granularity of 1 second or better

Practical implication: if your strategy is fast enough to care about queue priority, your time architecture must support sub-millisecond discipline.

2.2 US (FINRA / CAT context)

FINRA Rule 6820 requires business clocks to be synchronized to NIST time with:

  • a 50-millisecond tolerance for computer system clocks used to record reportable events
  • a 1-second tolerance for manual/mechanical clocks

FINRA Regulatory Notice 16-23 explains the tightening from 1 second to 50 milliseconds and clarifies that the tolerance is interpreted to include both transmission delay and clock drift.


3) NTP vs PTP: when to use what

3.1 NTP (Chrony): the default for broad infra

For most servers/services, NTP with Chrony is the pragmatic baseline:

  • default on major Linux distributions, with well-understood operations
  • robust step/slew behavior and good handling of intermittent connectivity
  • offset, frequency error, and dispersion are directly observable via chronyc

Chrony’s own comparison highlights faster convergence and better behavior in intermittent/congested environments versus legacy ntpd defaults.

3.2 PTP for low-latency timestamp domains

Use PTP (IEEE 1588) when you need tighter and more stable offsets, especially for:

  • market-data capture and order-gateway hosts
  • cross-host latency attribution with sub-100-microsecond budgets
  • timestamp domains targeting the RTS 25 100-microsecond class

On Linux, typical production chain is:

  1. ptp4l syncs PHC (NIC hardware clock) to grandmaster
  2. phc2sys syncs CLOCK_REALTIME to PHC (or vice versa per design)

Important detail from linuxptp docs: in hardware timestamping mode, PHC and system clock timescales/offset handling must be explicit (especially around UTC offset/leap handling).
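To make that UTC-offset point concrete, here is a minimal sketch of the timescale conversion involved: the PHC normally runs on the PTP timescale (TAI) while CLOCK_REALTIME is UTC, so mapping between them requires applying the currentUtcOffset distributed by the grandmaster. The 37-second value and the function name are illustrative assumptions, not linuxptp API.

```python
# Sketch: converting a PTP (TAI-timescale) hardware-clock timestamp to UTC.
# currentUtcOffset comes from the grandmaster's announce messages; 37 s is
# the value in effect since the 2016-12-31 leap second — verify it against
# your grandmaster rather than hard-coding it in production.

CURRENT_UTC_OFFSET_S = 37  # TAI - UTC, distributed by PTP

def tai_ns_to_utc_ns(tai_ns: int, utc_offset_s: int = CURRENT_UTC_OFFSET_S) -> int:
    """Map a TAI-timescale nanosecond timestamp to UTC nanoseconds."""
    return tai_ns - utc_offset_s * 1_000_000_000

# A PHC reading 37 s "ahead" of UTC maps back exactly.
phc_reading_ns = 1_700_000_037_000_000_000
assert tai_ns_to_utc_ns(phc_reading_ns) == 1_700_000_000_000_000_000
```

If this offset is applied twice (or not at all) somewhere in the chain, every timestamp in the domain is wrong by exactly 37 seconds — which is why the linuxptp docs insist the handling be explicit.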


4) Architecture pattern (small quant team friendly)

Use a 3-tier model:

  1. Source layer

    • GNSS-disciplined source and/or high-quality upstream stratum
    • at least 2 independent references (avoid single timing SPOF)
  2. Distribution layer

    • internal NTP/Chrony servers for general fleet
    • optional PTP grandmaster/boundary path for low-latency segment
  3. Consumer layer

    • execution/market-data nodes with an explicit clock role
    • app-level monotonic + wall-clock separation

Design principle: timestamp as close as possible to ingress/egress edge where decisions happen, and document that point.
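The consumer-layer "monotonic + wall-clock separation" above can be sketched with the standard library: stamp events with both clocks, use the wall clock for audit records and the monotonic clock for ordering and durations, since the monotonic clock is immune to steps and slews.

```python
import time

# Sketch: capture both clocks at the timestamping edge. The wall clock
# (time_ns) is for audit/UTC traceability; the monotonic clock survives
# clock steps and is the only safe basis for computing durations.

def stamp_event() -> dict:
    return {"utc_ns": time.time_ns(), "mono_ns": time.monotonic_ns()}

a = stamp_event()
b = stamp_event()

# Durations come from the monotonic clock, never wall-clock subtraction.
elapsed_ns = b["mono_ns"] - a["mono_ns"]
assert elapsed_ns >= 0  # guaranteed; wall-clock deltas carry no such guarantee
```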


5) Timing budget (don’t just set “<X ms”)

Define an additive error budget per node:

total_divergence <= source_error + network_delay_asymmetry + local_clock_drift + measurement_uncertainty

Then assign hard budgets, e.g. for a 100 µs node:

  • source_error ≤ 20 µs
  • network_delay_asymmetry ≤ 40 µs
  • local_clock_drift ≤ 25 µs
  • measurement_uncertainty ≤ 15 µs

This makes incident triage faster: you know which component can mathematically explain a breach.
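The budget-driven triage can be sketched in a few lines; the component names mirror the formula above and the numeric allocations are illustrative assumptions for a 100 µs node.

```python
# Sketch: additive per-node error budget and a triage helper that reports
# which components exceeded their individual allocation (illustrative values).

BUDGET_NS = {
    "source_error": 20_000,
    "network_delay_asymmetry": 40_000,
    "local_clock_drift": 25_000,
    "measurement_uncertainty": 15_000,
}
TOTAL_TOLERANCE_NS = 100_000  # e.g. the RTS 25 100 µs class

def over_budget(measured_ns: dict, budget: dict = BUDGET_NS) -> list:
    """Return the components that exceed their allocation — the ones that
    can mathematically explain a total-divergence breach."""
    return [k for k, v in measured_ns.items() if v > budget[k]]

# The allocations must themselves sum within the total tolerance.
assert sum(BUDGET_NS.values()) <= TOTAL_TOLERANCE_NS

measured = {
    "source_error": 5_000,
    "network_delay_asymmetry": 70_000,  # asymmetry spike
    "local_clock_drift": 10_000,
    "measurement_uncertainty": 5_000,
}
assert over_budget(measured) == ["network_delay_asymmetry"]
```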


6) Leap second policy (pick one, enforce everywhere)

Leap-second handling causes real outages when handled inconsistently across systems.

Options:

  • step: apply the leap second as a discontinuity (strict UTC, but a repeated or frozen second can break naive assumptions)
  • smear/slew: spread the extra second over a window (operationally smooth, but off strict UTC during the window)
  • ignore-and-resync: let the daemon correct afterwards (simple, worst for cross-host consistency)

Google’s public guidance describes a 24h linear smear (noon-to-noon UTC). This is operationally smooth for distributed systems, but it intentionally deviates from strict UTC during smear windows.
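The arithmetic of that linear smear is worth seeing explicitly: over the 24-hour window the extra second accumulates linearly, so a smeared host's offset from strict UTC grows from 0 to 1 second. A minimal sketch (function names are illustrative):

```python
# Sketch: Google-style 24 h linear smear around a leap second at midnight UTC,
# applied noon-to-noon. The accumulated offset versus strict UTC grows
# linearly from 0 to 1 second across the window.

SMEAR_WINDOW_S = 86_400  # noon before the leap to noon after

def smear_offset_s(seconds_into_window: float) -> float:
    """Accumulated smear offset (seconds) vs strict UTC at a point in the window."""
    s = min(max(seconds_into_window, 0.0), SMEAR_WINDOW_S)
    return s / SMEAR_WINDOW_S

assert smear_offset_s(0) == 0.0        # window start: on strict UTC
assert smear_offset_s(43_200) == 0.5   # midnight: half the second absorbed
assert smear_offset_s(86_400) == 1.0   # window end: full second absorbed
```

The 0.5 s offset at midnight is exactly why mixing smeared and non-smeared sources in one timestamp domain is dangerous: the two disagree by up to half a second at the moment of the leap.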

Rule of thumb:

  • pick exactly one mode per timestamp domain; never mix smeared and non-smeared sources behind the same consumers
  • if a regulator requires strict UTC traceability, document the smear window and flag timestamps recorded inside it


7) Minimal observability set

Collect and alert on these by host role:

  • clock offset versus the selected reference
  • frequency error (ppm) and its rate of change
  • root dispersion / estimated error bound
  • source reachability and reference-selection changes

And operational signals:

  • clock step events, leap-status flags, daemon restarts, and source flaps
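A minimal sketch of evaluating offset samples against a role's budget, assuming illustrative role names and nanosecond budgets (not a prescribed monitoring schema):

```python
import statistics

# Illustrative per-role offset budgets in nanoseconds (assumption).
ROLE_BUDGET_NS = {"execution": 100_000, "general": 50_000_000}

def offset_health(samples_ns: list, role: str) -> dict:
    """Summarize recent offset samples and flag a budget breach."""
    magnitudes = [abs(s) for s in samples_ns]
    worst = max(magnitudes)
    return {
        "worst_ns": worst,
        "median_ns": statistics.median(magnitudes),
        "breach": worst > ROLE_BUDGET_NS[role],
    }

h = offset_health([-12_000, 35_000, 150_000, 8_000], role="execution")
assert h["breach"] is True and h["worst_ns"] == 150_000

assert offset_health([-12_000, 8_000], role="general")["breach"] is False
```

Alerting on the worst-case sample (not the mean) matters here: a budget is a bound, and a single 150 µs excursion is a breach even if the average looks healthy.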


8) Incident runbook (fast path)

Trigger examples

  • offset breach past the node's budget threshold
  • reference sources disagreeing beyond tolerance
  • an unexpected step event or leap-status flag

Immediate actions

  1. Freeze non-essential deploys touching execution path.
  2. Mark affected timestamps with quality flag in logs/warehouse.
  3. Compare reference hierarchy (source → distribution → host).
  4. If needed, isolate bad source and force known-good source set.
  5. Reconstruct impacted interval with monotonic-clock aids.
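Step 2 (quality-flagging affected timestamps) can be sketched as a simple interval tag over warehouse rows; the field names are illustrative assumptions, not a fixed schema.

```python
# Sketch: mark rows whose wall-clock timestamps fall inside a suspect
# interval, so downstream analytics can exclude or down-weight them.

def flag_suspect(rows: list, start_ns: int, end_ns: int) -> list:
    """Attach a ts_quality flag to each row based on the suspect window."""
    for r in rows:
        r["ts_quality"] = "suspect" if start_ns <= r["utc_ns"] <= end_ns else "ok"
    return rows

rows = [{"utc_ns": 100}, {"utc_ns": 250}, {"utc_ns": 400}]
flag_suspect(rows, start_ns=200, end_ns=300)
assert [r["ts_quality"] for r in rows] == ["ok", "suspect", "ok"]
```

Flagging rather than deleting keeps the audit trail intact while making the degraded interval explicit to every consumer.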

Post-incident checklist

  • quantify the affected interval and every timestamp consumer it touched
  • check which budget component exceeded its allocation, and re-derive budgets if needed
  • update the runbook and add a regression alert for the failure mode


9) Practical command-level checks (Linux)

NTP/Chrony quick checks:

  • chronyc tracking — current offset, frequency error, root dispersion
  • chronyc sources -v — source selection and reachability
  • chronyc sourcestats — per-source drift and jitter estimates

PTP quick checks:

  • pmc -u -b 0 'GET CURRENT_DATA_SET' — steps removed and offset from master
  • ptp4l / phc2sys logs — master offset and servo state
  • ethtool -T <iface> — hardware timestamping capability of the NIC

Automate periodic snapshots into a retained audit log.
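A sketch of that snapshot automation, assuming chronyc-tracking-style "key : value" output (the sample text mimics chrony's format; exact field names can vary across versions):

```python
# Sketch: parse `chronyc tracking`-style output into an audit-log record.
# SAMPLE imitates chrony's output format; in production you would capture
# the real command output on a schedule and retain the parsed records.

SAMPLE = """\
Reference ID    : A9FEA97B (ntp1.example.com)
Stratum         : 2
System time     : 0.000013 seconds fast of NTP time
Last offset     : -0.000007 seconds
Root dispersion : 0.001201 seconds
"""

def parse_tracking(text: str) -> dict:
    """Turn 'key : value' lines into a flat dict for the audit log."""
    record = {}
    for line in text.splitlines():
        key, sep, val = line.partition(":")
        if sep:
            record[key.strip()] = val.strip()
    return record

rec = parse_tracking(SAMPLE)
assert rec["Stratum"] == "2"
assert rec["Last offset"].startswith("-0.000007")
```

Retaining these parsed snapshots (rather than raw logs alone) makes the RTS 25 / Rule 6820 traceability evidence queryable during an audit.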


10) Implementation ladder (2-week pragmatic rollout)

  1. Inventory all timestamp-producing systems.
  2. Classify them by required tolerance (100µs / 1ms / 50ms / 1s).
  3. Define one canonical UTC traceability design doc.
  4. Add offset/frequency metrics + alerts.
  5. Dry-run leap-second policy simulation in staging.
  6. Run quarterly “clock chaos” drill (source loss, asymmetry spike, bad NTP peer).

One-line takeaway

Time is a risk control surface: if clock discipline is explicit, measured, and rehearsed, execution analytics stay trustworthy and compliance gets boring.

