Linux Packet Timestamping Playbook (SO_TIMESTAMPING + PTP, Practical)
Date: 2026-03-19
Category: knowledge
Why this matters
If you care about microsecond-level latency (execution gateways, market-data collectors, telemetry ingest), you need to separate:
- network path latency,
- application scheduling delay,
- clock error/drift.
Without accurate packet timestamps, these collapse into one noisy number and you can’t debug what really got slower.
1) Clock model first (before socket code)
Treat timestamps as a clock-discipline problem first, API problem second.
You usually have:
- PHC (PTP Hardware Clock) on NIC,
- system clock (CLOCK_REALTIME/CLOCK_TAI),
- NIC hardware RX/TX timestamps,
- software fallback timestamps.
Practical rule:
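- Discipline PHC and system clock (e.g., ptp4l + phc2sys).
- Prefer CLOCK_TAI for a wall-clock timeline without leap-second discontinuities (the kernel's TAI offset must be set, otherwise CLOCK_TAI simply tracks CLOCK_REALTIME).
- Tag every metric with its clock domain so you never compare timestamps taken from different clocks.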
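One way to make the clock-domain tag hard to forget is to carry it with every timestamp and refuse arithmetic across domains. The sketch below is a minimal illustration, not a prescribed design: the `Stamp` type, the `now` helper, and the domain names ("realtime", "tai") are hypothetical names introduced here.

```python
import time
from dataclasses import dataclass

# Clock IDs by domain name. time.CLOCK_TAI exists on Linux with Python 3.9+;
# 11 is the Linux clockid value, used here as a fallback assumption.
CLOCKS = {
    "realtime": time.CLOCK_REALTIME,
    "tai": getattr(time, "CLOCK_TAI", 11),
}


@dataclass(frozen=True)
class Stamp:
    ns: int
    domain: str  # e.g. "realtime", "tai", or a PHC label such as "phc0"

    def __sub__(self, other: "Stamp") -> int:
        # A delta is only meaningful inside one clock domain.
        if self.domain != other.domain:
            raise ValueError(f"clock-domain mix: {self.domain} vs {other.domain}")
        return self.ns - other.ns


def now(domain: str) -> Stamp:
    """Read the named system clock and tag the result with its domain."""
    return Stamp(time.clock_gettime_ns(CLOCKS[domain]), domain)
```

Subtracting a "tai" stamp from a "realtime" stamp raises immediately, so a mixed-domain metric fails in testing instead of showing up as a latency shift of tens of seconds in production.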
2) Capability check: confirm NIC/driver supports what you need
Before implementation, verify hardware capabilities:
- ethtool -T <iface> for timestamping capabilities,
- check RX/TX hardware timestamp support,
- confirm driver + firmware version actually enables required modes.
Many outages start with: “code expects HW TX timestamps, NIC silently gives SW timestamps.”
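The capability check can be automated so it runs at startup or in CI. The sketch below assumes the common `ethtool -T` layout, where capability and mode names appear as the first token on indented lines. The exact output differs between ethtool versions, so treat the parsing rule as an assumption to verify on your hosts. `parse_caps` and `missing_caps` are hypothetical helper names.

```python
import subprocess


def parse_caps(text: str) -> set[str]:
    """Collect capability/mode names from `ethtool -T` output."""
    caps = set()
    for line in text.splitlines():
        # Capability and mode entries are indented; section headers are not.
        if line[:1] in (" ", "\t") and line.strip():
            caps.add(line.split()[0])
    return caps


def missing_caps(iface: str, required: set[str]) -> set[str]:
    """Run `ethtool -T <iface>` and return the required names it did not list."""
    out = subprocess.run(
        ["ethtool", "-T", iface], capture_output=True, text=True, check=True
    ).stdout
    return required - parse_caps(out)
```

A service that depends on hardware TX timestamps could call `missing_caps("eth0", {"hardware-transmit", "hardware-raw-clock"})` (interface name illustrative) and refuse to start, or raise an alert, when the result is non-empty.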
3) Pick timestamping mode by use case
Mode A: software timestamps only (baseline)
- easiest to enable,
- includes kernel scheduling/jitter,
- good for coarse profiling, not exchange-grade network attribution.
Mode B: RX hardware timestamps
- better one-way receive timing,
- useful for market-data ingest quality and feed-gap diagnostics.
Mode C: TX hardware timestamps
- crucial for send-path attribution,
- retrieve via socket error queue,
- needed for precise “send intent vs wire egress” decomposition.
Use software timestamps as fallback telemetry, not as primary truth for low-latency claims.
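The three modes map onto combinations of SOF_TIMESTAMPING_* flags. The sketch below uses flag values from <linux/net_tstamp.h>; the mode letters follow this section, while `MODE_FLAGS`, `flags_for`, and `enable` are hypothetical names. It only sets the socket option. Hardware modes typically also require the NIC itself to be configured for timestamping (the SIOCSHWTSTAMP ioctl), which is not shown here.

```python
import socket

# Flag values from <linux/net_tstamp.h>.
TX_HARDWARE = 1 << 0
TX_SOFTWARE = 1 << 1
RX_HARDWARE = 1 << 2
RX_SOFTWARE = 1 << 3
SOFTWARE = 1 << 4       # report software timestamps
RAW_HARDWARE = 1 << 6   # report raw hardware timestamps
OPT_ID = 1 << 7         # attach a per-socket counter to each TX timestamp
OPT_TSONLY = 1 << 11    # return the timestamp without the original packet payload

# 37 is the value on x86 and arm64; some architectures differ.
SO_TIMESTAMPING = getattr(socket, "SO_TIMESTAMPING", 37)

MODE_FLAGS = {
    "A": TX_SOFTWARE | RX_SOFTWARE | SOFTWARE,
    "B": RX_HARDWARE | RAW_HARDWARE,
    "C": TX_HARDWARE | RAW_HARDWARE | OPT_ID | OPT_TSONLY,
}


def flags_for(modes: str, sw_fallback: bool = True) -> int:
    """Combine flags for a string of mode letters, e.g. "BC"."""
    flags = 0
    for mode in modes:
        flags |= MODE_FLAGS[mode]
    if sw_fallback:
        # Keep software timestamps available as fallback telemetry.
        flags |= MODE_FLAGS["A"]
    return flags


def enable(sock: socket.socket, modes: str) -> None:
    """Request the given timestamping modes on an existing socket."""
    sock.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMPING, flags_for(modes))
```

Requesting software flags alongside the hardware ones keeps the fallback path measurable, which matters later when you want to alert on how often it is actually used.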
4) Socket API essentials (SO_TIMESTAMPING)
Enable timestamping with SO_TIMESTAMPING and relevant SOF_TIMESTAMPING_* flags (hardware/software, RX/TX, raw hardware).
Operational essentials:
- TX timestamps arrive on the error queue, so use the recvmsg(..., MSG_ERRQUEUE) path.
- Parse control messages (SCM_TIMESTAMPING) safely.
- Preserve packet correlation (e.g., ID/sequence) so each timestamp can be mapped to its order/request.
If your send path is async/batched, correlation IDs are mandatory or your timing data becomes unusable.
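The error-queue path can be sketched as a pure parser over the ancillary data that `sock.recvmsg(bufsize, ancbufsize, socket.MSG_ERRQUEUE)` returns. The layout assumptions are stated in the comments: 64-bit `struct timespec`, and the constant values used on x86/arm64 Linux. The correlation key here is the counter that SOF_TIMESTAMPING_OPT_ID places in `ee_data`; `parse_tx_cmsgs` is a hypothetical helper name.

```python
import struct

SOL_SOCKET = 1
SCM_TIMESTAMPING = 37              # same value as SO_TIMESTAMPING on x86/arm64
SOL_IP, IP_RECVERR = 0, 11
SOL_IPV6, IPV6_RECVERR = 41, 25
SO_EE_ORIGIN_TIMESTAMPING = 4

_TS = struct.Struct("=6q")         # struct scm_timestamping: 3 x (tv_sec, tv_nsec), 64-bit
_EE = struct.Struct("=IBBBBII")    # struct sock_extended_err
_ERR_CMSGS = ((SOL_IP, IP_RECVERR), (SOL_IPV6, IPV6_RECVERR))


def parse_tx_cmsgs(ancdata):
    """Return (ts_id, sw_ns, hw_ns) for one error-queue message.

    ancdata is the list of (level, type, data) tuples from recvmsg().
    A field is None when the kernel did not supply it.
    """
    ts_id = sw_ns = hw_ns = None
    for level, ctype, data in ancdata:
        if level == SOL_SOCKET and ctype == SCM_TIMESTAMPING:
            if len(data) < _TS.size:
                continue  # truncated control message: skip rather than misparse
            s0, n0, _s1, _n1, s2, n2 = _TS.unpack_from(data)
            # ts[0] is the software stamp, ts[2] the raw hardware stamp;
            # ts[1] is deprecated. An all-zero timespec means "not provided".
            sw_ns = (s0 * 1_000_000_000 + n0) or None
            hw_ns = (s2 * 1_000_000_000 + n2) or None
        elif (level, ctype) in _ERR_CMSGS and len(data) >= _EE.size:
            _errno, origin, _type, _code, _pad, _info, ee_data = _EE.unpack_from(data)
            if origin == SO_EE_ORIGIN_TIMESTAMPING:
                ts_id = ee_data  # counter assigned by SOF_TIMESTAMPING_OPT_ID
    return ts_id, sw_ns, hw_ns
```

Keeping the parser free of socket calls makes it easy to unit-test with captured or hand-built control messages, including truncated ones.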
5) PTP discipline: keep clocks trustworthy
Typical production pattern:
- ptp4l: sync PHC to grandmaster,
- phc2sys: sync system clock to PHC (or inverse by policy),
- monitor offset and frequency adjustments.
Track continuously:
- PHC↔system offset,
- RMS offset / max offset,
- clock-step events,
- sync-state transitions (locked/holdover/fault).
Any serious timestamping dashboard should show sync health next to latency metrics.
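The tracked quantities can be derived from the periodic status lines that ptp4l and phc2sys log. The sketch below assumes the commonly seen "offset <ns> s<state> freq <ppb>" layout, with servo state s2 meaning locked. Log formats vary across linuxptp versions, so the regular expression is an assumption to check against your own logs. `sync_health` is a hypothetical helper name.

```python
import math
import re

# Matches lines such as:
#   ptp4l[5.1]: master offset        -23 s2 freq   +1234 path delay      2345
#   phc2sys[5.2]: CLOCK_REALTIME phc offset   -12 s2 freq    +345 delay   1234
_LINE = re.compile(r"offset\s+(-?\d+)\s+s(\d)\s+freq\s+([+-]?\d+)")


def sync_health(lines):
    """Summarize offset quality and servo-state stability from log lines."""
    offsets, states = [], []
    for line in lines:
        m = _LINE.search(line)
        if m:
            offsets.append(int(m.group(1)))
            states.append(int(m.group(2)))
    if not offsets:
        return None  # no usable samples: treat as "sync health unknown"
    return {
        "rms_ns": math.sqrt(sum(o * o for o in offsets) / len(offsets)),
        "max_abs_ns": max(abs(o) for o in offsets),
        "locked_ratio": states.count(2) / len(states),
        "state_transitions": sum(1 for a, b in zip(states, states[1:]) if a != b),
    }
```

Returning None when nothing parses is deliberate: missing sync data should surface as its own alert rather than as a healthy-looking zero offset.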
6) Common failure modes
Clock-domain mixing
- comparing hardware timestamps with app CLOCK_REALTIME without normalization.
Silent fallback to software timestamps
- capability drift after driver/firmware/kernel update.
Ignoring TX error queue consumption
- timestamps drop under load, leading to biased samples.
Offload interactions misunderstood
- GRO/LRO/TSO/GSO can distort packet-level assumptions.
Queue/CPU affinity drift
- packet path migrates across cores, injecting jitter unrelated to network.
No leap-second policy
- CLOCK_REALTIME adjustments create phantom spikes if not handled carefully.
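Several of these failure modes become visible once missing TX timestamps are counted instead of silently skipped. The sketch below keeps a bounded table of outstanding sends, keyed by the timestamp ID, and counts entries that expire or get evicted. `TxTracker` and its parameters are hypothetical, and the timeout and table size are placeholders to tune. The delta it returns is only meaningful if both timestamps are in the same, normalized clock domain.

```python
class TxTracker:
    """Match TX timestamps to sends by ID and count the ones that never arrive."""

    def __init__(self, timeout_ns: int, max_pending: int = 65536):
        self.timeout_ns = timeout_ns
        self.max_pending = max_pending
        self.pending = {}  # ts_id -> (corr_id, intent_ns), kept in send order
        self.matched = 0
        self.lost = 0

    def on_send(self, ts_id, corr_id, intent_ns):
        if len(self.pending) >= self.max_pending:
            # Bounded memory: evict the oldest entry and count it as lost.
            del self.pending[next(iter(self.pending))]
            self.lost += 1
        self.pending[ts_id] = (corr_id, intent_ns)

    def on_timestamp(self, ts_id, tx_ns):
        """Return (corr_id, tx_ns - intent_ns), or None for an unknown ID."""
        entry = self.pending.pop(ts_id, None)
        if entry is None:
            return None  # never registered, or already expired
        self.matched += 1
        corr_id, intent_ns = entry
        return corr_id, tx_ns - intent_ns

    def expire(self, now_ns):
        """Drop sends whose timestamp did not arrive within the timeout."""
        stale = [k for k, (_, t) in self.pending.items() if now_ns - t > self.timeout_ns]
        for k in stale:
            del self.pending[k]
        self.lost += len(stale)

    def loss_rate(self):
        total = self.matched + self.lost
        return self.lost / total if total else 0.0
```

A rising `loss_rate()` under load suggests the error queue is not being drained fast enough, which in turn means the surviving samples are biased toward quiet periods.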
7) Observability schema (minimum)
Per packet/order (sampled if needed):
- app enqueue/send intent timestamp,
- kernel/NIC TX timestamp,
- RX timestamp,
- clock domain tag,
- correlation ID.
Aggregate metrics:
- p50/p95/p99 for app→TX_HW delta,
- RX_HW→handler delta,
- timestamp sample-loss rate,
- PHC↔system offset distribution,
- sync lock uptime.
Define SLOs separately for:
- transport path,
- host scheduling path,
- clock integrity.
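The aggregate metrics can be computed directly from per-packet records that follow the schema above. The sketch below is one possible shape, not a fixed format: the record keys (`intent_ns`, `tx_hw_ns`) and output names are hypothetical, and the nearest-rank percentile is a simple stand-in for whatever your metrics stack provides. Records are assumed to be in one normalized clock domain.

```python
def percentile(values, p: int):
    """Nearest-rank percentile for an integer p in 1..100."""
    s = sorted(values)
    rank = max(1, (p * len(s) + 99) // 100)  # integer ceil(p * n / 100)
    return s[rank - 1]


def summarize(records):
    """Aggregate app->TX_HW deltas and the share of records lacking a HW stamp."""
    deltas = [
        r["tx_hw_ns"] - r["intent_ns"]
        for r in records
        if r.get("tx_hw_ns") is not None
    ]
    out = {
        "sample_loss_rate": 1 - len(deltas) / len(records) if records else 0.0,
    }
    for p in (50, 95, 99):
        out[f"app_to_tx_hw_p{p}_ns"] = percentile(deltas, p) if deltas else None
    return out
```

Reporting the loss rate next to the percentiles matters because a p99 computed from a small surviving fraction of samples says little about the packets that were dropped from the measurement.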
8) Rollout plan (safe)
- Lab validation: verify capability matrix per NIC model + driver version.
- Shadow mode: collect timestamp telemetry without changing routing/execution decisions.
- Drift alarms: alert on PHC/system offset and timestamp fallback rate.
- Guarded activation: enable decision logic that depends on timestamps only after data quality gates pass.
- Regression gate: block kernel/driver upgrades unless timestamp integrity tests pass.
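The data quality gate for guarded activation and for upgrade regression checks can be expressed as a small, explicit predicate. Everything in this sketch is illustrative: the metric names, the `Gate` type, and especially the threshold values are hypothetical placeholders, to be replaced with numbers derived from your own shadow-mode data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    # Placeholder thresholds; derive real values from shadow-mode telemetry.
    max_fallback_rate: float = 0.001
    max_offset_p99_ns: int = 1_000
    min_lock_uptime: float = 0.999


def gate_failures(metrics: dict, gate: Gate = Gate()) -> list[str]:
    """Return the names of failed checks; an empty list means the gate passes."""
    failures = []
    if metrics["fallback_rate"] > gate.max_fallback_rate:
        failures.append("fallback_rate")
    if metrics["offset_p99_ns"] > gate.max_offset_p99_ns:
        failures.append("offset_p99_ns")
    if metrics["lock_uptime"] < gate.min_lock_uptime:
        failures.append("lock_uptime")
    return failures
```

Returning the list of failed checks, rather than a bare boolean, gives the rollout log and the upgrade runbook a concrete reason whenever activation is blocked.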
9) Practical checklist
- ethtool -T confirms required HW timestamping modes
- PTP sync health dashboard exists (offset/state)
- TX MSG_ERRQUEUE path implemented and load-tested
- Timestamp correlation IDs are lossless under bursts
- Clock-domain normalization is explicit and tested
- Fallback-to-software path is measurable and alerting
- Upgrade runbook includes timestamp integrity validation
References
- Linux timestamping documentation: https://www.kernel.org/doc/html/latest/networking/timestamping.html
- socket(7) / recvmsg(2) / ancillary data basics: https://man7.org/linux/man-pages/man7/socket.7.html
- LinuxPTP project (ptp4l, phc2sys): https://linuxptp.sourceforge.net/
- ethtool timestamping capability reference: https://man7.org/linux/man-pages/man8/ethtool.8.html