Linux Packet Timestamping Playbook (SO_TIMESTAMPING + PTP, Practical)

Date: 2026-03-19
Category: knowledge

Why this matters

If you care about microsecond-level latency (execution gateways, market-data collectors, telemetry ingest), you need to separate:

network path latency,
application scheduling delay,
clock error/drift.

Without accurate packet timestamps, these collapse into one noisy number and you can’t debug what really got slower.

1) Clock model first (before socket code)

Treat timestamps as a clock-discipline problem first, API problem second.

You usually have:

PHC (PTP Hardware Clock) on NIC,
system clock (CLOCK_REALTIME / CLOCK_TAI),
NIC hardware RX/TX timestamps,
software fallback timestamps.

Practical rule:

Discipline PHC and system clock (e.g., ptp4l + phc2sys).
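Prefer CLOCK_TAI for a wall-clock timeline without leap-second discontinuities (it can still be stepped, so it is not monotonic, and it only differs from CLOCK_REALTIME once the kernel's TAI offset has been set).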
Tag every metric with clock domain so you never mix apples and oranges.
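
One way to make the clock-domain tag enforceable rather than advisory is to carry it in the timestamp type itself. The sketch below is a minimal illustration; the names ClockDomain, Stamp, delta_ns and realtime_to_tai are hypothetical, and the TAI offset is passed in by the caller rather than looked up.

```python
from dataclasses import dataclass
from enum import Enum


class ClockDomain(Enum):
    PHC = "phc"            # NIC PTP hardware clock
    REALTIME = "realtime"  # CLOCK_REALTIME (UTC, subject to leap-second steps)
    TAI = "tai"            # CLOCK_TAI


@dataclass(frozen=True)
class Stamp:
    ns: int                # nanoseconds since the domain's epoch
    domain: ClockDomain


def delta_ns(later: Stamp, earlier: Stamp) -> int:
    """Return later - earlier, refusing to subtract across clock domains."""
    if later.domain is not earlier.domain:
        raise ValueError(
            f"clock-domain mismatch: {later.domain.value} vs {earlier.domain.value}"
        )
    return later.ns - earlier.ns


def realtime_to_tai(stamp: Stamp, tai_offset_s: int) -> Stamp:
    """Normalize a REALTIME stamp to TAI using a caller-supplied offset in seconds."""
    if stamp.domain is not ClockDomain.REALTIME:
        raise ValueError("expected a REALTIME stamp")
    return Stamp(stamp.ns + tai_offset_s * 1_000_000_000, ClockDomain.TAI)
```

The TAI-UTC offset has been 37 s since 2017, but it changes at each leap second, so a real system should read it from the kernel or the PTP stack instead of hard-coding it.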

2) Capability check: confirm NIC/driver supports what you need

Before implementation, verify hardware capabilities:

ethtool -T <iface> for timestamping capabilities,
check RX/TX hardware timestamp support,
confirm driver + firmware version actually enables required modes.

Many outages start with: “code expects HW TX timestamps, NIC silently gives SW timestamps.”
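
The capability check can be automated so that a silent downgrade is caught at startup. The sketch below parses the "Capabilities:" block of ethtool -T output; it assumes capability names such as hardware-transmit appear as the first token on indented lines, which matches common ethtool versions but may vary. REQUIRED, parse_capabilities and missing_capabilities are hypothetical names, and the required set is a local policy choice.

```python
import subprocess

# Capability names as printed by `ethtool -T`; adjust to what your use case needs.
REQUIRED = {"hardware-transmit", "hardware-receive", "hardware-raw-clock"}


def parse_capabilities(ethtool_output: str) -> set:
    """Collect the capability names listed under the 'Capabilities:' heading."""
    caps = set()
    in_caps = False
    for line in ethtool_output.splitlines():
        if line.strip().startswith("Capabilities:"):
            in_caps = True
            continue
        if in_caps:
            # Capability rows are indented; the next unindented line ends the block.
            if not line[:1].isspace():
                break
            fields = line.split()
            if fields:
                caps.add(fields[0])
    return caps


def missing_capabilities(iface: str) -> set:
    """Run `ethtool -T <iface>` and return the required capabilities it does not list."""
    out = subprocess.run(
        ["ethtool", "-T", iface], capture_output=True, text=True, check=True
    ).stdout
    return REQUIRED - parse_capabilities(out)
```

A non-empty result from missing_capabilities is a reason to refuse to start in hardware-timestamp mode, or at least to tag all metrics as software-derived.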

3) Pick timestamping mode by use case

Mode A: software timestamps only (baseline)

easiest to enable,
includes kernel scheduling/jitter,
good for coarse profiling, not exchange-grade network attribution.

Mode B: RX hardware timestamps

better one-way receive timing,
useful for market-data ingest quality and feed-gap diagnostics.

Mode C: TX hardware timestamps

crucial for send-path attribution,
retrieve via socket error queue,
needed for precise “send intent vs wire egress” decomposition.

Use software timestamps as fallback telemetry, not as primary truth for low-latency claims.
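
The three modes map onto SOF_TIMESTAMPING_* flag combinations. The sketch below uses the bit values from linux/net_tstamp.h; the MODE_FLAGS table and flags_for_mode are hypothetical names, and including software flags in modes B and C is one way to keep the fallback telemetry described above, not the only valid combination.

```python
# Bit values from linux/net_tstamp.h
SOF_TIMESTAMPING_TX_HARDWARE = 1 << 0   # generate: hardware TX timestamps
SOF_TIMESTAMPING_TX_SOFTWARE = 1 << 1   # generate: software TX timestamps
SOF_TIMESTAMPING_RX_HARDWARE = 1 << 2   # generate: hardware RX timestamps
SOF_TIMESTAMPING_RX_SOFTWARE = 1 << 3   # generate: software RX timestamps
SOF_TIMESTAMPING_SOFTWARE = 1 << 4      # report: software timestamps
SOF_TIMESTAMPING_RAW_HARDWARE = 1 << 6  # report: raw hardware timestamps

_SW_REPORT = SOF_TIMESTAMPING_SOFTWARE
_HW_REPORT = SOF_TIMESTAMPING_RAW_HARDWARE

MODE_FLAGS = {
    # Mode A: software only, both directions.
    "A": SOF_TIMESTAMPING_RX_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE | _SW_REPORT,
    # Mode B: hardware RX, with software RX kept as fallback telemetry.
    "B": SOF_TIMESTAMPING_RX_HARDWARE | _HW_REPORT
         | SOF_TIMESTAMPING_RX_SOFTWARE | _SW_REPORT,
    # Mode C: hardware TX, with software TX kept as fallback telemetry.
    "C": SOF_TIMESTAMPING_TX_HARDWARE | _HW_REPORT
         | SOF_TIMESTAMPING_TX_SOFTWARE | _SW_REPORT,
}


def flags_for_mode(*modes: str) -> int:
    """OR together the flag masks for one or more modes, e.g. flags_for_mode('B', 'C')."""
    mask = 0
    for mode in modes:
        mask |= MODE_FLAGS[mode]
    return mask
```

The resulting integer is what gets passed as the SO_TIMESTAMPING option value.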

4) Socket API essentials (SO_TIMESTAMPING)

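Enable timestamping with setsockopt(SO_TIMESTAMPING) and the relevant SOF_TIMESTAMPING_* flags. Generation flags (RX/TX, hardware/software) request timestamps; reporting flags (SOF_TIMESTAMPING_SOFTWARE, SOF_TIMESTAMPING_RAW_HARDWARE) deliver them in control messages, so set both kinds. Hardware timestamping must also be enabled on the device itself (SIOCSHWTSTAMP ioctl, e.g. via hwstamp_ctl).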

Operational essentials:

TX timestamps arrive on error queue → use recvmsg(..., MSG_ERRQUEUE) path.
Parse control messages (SCM_TIMESTAMPING) safely.
Preserve packet correlation (e.g., ID/sequence) so timestamp can be mapped to order/request.
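
The error-queue read and control-message parsing can be sketched as follows. This assumes a 64-bit Linux host where struct scm_timestamping is three timespec entries of two 64-bit fields each, and where SO_TIMESTAMPING is 37 (the x86-64 and arm64 value); both are assumptions to verify on your platform. The function names are hypothetical.

```python
import socket
import struct

SO_TIMESTAMPING = 37                  # assumption: x86-64 / arm64 value
SCM_TIMESTAMPING = SO_TIMESTAMPING    # cmsg type shares the option number
MSG_ERRQUEUE = getattr(socket, "MSG_ERRQUEUE", 0x2000)

# struct scm_timestamping { struct timespec ts[3]; }
# ts[0] = software, ts[1] = deprecated, ts[2] = raw hardware
_SCM_TS = struct.Struct("=6q")


def _to_ns(sec: int, nsec: int):
    """Convert a timespec to nanoseconds; an all-zero slot means 'not provided'."""
    if sec == 0 and nsec == 0:
        return None
    return sec * 1_000_000_000 + nsec


def parse_scm_timestamping(payload: bytes):
    """Return (software_ns, hardware_ns) from an SCM_TIMESTAMPING payload."""
    if len(payload) < _SCM_TS.size:
        raise ValueError("short SCM_TIMESTAMPING payload")
    sw_s, sw_ns, _, _, hw_s, hw_ns = _SCM_TS.unpack_from(payload)
    return _to_ns(sw_s, sw_ns), _to_ns(hw_s, hw_ns)


def enable_timestamping(sock: socket.socket, flags: int) -> None:
    """Apply an SOF_TIMESTAMPING_* mask to the socket."""
    sock.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMPING, flags)


def read_tx_timestamps(sock: socket.socket, bufsize: int = 2048):
    """Read one error-queue message; return a list of (software_ns, hardware_ns)."""
    try:
        _data, ancdata, _flags, _addr = sock.recvmsg(bufsize, 512, MSG_ERRQUEUE)
    except BlockingIOError:
        return []  # error queue is empty
    return [
        parse_scm_timestamping(cdata)
        for level, ctype, cdata in ancdata
        if level == socket.SOL_SOCKET and ctype == SCM_TIMESTAMPING
    ]
```

Call read_tx_timestamps in a loop until it returns an empty list, typically when poll reports an error condition on the socket, so the queue is drained promptly.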

If your send path is async/batched, correlation IDs are mandatory or your timing data becomes unusable.
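
A minimal correlation table might look like the sketch below. It assumes a datagram socket with SOF_TIMESTAMPING_OPT_ID enabled, where the kernel reports a per-socket counter that starts at zero and advances once per send; the TxCorrelator class is hypothetical, and it assumes the intent and TX timestamps are already in the same clock domain.

```python
class TxCorrelator:
    """Map kernel-reported timestamp IDs back to application order IDs."""

    def __init__(self):
        self._next_id = 0   # mirrors the kernel's per-socket send counter (assumption)
        self._pending = {}  # timestamp ID -> (order ID, send-intent time in ns)

    def on_send(self, order_id: str, intent_ns: int) -> int:
        """Record a send just before sendmsg(); return the expected timestamp ID."""
        ts_id = self._next_id
        self._pending[ts_id] = (order_id, intent_ns)
        self._next_id += 1
        return ts_id

    def on_timestamp(self, ts_id: int, tx_ns: int):
        """Match a TX timestamp; return (order_id, intent-to-egress ns) or None."""
        entry = self._pending.pop(ts_id, None)
        if entry is None:
            return None  # unknown or duplicate ID: count it, do not guess
        order_id, intent_ns = entry
        return order_id, tx_ns - intent_ns

    def unmatched(self) -> int:
        """Number of sends still waiting for a timestamp (feeds the loss metric)."""
        return len(self._pending)
```

Entries that stay in the pending table past a timeout are lost samples and should be counted as such rather than silently discarded.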

5) PTP discipline: keep clocks trustworthy

Typical production pattern:

ptp4l: sync PHC to grandmaster,
phc2sys: sync system clock to PHC (or inverse by policy),
monitor offset and frequency adjustments.

Track continuously:

PHC↔system offset,
RMS offset / max offset,
clock-step events,
sync-state transitions (locked/holdover/fault).

Any serious timestamping dashboard should show sync health next to latency metrics.
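
The offset metrics can be derived from ptp4l or phc2sys log lines. The sketch below assumes the common line shape "... offset <ns> s<state> freq <ppb> ...", where s0 means unlocked, s1 a clock step and s2 locked; log formats differ between versions and summary modes, so treat the regular expression as a starting point. The summarize function is a hypothetical name.

```python
import math
import re

# Matches e.g. "ptp4l[101.001]: master offset  3 s2 freq  +1500 path delay  11350"
_OFFSET_RE = re.compile(r"offset\s+(-?\d+)\s+s(\d)\s+freq\s+([+-]?\d+)")


def summarize(lines):
    """Return RMS/max offset and servo-state statistics, or None if nothing matched."""
    offsets = []
    states = []
    for line in lines:
        match = _OFFSET_RE.search(line)
        if match:
            offsets.append(int(match.group(1)))
            states.append(int(match.group(2)))
    if not offsets:
        return None
    return {
        "rms_ns": math.sqrt(sum(o * o for o in offsets) / len(offsets)),
        "max_abs_ns": max(abs(o) for o in offsets),
        "locked_fraction": states.count(2) / len(states),
        "step_events": states.count(1),
    }
```

Feeding this a sliding window of recent log lines gives the RMS and max offset, the step count, and a lock fraction to plot next to the latency percentiles.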

6) Common failure modes

Clock-domain mixing
- comparing hardware timestamps with app CLOCK_REALTIME without normalization.
Silent fallback to software timestamps
- capability drift after driver/firmware/kernel update.
Ignoring TX error queue consumption
- the error queue is bounded by the socket's receive buffer, so unread timestamps are dropped under load and the surviving samples are biased.
Offload interactions misunderstood
- GRO/LRO merge received packets and TSO/GSO split one send into several wire segments, so one timestamp may not correspond to one wire packet.
Queue/CPU affinity drift
- packet path migrates across cores, injecting jitter unrelated to network.
No leap-second policy
- a leap-second step in CLOCK_REALTIME shows up as a phantom spike (or a negative delta) if not handled explicitly.
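
Silent fallback is easiest to catch by classifying every timestamp sample at the point it is parsed. The sketch below assumes each sample is a (software_ns, hardware_ns) pair with None for a value the kernel did not supply; classify and fallback_rate are hypothetical names.

```python
from collections import Counter


def classify(software_ns, hardware_ns) -> str:
    """Label a sample by the best timestamp source it actually carries."""
    if hardware_ns is not None:
        return "hardware"
    if software_ns is not None:
        return "software"
    return "missing"


def fallback_rate(samples) -> float:
    """Fraction of samples that did not carry a hardware timestamp."""
    counts = Counter(classify(sw, hw) for sw, hw in samples)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["software"] + counts["missing"]) / total
```

A rate that jumps from near zero after a driver, firmware or kernel update is the capability drift described above.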

7) Observability schema (minimum)

Per packet/order (sampled if needed):

app enqueue/send intent timestamp,
kernel/NIC TX timestamp,
RX timestamp,
clock domain tag,
correlation ID.

Aggregate metrics:

p50/p95/p99 of app→TX_HW delta,
p50/p95/p99 of RX_HW→handler delta,
timestamp sample-loss rate,
PHC↔system offset distribution,
sync lock uptime.
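
The per-packet schema and two of the aggregates can be sketched as follows. Field names, the PacketRecord type and the nearest-rank percentile method are illustrative choices rather than a prescribed format, and all timestamps in one record are assumed to share the tagged clock domain.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class PacketRecord:
    correlation_id: str
    clock_domain: str                  # e.g. "tai", "realtime", "phc"
    app_send_ns: Optional[int] = None  # app enqueue / send intent
    tx_hw_ns: Optional[int] = None     # NIC TX timestamp
    rx_hw_ns: Optional[int] = None     # NIC RX timestamp
    handler_ns: Optional[int] = None   # time the app handler ran


def percentile(values, pct: float) -> int:
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def aggregate(records) -> dict:
    """Compute app->TX_HW percentiles and the TX timestamp sample-loss rate."""
    sent = [r for r in records if r.app_send_ns is not None]
    deltas = [r.tx_hw_ns - r.app_send_ns for r in sent if r.tx_hw_ns is not None]
    lost = len(sent) - len(deltas)
    return {
        "app_to_tx_hw_ns": {p: percentile(deltas, p) for p in (50, 95, 99)} if deltas else {},
        "tx_sample_loss": lost / len(sent) if sent else 0.0,
    }
```

The RX_HW→handler delta follows the same pattern with rx_hw_ns and handler_ns.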

Define SLOs separately for:

transport path,
host scheduling path,
clock integrity.

8) Rollout plan (safe)

Lab validation
- verify capability matrix per NIC model + driver version.
Shadow mode
- collect timestamp telemetry without changing routing/execution decisions.
Drift alarms
- alert on PHC/system offset and timestamp fallback rate.
Guarded activation
- enable decision logic that depends on timestamps only after data quality gates pass.
Regression gate
- block kernel/driver upgrades unless timestamp integrity tests pass.
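
The data quality gate for guarded activation can be expressed as a small, explicit check. The thresholds below are placeholders for illustration only, not recommended values, and QualityGate and gate_failures are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityGate:
    max_abs_offset_ns: int = 1_000      # placeholder threshold
    max_fallback_rate: float = 0.001    # placeholder threshold
    min_locked_fraction: float = 0.999  # placeholder threshold


def gate_failures(gate: QualityGate, *, max_abs_offset_ns: int,
                  fallback_rate: float, locked_fraction: float) -> list:
    """Return the names of the checks that failed; an empty list means the gate passes."""
    failures = []
    if max_abs_offset_ns > gate.max_abs_offset_ns:
        failures.append("clock_offset")
    if fallback_rate > gate.max_fallback_rate:
        failures.append("timestamp_fallback")
    if locked_fraction < gate.min_locked_fraction:
        failures.append("sync_lock")
    return failures
```

The same function can serve as the regression gate: run it against telemetry from a canary host on the new kernel or driver and block the upgrade if the list is non-empty.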

9) Practical checklist

ethtool -T confirms required HW timestamping modes
PTP sync health dashboard exists (offset/state)
TX MSG_ERRQUEUE path implemented and load-tested
Timestamp correlation IDs are lossless under bursts
Clock-domain normalization is explicit and tested
Fallback-to-software path is measurable and alerting
Upgrade runbook includes timestamp integrity validation

References

Linux timestamping documentation:
https://www.kernel.org/doc/html/latest/networking/timestamping.html
socket(7) / recvmsg(2) / ancillary data basics:
https://man7.org/linux/man-pages/man7/socket.7.html
LinuxPTP project (ptp4l, phc2sys):
https://linuxptp.sourceforge.net/
ethtool timestamping capability reference:
https://man7.org/linux/man-pages/man8/ethtool.8.html