Linux Deterministic Packet Launch Playbook: SO_TXTIME + ETF + TAPRIO (2026)
TL;DR
- If you need packets to leave at precise times (not just average pacing), Linux gives you a practical stack: SO_TXTIME + ETF, optionally with TAPRIO and NIC offload.
- Start simple: mqprio → ETF + app-level SO_TXTIME/SCM_TXTIME.
- Use CLOCK_TAI end-to-end and treat clock discipline (PTP/NTP + PHC sync) as first-class infra.
- Most failures come from four things: clock mismatch, txtime in the past, delta too small, and not draining the error queue.
1) What this is for
This stack is useful when you care about time-of-transmission determinism, such as:
- TSN-style bounded-latency flows
- deterministic control traffic
- low-jitter packet release experiments
- scheduled traffic windows by class
It is not a universal replacement for rate pacing. If your target is just smooth bandwidth, pacing with the fq qdisc (for example via SO_MAX_PACING_RATE) is usually simpler.
2) Building blocks
A. SO_TXTIME + SCM_TXTIME (application layer)
The application sets the socket policy via SO_TXTIME and supplies a per-packet launch timestamp via an SCM_TXTIME control message in sendmsg().
Typical sock_txtime fields:
- clockid (commonly CLOCK_TAI)
- flags: SOF_TXTIME_DEADLINE_MODE, SOF_TXTIME_REPORT_ERRORS
SOF_TXTIME_REPORT_ERRORS is important in production because missed or invalid launch times are then reported through the socket error queue instead of being dropped silently.
B. ETF qdisc (kernel scheduler)
ETF (Earliest TxTime First):
- sorts packets by earliest txtime (rb-tree ordering)
- wakes up at next_txtime - delta
- can operate in strict time mode or deadline mode
- drops packets whose txtime is already in the past / expired in queue
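The dequeue behaviour listed above can be sketched as a small decision function. This is a simplified model for reasoning about timing, not kernel code; the names `etf_action`, `etf_wakeup_ns`, and `etf_decide` are hypothetical, and all times are assumed to be nanoseconds in the same clock domain.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names: a model of the ETF behaviour described above. */
enum etf_action { ETF_WAIT, ETF_SEND, ETF_DROP };

/* Earliest instant at which a strict-mode packet becomes eligible. */
static int64_t etf_wakeup_ns(int64_t txtime_ns, int64_t delta_ns)
{
    return txtime_ns - delta_ns;
}

/* Decide what happens to the head-of-queue packet at time now_ns. */
static enum etf_action etf_decide(int64_t now_ns, int64_t txtime_ns,
                                  int64_t delta_ns, bool deadline_mode)
{
    if (txtime_ns < now_ns)
        return ETF_DROP;   /* launch time already passed */
    if (deadline_mode)
        return ETF_SEND;   /* txtime is a deadline: release as soon as possible */
    if (now_ns >= etf_wakeup_ns(txtime_ns, delta_ns))
        return ETF_SEND;   /* inside the [txtime - delta, txtime] window */
    return ETF_WAIT;       /* too early: sleep until the wakeup time */
}
```

With a 300 µs delta, a packet stamped for t = 1 ms waits until t = 700 µs, is released inside the window, and is dropped if it is still queued after t = 1 ms.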
Core knobs:
- clockid
- delta (scheduler latency fudge factor)
- deadline_mode
- offload (NIC LaunchTime / Time-Based Scheduling when supported)
- skip_sock_check (for kernel-generated txtime paths; use carefully)
C. mqprio (class/queue mapping)
mqprio maps skb priorities to traffic classes and queue ranges. ETF is commonly attached under mqprio class handles.
D. TAPRIO (time-aware gates, optional)
TAPRIO adds cyclic gate schedules per traffic class.
- flags 0x1: txtime-assist mode (TAPRIO stamps txtime, ETF orders/releases)
- flags 0x2: full offload mode (NIC executes gate control list)
Important: in txtime-assist mode, txtime-delay should be greater than ETF delta.
3) Deployment patterns
Pattern 1 — ETF only (quickest path)
Use when you need deterministic launch for one class without gate scheduling.
- Configure mqprio with a dedicated traffic class.
- Attach ETF under that class.
- App sets SO_TXTIME and per-packet SCM_TXTIME.
- Monitor error queue + tx timestamps.
Pattern 2 — TAPRIO txtime-assist + ETF
Use when you need class windows and software-assisted launch timing.
- Configure TAPRIO with flags 0x1 and schedule entries.
- Attach ETF on the exposed class queue.
- Keep txtime-delay > etf delta.
- Validate actual queueing slack under load.
Pattern 3 — TAPRIO full offload
Use only when NIC support and clock discipline are mature.
- NIC executes gate control list in hardware (flags 0x2)
- stronger determinism potential, but stricter hardware constraints
- operationally harder to debug
4) Practical configuration skeleton
Step 1: map traffic classes (mqprio)
tc qdisc add dev eth0 root handle 100: mqprio \
    num_tc 3 \
    map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
    queues 1@0 1@1 2@2 \
    hw 0
Step 2: attach ETF for deterministic class
tc qdisc replace dev eth0 parent 100:1 etf \
    clockid CLOCK_TAI \
    delta 300000 \
    offload
(Use offload only if driver/NIC supports launch-time offload.)
Step 3: app socket setup (conceptual)
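/* needs <linux/net_tstamp.h>, <sys/socket.h>, <time.h>, <stdio.h> */
struct sock_txtime cfg = {
    .clockid = CLOCK_TAI,               /* must match the ETF qdisc clockid */
    .flags = SOF_TXTIME_REPORT_ERRORS,  /* report misses via the error queue */
};
if (setsockopt(fd, SOL_SOCKET, SO_TXTIME, &cfg, sizeof(cfg)) < 0)
    perror("setsockopt(SO_TXTIME)");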
Per packet:
- set SCM_TXTIME cmsg to absolute timestamp in selected clock domain
- send via sendmsg()
- consume MSG_ERRQUEUE for missed/invalid tx time notifications
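The per-packet steps above might look roughly like the following. It is a minimal sketch that assumes `fd` is a connected datagram socket on which SO_TXTIME has already been enabled; `txtime_attach` and `send_at` are hypothetical helper names, and the fallback `#define`s only matter on systems whose libc headers predate SO_TXTIME.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef SO_TXTIME            /* fallback for older libc headers */
#define SO_TXTIME 61
#endif
#ifndef SCM_TXTIME
#define SCM_TXTIME SO_TXTIME
#endif

/* Attach txtime_ns (absolute, in the socket's clockid) to msg as an
 * SCM_TXTIME cmsg. ctrl must hold at least CMSG_SPACE(sizeof(uint64_t)). */
static int txtime_attach(struct msghdr *msg, void *ctrl, size_t ctrl_len,
                         uint64_t txtime_ns)
{
    if (ctrl_len < CMSG_SPACE(sizeof(txtime_ns)))
        return -1;
    memset(ctrl, 0, ctrl_len);
    msg->msg_control = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof(txtime_ns));

    struct cmsghdr *cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_TXTIME;
    cm->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
    memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));
    return 0;
}

/* Send one datagram with the requested launch time. */
static ssize_t send_at(int fd, const void *buf, size_t len, uint64_t txtime_ns)
{
    union { char buf[CMSG_SPACE(sizeof(uint64_t))]; struct cmsghdr align; } ctrl;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };

    if (txtime_attach(&msg, ctrl.buf, sizeof(ctrl.buf), txtime_ns) != 0)
        return -1;
    return sendmsg(fd, &msg, 0);
}
```

A successful sendmsg() only means the packet was queued; whether it launched on time is learned later from the error queue and TX timestamps.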
5) Clock discipline: the hidden make-or-break
Deterministic launch is impossible with sloppy clocking.
Minimum standards:
- one canonical time domain (CLOCK_TAI recommended in practice)
- host clock and NIC PHC synchronization policy documented
- explicit alerting on offset/jump events
Symptoms of clock hygiene issues:
- sudden spikes in missed txtime
- bursty “late packet” drops after time correction
- inconsistent behavior across hosts with same qdisc config
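One cheap hygiene check is whether the kernel's TAI offset has been set at all. On Linux, CLOCK_TAI reads the same as CLOCK_REALTIME until a time daemon or adjtimex sets the offset, so a reading of 0 on a host that is supposed to run in the TAI domain is worth an alert. The sketch below assumes a Linux host; `tai_offset_seconds` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_TAI            /* fallback for older libc headers */
#define CLOCK_TAI 11
#endif

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Returns CLOCK_TAI - CLOCK_REALTIME rounded to whole seconds,
 * or -1 if either clock cannot be read. */
static int tai_offset_seconds(void)
{
    struct timespec rt, tai;

    if (clock_gettime(CLOCK_REALTIME, &rt) != 0 ||
        clock_gettime(CLOCK_TAI, &tai) != 0)
        return -1;

    int64_t diff_ns = ts_to_ns(&tai) - ts_to_ns(&rt);
    return (int)((diff_ns + 500000000LL) / 1000000000LL);
}
```

A value of 0 suggests the offset was never configured; a correctly disciplined host currently reports 37. This only checks the system clock; PHC-to-system offset needs separate monitoring.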
6) Sizing delta and txtime-delay
delta too small → late wakeups and avoidable drops.
delta too large → extra queue residency and latency inflation.
A practical tuning loop:
- start with a conservative delta (on the order of hundreds of microseconds)
- measure scheduling miss rate + end-to-end latency
- reduce in steps until miss rate inflects
- set operating point with margin for burst/GC/IRQ noise
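The tuning loop above can be reduced to a small selection rule once the measurements exist. The sketch below is only one way to do it: `delta_sample`, `pick_delta_ns`, the miss-rate threshold, and the margin factor are all hypothetical, and the samples are assumed to be ordered from the largest delta to the smallest.

```c
#include <stddef.h>
#include <stdint.h>

/* One measurement point: the delta tried and the miss rate observed. */
struct delta_sample {
    uint32_t delta_ns;
    double miss_rate;   /* missed packets / sent packets */
};

/* Walk samples from largest to smallest delta and stop when the miss rate
 * exceeds max_miss_rate. Returns the last acceptable delta scaled by
 * margin (e.g. 1.5), or 0 if even the largest delta misses too often. */
static uint32_t pick_delta_ns(const struct delta_sample *s, size_t n,
                              double max_miss_rate, double margin)
{
    uint32_t best = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i].miss_rate > max_miss_rate)
            break;                 /* miss rate has inflected */
        best = s[i].delta_ns;
    }
    return (uint32_t)(best * margin);
}
```

For example, if 200 µs is the smallest delta that stays under the threshold, a margin of 1.5 yields an operating point of 300 µs.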
With TAPRIO txtime-assist:
- enforce txtime-delay > etf delta
- maintain margin for worst-case qdisc→NIC handoff latency
7) Observability and SLOs
Track these from day one:
- missed txtime count (SO_EE_ORIGIN_TXTIME / SO_EE_CODE_TXTIME_MISSED)
- invalid txtime count (SO_EE_CODE_TXTIME_INVALID_PARAM)
- tx scheduler wait (SOF_TIMESTAMPING_TX_SCHED vs TX software/hardware stamps)
- p50/p99 launch error (actual_tx_time - intended_txtime)
- per-class drop rate under ETF/TAPRIO trees
If you do only one thing: drain and parse error queue continuously.
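A drain loop along these lines is one way to do that. It is a sketch that assumes the socket was configured with SOF_TXTIME_REPORT_ERRORS; `txtime_stats`, `txtime_account`, and `txtime_drain` are hypothetical names, and the fallback `#define`s only apply when the installed kernel headers lack the TXTIME constants.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <linux/errqueue.h>

#ifndef SO_EE_ORIGIN_TXTIME          /* fallback for older kernel headers */
#define SO_EE_ORIGIN_TXTIME 6
#define SO_EE_CODE_TXTIME_INVALID_PARAM 1
#define SO_EE_CODE_TXTIME_MISSED 2
#endif

/* Hypothetical counter block for this sketch. */
struct txtime_stats { uint64_t missed, invalid, other; };

/* Classify one extended error; returns 1 if it was a txtime report. */
static int txtime_account(const struct sock_extended_err *ee,
                          struct txtime_stats *st, uint64_t *txtime_ns)
{
    if (ee->ee_origin != SO_EE_ORIGIN_TXTIME)
        return 0;
    /* The offending txtime is split across ee_data (high) and ee_info (low). */
    *txtime_ns = ((uint64_t)ee->ee_data << 32) | ee->ee_info;
    if (ee->ee_code == SO_EE_CODE_TXTIME_MISSED)
        st->missed++;
    else if (ee->ee_code == SO_EE_CODE_TXTIME_INVALID_PARAM)
        st->invalid++;
    else
        st->other++;
    return 1;
}

/* Drain everything currently queued; returns 0 when empty, -1 on error. */
static int txtime_drain(int fd, struct txtime_stats *st)
{
    for (;;) {
        union { char buf[256]; struct cmsghdr align; } ctrl;
        char payload[64];
        struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                              .msg_control = ctrl.buf,
                              .msg_controllen = sizeof(ctrl.buf) };

        if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;

        for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm;
             cm = CMSG_NXTHDR(&msg, cm)) {
            int v4 = cm->cmsg_level == SOL_IP && cm->cmsg_type == IP_RECVERR;
            int v6 = cm->cmsg_level == SOL_IPV6 && cm->cmsg_type == IPV6_RECVERR;
            if (v4 || v6) {
                uint64_t t;
                txtime_account((const struct sock_extended_err *)CMSG_DATA(cm),
                               st, &t);
            }
        }
    }
}
```

Calling `txtime_drain` from the send loop or on POLLERR keeps the counters current; exporting `missed` and `invalid` separately distinguishes late scheduling from bad parameters such as a clockid mismatch.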
8) Common failure modes
Clockid mismatch between app and ETF
- ETF expects same reference clock and may drop non-compliant packets.
Past txtime under burst/backpressure
- app computes tx times too aggressively near “now”.
No error-queue consumer
- latent misses stay invisible until user-visible jitter incidents.
Blind skip_sock_check usage
- can hide validation assumptions; use only when kernel path sets txtime.
Assuming offload == automatically better
- offload helps determinism only with compatible driver/NIC and disciplined clocks.
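For the "past txtime" failure mode in particular, a common mitigation is to never stamp a packet closer to "now" than a minimum lead time, and to skip to the next slot of a periodic schedule when the current one can no longer be met. The helper below is a sketch under those assumptions; `next_txtime_ns` and its parameters are hypothetical, all values are nanoseconds in the same clock domain as the socket, and `period_ns` must be non-zero.

```c
#include <stdint.h>

/* Return the first launch time on the grid base_ns + k * period_ns that is
 * at least min_lead_ns after now_ns. min_lead_ns should exceed the ETF
 * delta plus the application's own worst-case send latency. */
static uint64_t next_txtime_ns(uint64_t now_ns, uint64_t base_ns,
                               uint64_t period_ns, uint64_t min_lead_ns)
{
    uint64_t earliest = now_ns + min_lead_ns;

    if (earliest <= base_ns)
        return base_ns;   /* schedule has not started yet */

    /* Round up to the next slot boundary at or after `earliest`. */
    uint64_t slots = (earliest - base_ns + period_ns - 1) / period_ns;
    return base_ns + slots * period_ns;
}
```

Skipping a slot trades one delayed packet for a guaranteed-valid launch time, which is usually preferable to a drop that only shows up in the error queue.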
9) Rollout plan (operator-safe)
Phase 0: lab
- synthetic sender with controlled inter-packet schedule
- clock drift/fault injection
- baseline miss curves vs delta
Phase 1: canary class
- one host pair, one traffic class
- strict SLO for miss rate and p99 launch error
- immediate rollback to non-scheduled queue path
Phase 2: scale-out
- host cohort rollout by rack/AZ
- monitor per-host miss outliers (usually clock or CPU-noise related)
- standardize per-NIC profile (delta, offload mode, IRQ/CPU policy)
10) Decision guide
Choose SO_TXTIME + ETF when:
- you need deterministic packet launch now
- gate scheduling is not mandatory
- you want manageable operational complexity
Add TAPRIO txtime-assist when:
- you need explicit traffic windows by class
- software-assisted schedule composition is acceptable
Use TAPRIO full offload when:
- NIC support is confirmed and tested
- PTP/PHC operations are mature
- you can absorb higher debugging complexity for tighter determinism
References
- tc-etf(8): https://man7.org/linux/man-pages/man8/tc-etf.8.html
- tc-taprio(8): https://man7.org/linux/man-pages/man8/tc-taprio.8.html
- tc-mqprio(8): https://man7.org/linux/man-pages/man8/tc-mqprio.8.html
- Linux kernel selftest so_txtime.c: https://raw.githubusercontent.com/torvalds/linux/master/tools/testing/selftests/net/so_txtime.c
- Linux UAPI net_tstamp.h: https://raw.githubusercontent.com/torvalds/linux/master/include/uapi/linux/net_tstamp.h
- Linux timestamping documentation: https://docs.kernel.org/networking/timestamping.html