Kernel Bypass on Linux for Low-Latency Trading: DPDK vs AF_XDP vs io_uring Playbook

2026-03-04 · systems


Why this matters

For execution systems, “network stack choice” is not an infra detail. It directly changes:

  1. Tail latency (p99/p99.9) on the wire-to-strategy path.
  2. CPU budget, because busy polling trades whole cores for microseconds.
  3. Operational complexity: driver binding, observability, and rollback paths.

If you pick the wrong path, you either:

  1. Pay a permanent complexity tax for latency headroom you never needed, or
  2. Miss tail SLOs because the stock socket path cannot hold p99.9 under bursts.

This playbook is a practical chooser between DPDK, AF_XDP, and io_uring-based networking.


60-second mental model

DPDK

  1. Full kernel bypass: user-space poll-mode drivers own the NIC queues; packets never touch the kernel network stack.
  2. Lowest latency and jitter, bought with dedicated busy-polling cores and a separate ops model (hugepages, vfio-pci, NIC binding).

AF_XDP

  1. Kernel-assisted fast path: an XDP program redirects packets from a specific NIC queue into a user-space socket ring.
  2. Near-bypass performance (especially in zero-copy mode) while the NIC stays under normal Linux driver and tooling control.

io_uring networking

  1. Not packet bypass: sockets still traverse the kernel stack, but shared submission/completion rings amortize syscall and event-loop overhead.
  2. Best when the bottleneck is syscalls and wakeups at high connection counts, not per-packet kernel-stack cost.


What the kernel/docs explicitly say

From Linux AF_XDP docs:

  1. AF_XDP is an address family optimized for high-performance packet processing; an AF_XDP socket binds to a single NIC queue id.
  2. Zero-copy mode requires driver support; without it, the socket falls back to a slower copy mode.

From DPDK PMD docs:

  1. Poll-mode drivers run without interrupts; cores busy-poll RX queues, so latency is bought with dedicated CPU.
  2. The Linux getting-started baseline includes hugepages and a user-space I/O driver such as vfio-pci.

From io_uring docs/wiki:

  1. io_uring shares submission and completion queues between user space and the kernel, cutting per-operation syscall overhead.
  2. It covers socket operations (accept, send, recv) plus multishot and provided-buffer features relevant to networking servers.


Decision matrix (practical)

Choose DPDK when

  1. You need the lowest possible jitter and strictest CPU/NIC queue control.
  2. You can dedicate cores aggressively (busy polling is acceptable).
  3. You can own the ops complexity (hugepages, vfio-pci, NIC binding discipline).
  4. Your team is comfortable with deep packet-path observability and tuning.

Typical fit:

  1. Market-data ingest and order-entry gateways where p99.9 on the wire-to-strategy path directly moves execution quality.

Choose AF_XDP when

  1. You need near-bypass performance but want to stay closer to Linux-native ops.
  2. You want queue-level user-space packet access without fully abandoning kernel ecosystem tooling.
  3. You can run XDP programs and manage per-queue mapping intentionally.
  4. You need a staged path from conventional Linux networking toward lower latency.

Typical fit:

  1. Feed handlers and venue segments where near-bypass latency is enough and Linux-native ops (standard drivers, monitoring, containers) must stay intact.

Choose io_uring networking when

  1. Your bottleneck is syscall/event-loop overhead in socket workloads (not full packet-plane bypass).
  2. You need high connection scale and efficient async semantics.
  3. You want modernized networking I/O without committing to full NIC bypass architecture.

Typical fit:

  1. High-connection-count socket services: client gateways, drop-copy, risk/control planes, internal RPC.
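The matrix above can be collapsed into a rough chooser. This is a sketch only: the `ServiceProfile` fields and the branching order are illustrative assumptions that encode this playbook's matrix, not an established taxonomy.

```python
from dataclasses import dataclass

@dataclass
class ServiceProfile:
    # Hypothetical knobs; names are illustrative, not from any real API.
    needs_packet_plane: bool   # must own raw RX/TX queues?
    can_dedicate_cores: bool   # busy-poll CPU budget approved?
    ops_maturity: str          # "basic" or "advanced"

def choose_stack(p: ServiceProfile) -> str:
    """Rough encoding of the decision matrix above; tune to your SLOs."""
    if p.needs_packet_plane and p.can_dedicate_cores and p.ops_maturity == "advanced":
        return "DPDK"      # strictest jitter control, highest ops cost
    if p.needs_packet_plane:
        return "AF_XDP"    # near-bypass while staying Linux-native
    return "io_uring"      # socket workloads bottlenecked on syscalls
```

Keeping the choice as an explicit function per service class makes the portfolio-of-choices stance from the end of this playbook auditable.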


Latency-vs-operability tradeoff (rule of thumb)

Don’t optimize only mean latency. Decide on:

  1. The p99.9 you can hold through the worst one-second window of a session.
  2. CPU cost per microsecond saved, and whether that budget survives capacity growth.
  3. Operability: rollback speed, telemetry parity, and on-call debuggability of the new path.


Non-negotiable engineering checklist (any option)

  1. Core and queue ownership are explicit
    • No accidental queue sharing in hot paths.
  2. NUMA locality is enforced
    • NIC queue, CPU core, and memory pool align on same NUMA node whenever possible.
  3. Busy-poll budget is measured, not guessed
    • Trading lower latency for runaway CPU without capacity planning is a hidden outage.
  4. Backpressure policy is designed upfront
    • Drop, coalesce, or degrade mode must be deterministic.
  5. Tail metrics are first-class
    • p50 improvements with p99.9 regressions are usually a net loss for execution quality.
  6. Rollback path is one command
    • Network-path experiments without instant rollback are incident debt.
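Checklist item 4 (deterministic backpressure) is easy to state and easy to get wrong. Below is a minimal sketch of one deterministic policy, drop-oldest with explicit accounting; a production hot path would use a preallocated lock-free ring, but the policy logic is the point here.

```python
from collections import deque

class DropOldestQueue:
    """Bounded queue with a deterministic backpressure policy:
    when full, drop the oldest entry and count it; never block the producer."""

    def __init__(self, capacity: int):
        self.buf = deque(maxlen=capacity)  # deque evicts the oldest item at maxlen
        self.capacity = capacity
        self.dropped = 0                   # drops are first-class telemetry

    def push(self, item) -> None:
        if len(self.buf) == self.capacity:
            self.dropped += 1              # account for the eviction deque performs
        self.buf.append(item)

    def pop(self):
        return self.buf.popleft() if self.buf else None
```

Whatever policy you pick (drop, coalesce, degrade), the invariant is the same: the behavior under overload is decided in code review, not discovered in an incident.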

Migration ladder (safe)

  1. Baseline current socket path
    • Capture p50/p95/p99/p99.9 + CPU + drop/retry behavior per session window.
  2. Introduce io_uring for selected socket services
    • Validate event-loop simplification and syscall reduction.
  3. Pilot AF_XDP on one feed/one venue segment
    • Keep strict canary and route isolation.
  4. Escalate to DPDK only where justified by tail SLO gap
    • Not as default religion; only where measured gains beat complexity tax.

This sequencing avoids “jump to hardest architecture first” mistakes.
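Each rung of the ladder needs an explicit promotion gate so that “measured gains beat complexity tax” is a number, not a vibe. A sketch follows; the 10% default threshold is an illustrative assumption to be set per service class.

```python
def promote_candidate(baseline_p999_us: float, candidate_p999_us: float,
                      min_gain_pct: float = 10.0) -> bool:
    """Gate an escalation step: promote only when the measured p99.9
    improvement clears a minimum threshold (the complexity tax)."""
    gain_pct = 100.0 * (baseline_p999_us - candidate_p999_us) / baseline_p999_us
    return gain_pct >= min_gain_pct
```

Run the gate per service class against the baseline captured in step 1, so DPDK escalation is pulled by a measured tail gap rather than pushed by default.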


Benchmark design that avoids self-deception

Benchmark each candidate under:

  1. Steady-state load at expected session rates.
  2. Open/close and news-burst replay (microbursts, not averaged rates).
  3. Degraded conditions: packet loss, queue overflow, core contention, failover.

Track at minimum:

  1. p50 / p95 / p99 / p99.9 end-to-end latency per session window.
  2. Drops, retransmits, and backpressure events.
  3. CPU per core (including busy-poll burn) and memory/NUMA behavior.
If your benchmark excludes bursty regimes, it is not useful for trading systems.
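For the tracking side, nearest-rank percentiles over the full per-window sample set are enough to start with. The function below is a sketch (pure Python is fine for offline analysis, too slow for in-band hot paths).

```python
import math

def percentiles(samples_us, points=(50, 95, 99, 99.9)):
    """Nearest-rank percentiles over latency samples (microseconds).
    Feed it every sample from the window, bursts included; means hide tails."""
    xs = sorted(samples_us)
    n = len(xs)
    if n == 0:
        raise ValueError("no samples")
    # nearest-rank: the ceil(p% * n)-th smallest sample, clamped to [1, n]
    return {f"p{p}": xs[min(n, max(1, math.ceil(p * n / 100))) - 1] for p in points}
```

Nearest-rank deliberately never interpolates beyond observed samples, which matters at p99.9 where interpolation can manufacture latencies that never happened.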


Common footguns

  1. Comparing copy-mode AF_XDP vs tuned DPDK and calling it final
    • Ensure mode/driver assumptions are explicit.
  2. Ignoring IOMMU/VFIO implications for DPDK rollout
    • Driver binding and security/permissions model are part of production design.
  3. Treating io_uring as a drop-in “always faster epoll”
    • Gains depend on event-loop redesign, batching, and buffer strategy.
  4. Overfitting to lab RTT/traffic
    • Use session-aware market burst replay, not synthetic steady streams only.
  5. No observability parity across options
    • If one path has weaker telemetry, postmortems become guesswork.

One-page recommendation for most desks

If your team is building practical low-latency trading infra (not HFT nanosecond wars), a robust default sequence is:

  1. io_uring for most socket-based services.
  2. AF_XDP for the handful of latency-critical feed/order paths.
  3. DPDK only for the few services where a measured p99.9 gap justifies the ops cost.

Treat packet-path architecture as a portfolio of choices by service class, not a single ideology.


References

  1. Linux Kernel Docs — AF_XDP
    https://docs.kernel.org/networking/af_xdp.html
  2. Linux Kernel 4.18 notes (AF_XDP introduction context)
    https://kernelnewbies.org/Linux_4.18
  3. DPDK Programmer’s Guide — Poll Mode Driver
    https://doc.dpdk.org/guides-24.03/prog_guide/poll_mode_drv.html
  4. DPDK Linux GSG — System Requirements (hugepages, kernel baseline)
    https://doc.dpdk.org/guides/linux_gsg/sys_reqs.html
  5. DPDK Linux GSG — Linux Drivers (vfio-pci, binding model)
    https://doc.dpdk.org/guides/linux_gsg/linux_drivers.html
  6. io_uring Linux man page
    https://man7.org/linux/man-pages/man7/io_uring.7.html
  7. liburing wiki — io_uring and networking in 2023
    https://github.com/axboe/liburing/wiki/io_uring-and-networking-in-2023
  8. LWN — Accelerating networking with AF_XDP
    https://lwn.net/Articles/750845/

One-sentence takeaway

Pick DPDK / AF_XDP / io_uring per service-class tail SLO and ops maturity: optimize p99.9 and recovery behavior, not just headline microbench latency.