Linux io_uring Networking (Multishot + Provided Buffers + Zero-Copy Send) — Practical Playbook

Date: 2026-03-24
Category: knowledge
Scope: Production-oriented guidance for building low-latency network servers with io_uring without accidental reordering, buffer-lifetime bugs, or CQ overflow surprises.

1) Why this matters

Most teams adopt io_uring for two linked reasons: fewer syscalls and better tail latency under load.

For networking specifically, the big unlocks are:

Multishot accept/recv: fewer repost loops.
Provided buffer rings: explicit buffer ownership and better in-flight scaling.
Send bundles + send buffer select: more efficient TX with less kernel round-trip overhead.
Zero-copy send (IORING_OP_SEND_ZC): reduced copy cost when payloads are big enough and lifetime is managed correctly.

But these features are easy to misuse. The dangerous pattern is chasing throughput and accidentally breaking ordering or reuse safety.

2) Non-negotiable mental model

io_uring is async and completion order is not guaranteed by submission order.

For stream sockets, treat this as an operational rule:

Never overlap multiple sends on the same socket unless your design explicitly preserves order (e.g., provided-buffer send pipeline).
Never overlap multiple receives on the same socket unless using a safe multishot design with well-defined buffer ownership.

This is not theory; io_uring(7) explicitly warns that background poll arming and internal behavior can reorder execution/completion.
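One way to enforce the "no overlapping sends" rule is a small per-socket gate in front of the submission path. The sketch below is illustrative only: `struct sock_tx`, `SENDQ_CAP`, and the three helper names are hypothetical and not part of liburing, and the actual SQE preparation is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SENDQ_CAP 8  /* hypothetical per-socket queue depth */

struct pending_send { const void *buf; size_t len; };

struct sock_tx {
    struct pending_send q[SENDQ_CAP]; /* FIFO of sends not yet submitted */
    size_t head, count;
    bool in_flight;                   /* true while one send SQE is outstanding */
};

/* Queue a send. Returns false when the queue is full (apply backpressure). */
static bool tx_enqueue(struct sock_tx *tx, const void *buf, size_t len)
{
    if (tx->count == SENDQ_CAP)
        return false;
    tx->q[(tx->head + tx->count) % SENDQ_CAP] = (struct pending_send){ buf, len };
    tx->count++;
    return true;
}

/* Hand out the next send to submit, but only if nothing is in flight. */
static bool tx_next(struct sock_tx *tx, struct pending_send *out)
{
    if (tx->in_flight || tx->count == 0)
        return false;
    *out = tx->q[tx->head];
    tx->head = (tx->head + 1) % SENDQ_CAP;
    tx->count--;
    tx->in_flight = true;
    return true;
}

/* Call from the CQE handler once the outstanding send has fully completed. */
static void tx_complete(struct sock_tx *tx)
{
    assert(tx->in_flight);
    tx->in_flight = false;
}
```

In this sketch a short send would have to resubmit its remainder before `tx_complete()` is called; otherwise the next queued payload could overtake the tail of the previous one.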

3) Feature baseline (kernel-aware planning)

Use this as a minimum compatibility checklist:

Provided buffer ring registration (io_uring_register_buf_ring) — available since 5.19.
Multishot accept — available since 5.19.
Multishot recv — available since 6.0.
Recv/send bundles + send buffer selection feature flag (IORING_FEAT_SEND_BUF_SELECT) — available since 6.10.
Incremental provided-buffer consumption (IOU_PBUF_RING_INC) — available since 6.12.

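Operationally: treat kernel version as part of your runtime feature flags, not just an infra detail. Zero-copy send (IORING_OP_SEND_ZC, section 6.2) has been available since 6.0 and belongs on the same checklist.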
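As a rough illustration of treating the kernel version as a feature flag, the checklist can be turned into a table derived from the release string. This is a sketch under assumptions: `struct net_features` and `features_from_release()` are hypothetical names, the version floors are the mainline ones listed above, and distro backports can make a plain version check too pessimistic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Feature floors taken from the checklist above (mainline versions). */
struct net_features {
    bool buf_ring;          /* io_uring_register_buf_ring, 5.19 */
    bool multishot_accept;  /* 5.19 */
    bool multishot_recv;    /* 6.0 */
    bool bundles;           /* recv/send bundles, 6.10 */
    bool pbuf_ring_inc;     /* IOU_PBUF_RING_INC, 6.12 */
};

static bool at_least(int maj, int min, int want_maj, int want_min)
{
    return maj > want_maj || (maj == want_maj && min >= want_min);
}

/* Parse strings such as "6.10.3-generic"; returns false on malformed input. */
static bool features_from_release(const char *release, struct net_features *f)
{
    int maj = 0, min = 0;

    assert(release && f);
    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return false;

    f->buf_ring         = at_least(maj, min, 5, 19);
    f->multishot_accept = at_least(maj, min, 5, 19);
    f->multishot_recv   = at_least(maj, min, 6, 0);
    f->bundles          = at_least(maj, min, 6, 10);
    f->pbuf_ring_inc    = at_least(maj, min, 6, 12);
    return true;
}
```

The release string would typically come from `uname(2)`. Where the kernel exposes a direct signal, such as the feature bits returned at ring setup, probing that signal is the safer gate and the version table is only a fallback.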

4) Recommended architecture

4.1 Ring model

Prefer per-core (or per-worker) ring ownership over a giant shared ring. You get:

less cross-thread coordination,
clearer ownership for connection/buffer lifetimes,
simpler backpressure.

4.2 Submission mode

IORING_SETUP_SQPOLL can reduce syscall overhead, but it is not a universal “faster” switch.

Use it when:

request rate is high and steady,
you can keep the poll thread busy,
CPU budget can absorb dedicated polling behavior.

Avoid defaulting to it for bursty/idle-heavy workloads where wake/sleep churn dominates.

4.3 Buffer ownership

Adopt provided buffers early for RX/TX pipelines. Ownership becomes explicit, and because the kernel picks a buffer only when data is actually ready, a small shared pool can back far more in-flight operations than one heap buffer pinned per request.

5) Safe receive pipeline (practical)

5.1 Multishot recv requirements that are easy to miss

For io_uring_prep_recv_multishot():

len must be 0,
IOSQE_BUFFER_SELECT must be set,
MSG_WAITALL must not be set,
each CQE must be checked for IORING_CQE_F_MORE; if absent, repost a new multishot recv.
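The CQE checks above can be folded into a single decode step. The following is a minimal sketch, not the liburing API: the `CQE_F_*` constants are local copies of the values that `<linux/io_uring.h>` defines for IORING_CQE_F_BUFFER, IORING_CQE_F_MORE, and IORING_CQE_BUFFER_SHIFT, and `decode_recv_cqe()` plus the close-on-error policy are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the kernel's CQE flag layout (see <linux/io_uring.h>). */
#define CQE_F_BUFFER     (1U << 0)  /* a provided buffer was selected */
#define CQE_F_MORE       (1U << 1)  /* the multishot request is still armed */
#define CQE_BUFFER_SHIFT 16         /* buffer ID lives in the upper 16 bits */

enum recv_action { RECV_CONTINUE, RECV_REARM, RECV_CLOSE };

struct recv_event {
    enum recv_action action;
    bool     has_buffer;
    uint16_t bid;
};

static struct recv_event decode_recv_cqe(int32_t res, uint32_t flags)
{
    struct recv_event ev = { RECV_CONTINUE, false, 0 };

    if (flags & CQE_F_BUFFER) {
        ev.has_buffer = true;
        ev.bid = (uint16_t)(flags >> CQE_BUFFER_SHIFT);
    }

    if (res == 0 || (res < 0 && res != -ENOBUFS))
        ev.action = RECV_CLOSE;     /* EOF or hard error (assumed policy) */
    else if (!(flags & CQE_F_MORE))
        ev.action = RECV_REARM;     /* multishot ended: post a new one */

    assert(ev.action != RECV_CONTINUE || res > 0);
    return ev;
}
```

Treating `-ENOBUFS` as "refill the ring, then re-arm" rather than as a fatal error is a design choice in this sketch; it keeps a temporarily starved buffer group from tearing down connections.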

5.2 Provided buffer ring hygiene

For each buffer group (bgid):

allocate page-aligned ring memory,
initialize ring with helper APIs,
track ring head/tail carefully,
alert on ring starvation (available buffers near zero).

If a CQE has IORING_CQE_F_BUFFER set, extract the selected buffer ID from the upper bits of the CQE flags (flags >> IORING_CQE_BUFFER_SHIFT), and return the buffer to the ring only after app-level processing is done.
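Ownership and starvation tracking for one buffer group might look like the sketch below. It is bookkeeping only and assumes a fixed group of `NBUFS` buffers; `struct buf_group` and the `group_*` helpers are hypothetical names, and the place where a real program would hand the buffer back to the kernel (for example via liburing's `io_uring_buf_ring_add()` followed by `io_uring_buf_ring_advance()`) is marked in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBUFS 8  /* hypothetical number of buffers in this group */

enum buf_owner { OWNER_KERNEL, OWNER_APP };

struct buf_group {
    enum buf_owner owner[NBUFS];
    unsigned available;  /* buffers currently sitting in the ring */
    unsigned low_water;  /* alert threshold */
};

static void group_init(struct buf_group *g, unsigned low_water)
{
    assert(low_water <= NBUFS);
    for (unsigned i = 0; i < NBUFS; i++)
        g->owner[i] = OWNER_KERNEL;
    g->available = NBUFS;
    g->low_water = low_water;
}

/* A CQE carried this buffer ID: the app now owns the buffer. */
static bool group_take(struct buf_group *g, uint16_t bid)
{
    if (bid >= NBUFS || g->owner[bid] != OWNER_KERNEL)
        return false;            /* invalid or duplicate delivery: treat as a bug */
    g->owner[bid] = OWNER_APP;
    g->available--;
    return true;
}

/* App-level processing finished: the buffer may go back to the ring. */
static bool group_recycle(struct buf_group *g, uint16_t bid)
{
    if (bid >= NBUFS || g->owner[bid] != OWNER_APP)
        return false;            /* double recycle: illegal buffer reuse */
    g->owner[bid] = OWNER_KERNEL;
    g->available++;
    /* Real code would re-add the buffer to the provided buffer ring here. */
    return true;
}

static bool group_starving(const struct buf_group *g)
{
    return g->available <= g->low_water;
}
```

A `false` return from `group_take()` or `group_recycle()` is the kind of event that section 8 suggests exporting as a hard error rather than a warning.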

5.3 CQ overflow risk

Multishot designs can flood the CQ under burst traffic if the consumer lags. Budget CQ depth for burst headroom, not average traffic; the CQ defaults to twice the SQ size and can be enlarged at setup time with IORING_SETUP_CQSIZE.
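The burst-headroom budgeting can be written down as simple arithmetic. This is a back-of-the-envelope sketch, assuming you can estimate the peak CQE rate and the longest stretch the consumer might not drain the CQ; `cq_entries_for_burst()` and all of its parameters are hypothetical, and `max_entries` stands in for whatever upper bound your kernel enforces.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Entries needed to absorb a burst while the consumer is stalled:
 * peak rate x stall time, plus percentage headroom, rounded up to a
 * power of two and clamped to max_entries (itself a power of two).
 */
static uint32_t cq_entries_for_burst(uint32_t peak_cqes_per_ms,
                                     uint32_t worst_stall_ms,
                                     uint32_t headroom_pct,
                                     uint32_t max_entries)
{
    assert(max_entries > 0 && (max_entries & (max_entries - 1)) == 0);

    uint64_t need = (uint64_t)peak_cqes_per_ms * worst_stall_ms;
    if (need >= max_entries)
        return max_entries;

    need += need * headroom_pct / 100;
    if (need >= max_entries)
        return max_entries;

    uint32_t entries = 1;
    while (entries < need)
        entries <<= 1;
    return entries;
}
```

For example, 200 CQEs per millisecond with a 5 ms worst-case stall and 50% headroom gives 1500 entries, which rounds up to 2048. The result would then be passed as the requested CQ size at ring setup.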

6) Safe send pipeline (practical)

6.1 Classic send vs bundle send

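Bundle send (io_uring_prep_send_bundle) with provided buffers can reduce per-chunk overhead. Because buffers are consumed from the ring in order, a pipeline can keep queueing data without overlapping sends on the socket, which keeps the byte stream ordered.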

Key details:

set IOSQE_BUFFER_SELECT,
set buf_group,
for current behavior, pass len=0 (otherwise -EINVAL),
CQE res is the total number of bytes sent, and the buffer ID in the CQE flags identifies the first buffer of the bundle.
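Since a bundle CQE only reports the first buffer ID and the total byte count, the application has to work out which buffers were consumed. The sketch below shows one way to do that, assuming buffers were added to the ring with consecutive IDs so that ring order matches ID order, and that the ring size is a power of two; `bundle_walk()` and `struct bundle_span` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bundle_span {
    uint16_t first_bid;  /* buffer ID reported in the CQE flags */
    unsigned nbufs;      /* how many buffers the bundle covered */
    size_t   last_len;   /* bytes used in the final buffer */
};

/*
 * Walk forward from first_bid until `res` bytes are accounted for.
 * buf_len[bid] is the length the application queued for each buffer.
 */
static struct bundle_span bundle_walk(uint16_t first_bid, size_t res,
                                      const size_t *buf_len,
                                      unsigned ring_entries)
{
    struct bundle_span span = { first_bid, 0, 0 };
    uint16_t bid = first_bid;

    assert(ring_entries > 0 && (ring_entries & (ring_entries - 1)) == 0);
    assert(first_bid < ring_entries);

    while (res > 0 && span.nbufs < ring_entries) {
        size_t n = buf_len[bid] < res ? buf_len[bid] : res;

        span.last_len = n;
        span.nbufs++;
        res -= n;
        bid = (uint16_t)((bid + 1) & (ring_entries - 1)); /* wrap around */
    }
    return span;
}
```

With four 100-byte buffers, a completion that starts at buffer 2 and reports 250 bytes covers buffers 2, 3, and 0, with 50 bytes used in the last one. If the final buffer was only partly sent, the remainder has to be requeued ahead of any new data.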

6.2 Zero-copy send (`SEND_ZC`) lifecycle

io_uring_prep_send_zc() typically emits two CQEs:

send result CQE, often with IORING_CQE_F_MORE,
notification CQE with IORING_CQE_F_NOTIF meaning buffer memory is now safe to reuse.

Critical rule:

Do not reuse/free payload memory until notification CQE confirms completion.

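If you set IORING_SEND_ZC_REPORT_USAGE, the notification CQE's res field reports whether the kernel had to copy the payload: 0 means the send was truly zero-copy, and IORING_NOTIF_USAGE_ZC_COPIED means at least part of it was copied (great for real-world effectiveness tracking).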
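The two-CQE lifecycle is easiest to get right with explicit per-buffer state. The following sketch is one possible shape, not the liburing API: `struct zc_slot` and the `zc_*` helpers are hypothetical, and the constants are local copies of the values `<linux/io_uring.h>` defines for IORING_CQE_F_MORE, IORING_CQE_F_NOTIF, and IORING_NOTIF_USAGE_ZC_COPIED. It deliberately does not assume which of the two CQEs is seen first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of kernel constants (see <linux/io_uring.h>). */
#define CQE_F_MORE            (1U << 1)   /* a notification CQE will follow */
#define CQE_F_NOTIF           (1U << 3)   /* this CQE is the notification */
#define NOTIF_USAGE_ZC_COPIED (1U << 31)  /* set in res when data was copied */

struct zc_slot {
    bool    in_use;      /* a SEND_ZC has been submitted on this buffer */
    bool    got_result;  /* send result CQE seen */
    bool    need_notif;  /* result CQE carried F_MORE */
    bool    got_notif;   /* notification CQE seen */
    bool    copied;      /* meaningful only with IORING_SEND_ZC_REPORT_USAGE */
    int32_t send_res;    /* bytes sent or negative errno */
};

static bool zc_can_reuse(const struct zc_slot *s)
{
    return !s->in_use ||
           (s->got_result && (!s->need_notif || s->got_notif));
}

static void zc_submit(struct zc_slot *s)
{
    assert(zc_can_reuse(s));  /* illegal buffer reuse is a hard error */
    *s = (struct zc_slot){ .in_use = true };
}

static void zc_on_cqe(struct zc_slot *s, int32_t res, uint32_t flags)
{
    if (flags & CQE_F_NOTIF) {
        s->got_notif = true;
        s->copied = ((uint32_t)res & NOTIF_USAGE_ZC_COPIED) != 0;
        return;
    }
    s->got_result = true;
    s->send_res = res;
    s->need_notif = (flags & CQE_F_MORE) != 0;
}
```

A result CQE without F_MORE, for instance a request that failed outright, frees the buffer immediately in this model because no notification is expected for it.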

7) Version-gated rollout strategy

Phase A — correctness first

single outstanding recv + single outstanding send per socket,
no ZC,
explicit state machine and deterministic tests.

Phase B — multishot receive

enable multishot accept/recv,
add CQE flag assertions (F_MORE, F_BUFFER),
verify no buffer leaks under disconnect storms.

Phase C — provided-buffer send / bundle send

gate by IORING_FEAT_SEND_BUF_SELECT,
compare syscall rate, p99 latency, and reorder defects.

Phase D — `SEND_ZC`

start on large payload classes only,
track notif lag, copied-vs-zc ratio, ENOMEM frequency,
fallback to normal send automatically on adverse kernels/NIC paths.
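The Phase D gating could be expressed as a small policy function. Everything here is an assumption for illustration: `struct zc_policy`, its thresholds, and the helper names are hypothetical, and the counters never decay, whereas a production version would more likely use a sliding window and also count ENOMEM results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum send_path { SEND_PLAIN, SEND_ZEROCOPY };

struct zc_policy {
    size_t   min_payload;     /* only attempt ZC at or above this size */
    unsigned min_samples;     /* notifications required before judging */
    unsigned max_copied_pct;  /* fall back when more than this % were copied */
    unsigned notifs, copied;  /* running totals from notification CQEs */
};

/* Feed one notification CQE outcome into the policy. */
static void policy_record_notif(struct zc_policy *p, bool was_copied)
{
    p->notifs++;
    if (was_copied)
        p->copied++;
}

/* Pick the send path for a payload of `len` bytes. */
static enum send_path choose_send_path(const struct zc_policy *p, size_t len)
{
    assert(p->max_copied_pct <= 100);

    if (len < p->min_payload)
        return SEND_PLAIN;               /* small payloads: copy is cheaper */

    if (p->notifs > 0 && p->notifs >= p->min_samples &&
        p->copied * 100 > p->max_copied_pct * p->notifs)
        return SEND_PLAIN;               /* ZC is mostly degrading to copies */

    return SEND_ZEROCOPY;
}
```

With a 16 KiB floor and a 50% copied limit, a 64 KiB payload goes zero-copy until enough notifications show that most sends are being copied anyway, at which point the policy falls back to a normal send.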

8) Observability checklist (must-have)

At minimum, export:

CQ depth usage %, CQ overflow/error counters,
multishot repost count (unexpected drops of F_MORE),
provided-buffer ring available count by bgid,
send_zc notification lag (submit → notif),
share of notifications flagged as copied when IORING_SEND_ZC_REPORT_USAGE is enabled,
per-socket outstanding send/recv invariant violations.

Good SLO guardrail:

“No CQ overflow” and “No illegal buffer reuse” should be hard errors, not soft warnings.

9) Common failure patterns

Assuming FIFO submission implies FIFO completion or wire order
Fix: explicit per-socket sequencing discipline.
Reusing ZC buffer after first CQE
Fix: wait for notification CQE (F_NOTIF).
Multishot silently stopped
Fix: always check F_MORE, repost immediately.
Provided-buffer starvation
Fix: ring refill thresholds + alerts + backpressure.
Kernel-feature mismatch
Fix: runtime capability probing and feature gates.

10) Bottom line

io_uring networking wins come less from one magic flag and more from disciplined ownership:

socket-level sequencing,
buffer-lifetime correctness,
version-aware feature gating,
CQ/ring observability.

If you treat multishot/provided-buffer/ZC as a cohesive pipeline design instead of independent toggles, you can usually get both lower overhead and safer tail behavior.

References

io_uring overview and ordering cautions: https://man7.org/linux/man-pages/man7/io_uring.7.html
Ring setup and SQPOLL behavior/privilege notes: https://man7.org/linux/man-pages/man2/io_uring_setup.2.html
Multishot recv semantics and constraints: https://man7.org/linux/man-pages/man3/io_uring_prep_recv_multishot.3.html
Multishot accept semantics: https://man7.org/linux/man-pages/man3/io_uring_prep_multishot_accept.3.html
Provided buffer ring registration/details: https://man7.org/linux/man-pages/man3/io_uring_register_buf_ring.3.html
Send / send bundle with provided buffers: https://man7.org/linux/man-pages/man3/io_uring_prep_send_bundle.3.html
Zero-copy send semantics (SEND_ZC): https://man7.org/linux/man-pages/man3/io_uring_prep_send_zc.3.html
io_uring zero-copy RX (kernel docs): https://docs.kernel.org/networking/iou-zcrx.html
