SO_REUSEPORT + eBPF Socket Steering Playbook (Hot-Restart Safe)

2026-03-16 · software


Why this matters

If you run multi-worker network services (gateways, market-data ingest, collectors, APIs), SO_REUSEPORT is a standard way to spread traffic across workers.

But default behavior is not always enough:

  • a few heavy senders can pin most load onto one or two workers,
  • closing a listener during restart/reload can drop in-flight handshakes,
  • the kernel's hash gives you no programmable policy and no migration control.

This playbook turns SO_REUSEPORT from a socket option into an operated control surface.


1) Baseline model: what plain SO_REUSEPORT actually does

With SO_REUSEPORT, multiple sockets can bind/listen on the same IP:port (Linux 3.9+).

Default selection is kernel hash-based (4-tuple driven: src IP/port + dst IP/port), which is fast and usually good enough.
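
A minimal sketch of a reuseport group, assuming a Linux (3.9+) Python build where socket.SO_REUSEPORT is exposed; UDP is used only to keep the example self-contained (no listen/accept needed):

```python
import socket

def bind_reuseport_group(n_workers: int, host: str = "127.0.0.1"):
    """Bind n_workers UDP sockets to the same host:port via SO_REUSEPORT.

    Every socket must set SO_REUSEPORT *before* bind(), and all binds must
    come from the same effective UID, or the kernel rejects the group.
    """
    socks, port = [], 0
    for _ in range(n_workers):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        port = s.getsockname()[1]  # first bind picks an ephemeral port; reuse it
        socks.append(s)
    return socks

group = bind_reuseport_group(4)
assert len({s.getsockname() for s in group}) == 1  # all four share one IP:port
for s in group:
    s.close()
```

From here the kernel hash decides which socket receives each datagram/connection; nothing in user space has to coordinate.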

Important version landmarks:

  • 3.9: SO_REUSEPORT itself.
  • 4.5: SO_ATTACH_REUSEPORT_CBPF / SO_ATTACH_REUSEPORT_EBPF (programmable selection).
  • 4.19: BPF_PROG_TYPE_SK_REUSEPORT and BPF_MAP_TYPE_REUSEPORT_SOCKARRAY (map-driven steering).
  • 5.14: socket migration via BPF_SK_REUSEPORT_SELECT_OR_MIGRATE and the net.ipv4.tcp_migrate_req sysctl.

Operational implication: if you need programmable steering + safer restarts, kernel version is a hard prerequisite, not a tuning detail.


2) When default hashing is enough vs when eBPF is justified

Use plain SO_REUSEPORT when:

  • traffic is already reasonably balanced across workers,
  • drops, queue buildup, and p99 skew are not recurring problems,
  • restarts are rare or scheduled off-peak,
  • a minimal operational surface matters more than steering control.

Consider SO_ATTACH_REUSEPORT_EBPF when:

  1. Skewed source distributions
    • few heavy senders dominate one/few workers.
  2. Topology-aware steering needs
    • you want custom policy (e.g., weighted, random, migration-aware).
  3. Hot-restart safety
    • you need better control over listener transitions.
  4. Measurable imbalance pain
    • drops, queue buildup, p99 divergence are recurring.

Rule: don’t add eBPF “because fancy.” Add it when you can state the failure mode in one sentence.
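
That rule can be made mechanical. A hypothetical helper (all names and thresholds are illustrative assumptions, not from the source) that only says "yes" when one of the four failure modes above is measurable:

```python
def ebpf_steering_justified(per_worker_rps, drops_per_min, p99_ms,
                            skew_limit=2.0, drop_limit=0, p99_spread_limit=1.5):
    """Return (justified, reasons); thresholds are illustrative defaults."""
    reasons = []
    # Skewed source distribution: busiest worker dwarfs the quietest.
    if max(per_worker_rps) > skew_limit * max(1e-9, min(per_worker_rps)):
        reasons.append("skewed source distribution")
    # Measurable loss pressure.
    if drops_per_min > drop_limit:
        reasons.append("recurring drops")
    # p99 divergence across workers.
    if max(p99_ms) > p99_spread_limit * max(1e-9, min(p99_ms)):
        reasons.append("p99 divergence across workers")
    return (len(reasons) > 0, reasons)
```

If the reasons list is empty, you could not state the failure mode in one sentence, so stay on plain reuseport.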


3) Selection semantics you must not forget

For reuseport BPF programs, socket selection is group-index based.

Key details:

  • the BPF program returns an index into the reuseport group, not a pid or fd,
  • indices reflect current group membership and shift as sockets join or leave,
  • an out-of-range index falls back to the kernel's default hash selection.

Practical consequence: avoid hard-coding assumptions like “worker 3 is always index 3 forever.”

If your rollout process depends on stable identity, use explicit map-driven steering (BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) and treat indexing as mutable runtime state.
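
A user-space model of that selection rule (function and variable names are hypothetical; this mirrors the kernel behavior described above, it is not kernel code): a policy picks a group index, anything out of range falls back to a 4-tuple hash, and membership changes silently reorder effective indices.

```python
import zlib

def select_worker(group, four_tuple, policy_index=None):
    """Model of reuseport selection: a policy picks a group index;
    an out-of-range index falls back to a hash over the 4-tuple."""
    if policy_index is not None and 0 <= policy_index < len(group):
        return group[policy_index]
    key = "|".join(map(str, four_tuple)).encode()
    return group[zlib.crc32(key) % len(group)]

group = ["w0", "w1", "w2", "w3"]
t = ("10.0.0.5", 40000, "10.0.0.1", 443)
assert select_worker(group, t, policy_index=2) == "w2"
# Out-of-range index: hash fallback still lands on some group member.
assert select_worker(group, t, policy_index=99) in group
# Membership change: every later index shifts, so "index 2" now means w3.
group.remove("w1")
assert select_worker(group, t, policy_index=2) == "w3"
```

The last two lines are the whole argument for map-driven steering: identity must live in a map you update, not in a position you assume.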


4) Restart safety: the subtle failure mode

Historically, reuseport groups could lose in-flight TCP handshake-related state when a listener closed during restart/reload windows.

Modern migration support (Linux 5.14+) improves this via migration-aware selection: eBPF programs can use BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, while the non-eBPF path is governed by the net.ipv4.tcp_migrate_req sysctl.

Operational guidance:

  • gate any reliance on migration behavior on kernel 5.14+,
  • drain accept queues before closing listeners during reloads,
  • measure handshake failures and resets specifically inside restart windows, under load.

“Graceful restart” claims are meaningless unless measured during contention.
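
A hedged probe for the sysctl mentioned above; it returns None where the knob is absent (pre-5.14 kernels, non-Linux hosts, restricted containers), which is exactly the signal to gate a rollout on:

```python
from pathlib import Path

def tcp_migrate_req_setting(procfs: str = "/proc/sys/net/ipv4/tcp_migrate_req"):
    """Return net.ipv4.tcp_migrate_req as an int, or None if unavailable here."""
    try:
        return int(Path(procfs).read_text().strip())
    except (OSError, ValueError):  # missing file, no permission, unparsable
        return None

setting = tcp_migrate_req_setting()
# None -> no kernel migration support to rely on; plan restarts accordingly.
```

Treat None the same as 0: assume handshake state can be lost across listener close and design the reload sequence for that.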


5) Metrics that actually reveal reuseport health

Track at least:

Distribution / balance
  • per-worker accepted connections/sec and active connections,
  • imbalance ratio: busiest worker / least-busy worker.

Loss / pressure
  • ListenOverflows / ListenDrops deltas,
  • SYN and accept queue depth per listener.

User-visible outcomes
  • per-worker p50/p99 latency,
  • error and timeout rates by worker.

Restart correctness
  • resets and failed handshakes during restart windows,
  • restart-window error budget consumption.

Averages hide this problem. Keep per-worker visibility.


6) Control states (recommended)

Use a small state machine instead of ad-hoc tweaks:

  • DEFAULT: kernel hash selection, no custom policy.
  • SHADOW: custom policy computed but not enforced.
  • ENFORCED: eBPF steering active.
  • FALLBACK: immediate revert to default behavior after a regression.

Example trigger ideas:

  • imbalance ratio above threshold for N consecutive windows: escalate one state,
  • drop or p99 regression while enforced: go straight to fallback,
  • sustained calm: step back down, with hysteresis to avoid flapping.

Design principle: always keep a one-step path to known-good kernel default behavior.
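
A minimal sketch of such a controller. State names, thresholds, and the hysteresis scheme are illustrative assumptions; the one invariant taken from the text is that any regression is a single step back to known-good default behavior:

```python
class SteeringController:
    """DEFAULT -> SHADOW -> ENFORCED escalation, FALLBACK on any regression."""

    def __init__(self, skew_limit=2.0, calm_windows=3):
        self.state = "DEFAULT"
        self.skew_limit = skew_limit
        self.calm_windows = calm_windows  # hysteresis before stepping back down
        self._calm = 0

    def observe(self, imbalance_ratio: float, regression: bool) -> str:
        if regression:                    # drops/p99 regress under custom policy
            self.state = "FALLBACK"       # one step to known-good behavior
            self._calm = 0
            return self.state
        if imbalance_ratio > self.skew_limit:
            self._calm = 0                # escalate one state per hot window
            self.state = {"DEFAULT": "SHADOW",
                          "SHADOW": "ENFORCED"}.get(self.state, self.state)
        else:
            self._calm += 1               # require sustained calm to de-escalate
            if self._calm >= self.calm_windows and self.state == "ENFORCED":
                self.state = "SHADOW"
                self._calm = 0
        return self.state
```

Leaving FALLBACK is deliberately not automatic here: after an incident, re-escalation should be a human decision.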


7) Rollout pattern that avoids self-inflicted outages

  1. Observe-only phase

    • Keep plain reuseport.
    • Build per-worker baseline for balance, drops, p99.
  2. Shadow policy validation

    • Compute what custom steering would choose (without enforcing).
    • Compare expected vs actual skew reduction potential.
  3. Canary apply

    • Small subset of hosts/ports.
    • Hard rollback if p99, drops, or setup failures regress.
  4. Progressive expansion

    • Scale by host group, not all at once.
    • Freeze rollout during known burst windows.
  5. Restart stress validation

    • Intentionally deploy under load.
    • Require restart-window SLO pass before full adoption.
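
The canary step's "hard rollback" rule can be encoded as a gate. A hypothetical helper (metric keys and tolerance values are assumptions): any setup failure, any drop regression, or more than 10% p99 regression fails the canary.

```python
def canary_passes(baseline: dict, canary: dict,
                  p99_tolerance=1.10, drop_tolerance=1.00) -> bool:
    """Rollback gate: canary must not regress drops at all, nor p99 by >10%.

    Both dicts carry 'p99_ms', 'drops_per_min', 'setup_failures'.
    """
    if canary["setup_failures"] > 0:
        return False
    if canary["p99_ms"] > baseline["p99_ms"] * p99_tolerance:
        return False
    if canary["drops_per_min"] > baseline["drops_per_min"] * drop_tolerance:
        return False
    return True
```

Wire the False branch directly to the fallback state; a gate that only pages a human is not a hard rollback.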

8) Common mistakes

  1. No kernel capability gate

    • Team assumes feature exists everywhere; mixed kernels break assumptions.
  2. Treating per-worker skew as noise

    • System looks healthy until one worker saturates and tails explode.
  3. No fallback contract

    • Custom policy fails with no immediate downgrade path.
  4. Testing only steady-state traffic

    • Restart and burst paths are where hidden defects appear.
  5. Assuming socket index stability

    • Group membership changes reorder effective indexing.
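
Mistake #1 has a cheap fix: a capability gate derived from the kernel release string, using the version landmarks from section 1. A sketch (helper names are hypothetical):

```python
import platform
import re

def kernel_version(release=None):
    """Parse 'major.minor' from a release string like '5.15.0-86-generic'."""
    release = release if release is not None else platform.release()
    m = re.match(r"(\d+)\.(\d+)", release)
    return (int(m.group(1)), int(m.group(2))) if m else (0, 0)

def reuseport_capabilities(release=None):
    """Map kernel version to the reuseport features this host can support."""
    v = kernel_version(release)
    return {
        "so_reuseport": v >= (3, 9),
        "attach_reuseport_ebpf": v >= (4, 5),   # SO_ATTACH_REUSEPORT_EBPF
        "reuseport_sockarray": v >= (4, 19),    # BPF_MAP_TYPE_REUSEPORT_SOCKARRAY
        "migration": v >= (5, 14),              # SELECT_OR_MIGRATE / tcp_migrate_req
    }
```

On a mixed fleet, ship this check with the deployer and refuse to enable any feature the host's kernel cannot back.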

9) 30-minute incident runbook (reuseport imbalance spike)

  1. Confirm symptom:
    • imbalance ratio, per-worker drops, p99 skew.
  2. Check whether issue coincides with deploy/restart window.
  3. If custom steering enabled:
    • switch to fallback/plain hash policy.
  4. Verify immediate impact:
    • drop rate down? queue down? p99 improving?
  5. If yes, hold fallback and capture forensic bundle:
    • kernel version, policy version, map/program update logs.
  6. If no, investigate non-steering bottlenecks:
    • NIC queue pinning, CPU saturation, app-level lock contention.
  7. Post-incident:
    • tune thresholds/hysteresis,
    • add replay or load-test case for reproduced failure.
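
Steps 4-6 reduce to one before/after comparison. A hypothetical triage helper (metric keys and the 20% improvement bar are assumptions):

```python
def triage_after_fallback(before: dict, after: dict, improvement=0.8) -> str:
    """After switching to fallback: if drops and p99 both improved by >=20%,
    hold fallback and capture forensics; otherwise the bottleneck is likely
    outside steering (NIC queues, CPU, app-level locks)."""
    improved = (after["drops_per_min"] <= before["drops_per_min"] * improvement
                and after["p99_ms"] <= before["p99_ms"] * improvement)
    return ("hold_fallback_and_capture_forensics" if improved
            else "investigate_non_steering_bottlenecks")
```

Keeping this as code forces the 30-minute runbook to stay a decision, not a debate.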

10) Minimal decision matrix

  • Balanced traffic, rare restarts: plain SO_REUSEPORT, default kernel hash.
  • Skewed senders, recurring tail pain: eBPF steering with map-driven policy and an explicit fallback.
  • Frequent restarts under load, kernel 5.14+: migration-aware selection (SELECT_OR_MIGRATE / tcp_migrate_req).
  • Mixed kernel fleet: capability-gate per host; default hash wherever features are missing.

The winning setup is not “most programmable.” It is “easiest to keep correct at 3 a.m.”

