Linux `pidfd` Race-Free Process Lifecycle Playbook

2026-03-30 · software

Date: 2026-03-30
Category: knowledge
Scope: Practical guidance for replacing PID-based process supervision (kill(pid), PID files, /proc/<pid> polling) with pidfd-based lifecycle control that is robust to PID reuse races.


1) Why this matters

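Classic process control APIs identify targets by integer PID. That creates a long-standing hazard: the kernel can recycle a PID once the original process has exited and been reaped, so a later kill(pid), PID-file lookup, or /proc/<pid> check may act on an unrelated process.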

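A pidfd is a file descriptor that refers to one specific process for that process's lifetime and never silently retargets to another one, even if the PID number is later reused. In operations terms, it means that signals and waits issued through the handle either reach the intended process or fail cleanly (for example with ESRCH once the process has terminated and been waited on).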


2) Kernel feature matrix (minimums)

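A practical baseline, per the Linux man-pages and the runtime checks used in modern ecosystems:

  1. pidfd_send_signal(): Linux 5.1
  2. clone() with CLONE_PIDFD: Linux 5.2
  3. pidfd_open(), clone3(), and poll/epoll readiness on a pidfd: Linux 5.3
  4. waitid() with P_PIDFD: Linux 5.4
  5. pidfd_getfd(): Linux 5.6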

If your fleet spans kernel generations, gate each feature explicitly rather than inferring everything from a single syscall probe.
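One way to gate features individually is to probe each primitive against a pidfd for the current process. The following is a minimal Python sketch, not a definitive implementation: it assumes Python 3.9+ (which exposes os.pidfd_open, signal.pidfd_send_signal, and os.P_PIDFD), and the function name probe_pidfd_features and its result keys are hypothetical.

```python
import errno
import os
import signal


def probe_pidfd_features():
    """Probe pidfd capabilities one by one; returns {feature_name: bool}."""
    features = {
        "pidfd_open": False,
        "pidfd_send_signal": False,
        "waitid_p_pidfd": False,
    }
    try:
        # Opening a pidfd for ourselves has no PID reuse risk.
        pidfd = os.pidfd_open(os.getpid())
    except (AttributeError, OSError):
        # AttributeError: Python build without pidfd support.
        # OSError: e.g. ENOSYS on an old kernel or a seccomp denial.
        return features
    features["pidfd_open"] = True
    try:
        try:
            # Signal 0 performs the existence/permission checks only.
            signal.pidfd_send_signal(pidfd, 0)
            features["pidfd_send_signal"] = True
        except (AttributeError, OSError):
            pass
        try:
            os.waitid(os.P_PIDFD, pidfd, os.WEXITED | os.WNOHANG)
        except AttributeError:
            pass
        except OSError as exc:
            # We are not our own child, so ECHILD shows that the kernel
            # understood P_PIDFD; EINVAL would mean the id type is unknown.
            features["waitid_p_pidfd"] = exc.errno == errno.ECHILD
    finally:
        os.close(pidfd)
    return features
```

A supervisor could call this once at startup and choose its control path per feature, instead of assuming that a working pidfd_open implies everything else.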


3) Core pidfd primitives and what they buy you

3.1 pidfd_open(pid, flags)

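Important nuance: pidfd_open() still takes an integer PID, so the call itself is only race-free when the caller can guarantee the PID has not been recycled. The usual safe case is a parent opening a pidfd for its own child before reaping it (with SIGCHLD not ignored and no other code waiting on that child).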

3.2 pidfd_send_signal(pidfd, sig, info, flags)

3.3 waitid(P_PIDFD, ...)
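The primitives in 3.1 to 3.3 compose naturally for a direct child: open a pidfd while the child is still unreaped, signal through it, then collect the exit status through the same handle. The sketch below assumes Python 3.9+ on a kernel with all three features; signal_and_reap_child is a hypothetical helper name, and it returns None when pidfd_open is unavailable so that a caller can fall back.

```python
import os
import signal
import sys


def signal_and_reap_child(pid, sig=signal.SIGTERM):
    """Signal a direct, not-yet-reaped child via pidfd and collect its status.

    Returns (si_code, si_status), or None when pidfd_open is unavailable.
    """
    try:
        # Safe only because `pid` is our own child and has not been waited on.
        pidfd = os.pidfd_open(pid)
    except (AttributeError, OSError):
        return None
    try:
        signal.pidfd_send_signal(pidfd, sig)
        # Blocks until the child terminates. If the child ignores `sig`,
        # this waits for as long as the child keeps running.
        info = os.waitid(os.P_PIDFD, pidfd, os.WEXITED)
        return info.si_code, info.si_status
    finally:
        os.close(pidfd)
```

For a child killed by SIGKILL, the returned pair would be expected to be (os.CLD_KILLED, signal.SIGKILL); for a normal exit, si_code is os.CLD_EXITED and si_status is the exit code.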

3.4 pidfd_getfd(pidfd, targetfd, 0)

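Use sparingly: duplicating a file descriptor out of another process is powerful (the kernel gates it behind a ptrace attach-level access check), so every call should be auditable.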
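As an illustration of keeping this call auditable, here is a hedged Python sketch that wraps the raw syscall and logs every successful duplication. Python's standard library has no pidfd_getfd wrapper that this sketch relies on, so it goes through ctypes. Assumptions: the syscall number 438 (correct for x86-64 and architectures using the generic syscall table, but verify for your target), and the logger name "pidfd.audit", which is hypothetical.

```python
import ctypes
import logging
import os

SYS_PIDFD_GETFD = 438  # assumption: x86-64 / generic syscall table number

_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long

audit_log = logging.getLogger("pidfd.audit")  # hypothetical logger name


def pidfd_getfd(pidfd, targetfd):
    """Duplicate `targetfd` from the process behind `pidfd` into this process.

    Raises OSError on failure (e.g. EPERM when the ptrace access check
    fails, ENOSYS on kernels without the syscall).
    """
    fd = _libc.syscall(
        ctypes.c_long(SYS_PIDFD_GETFD),
        ctypes.c_int(pidfd),
        ctypes.c_int(targetfd),
        ctypes.c_uint(0),  # flags must currently be 0
    )
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # Record every successful duplication so that use stays reviewable.
    audit_log.info("pidfd_getfd: pidfd=%d targetfd=%d -> local fd=%d",
                   pidfd, targetfd, fd)
    return fd
```

The returned descriptor has the close-on-exec flag set, so it is not inherited across exec unless the caller clears that flag deliberately.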


4) Event-loop pattern that scales

For supervisors/agents handling many workers:

  1. spawn child and obtain pidfd early (clone3 + CLONE_PIDFD preferred);
  2. register pidfd in epoll;
  3. on readability (EPOLLIN), treat it as a notification that the process has terminated;
  4. for child processes, call waitid(P_PIDFD, ..., WEXITED | WNOHANG) to collect status;
  5. close pidfd only after ownership/state machine transition is complete.
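The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: Python has no clone3 binding, so the sketch swaps in os.posix_spawn followed by os.pidfd_open (safe here because each child is unreaped at that point) in place of the preferred clone3 + CLONE_PIDFD; the function name supervise and its return shape are hypothetical, and kernel support for pidfd polling and P_PIDFD is assumed.

```python
import os
import select
import sys


def supervise(commands, timeout=10.0):
    """Spawn each argv in `commands` and wait for all of them via epoll.

    Returns {pid: (si_code, si_status)} for every child that exited
    before an epoll wait timed out.
    """
    ep = select.epoll()
    watched = {}  # pidfd -> pid
    results = {}
    try:
        for argv in commands:
            # Step 1: spawn, then obtain the pidfd while the child is unreaped.
            pid = os.posix_spawn(argv[0], argv, dict(os.environ))
            pidfd = os.pidfd_open(pid)
            # Step 2: register the pidfd with epoll.
            ep.register(pidfd, select.EPOLLIN)
            watched[pidfd] = pid
        while watched:
            events = ep.poll(timeout)
            if not events:
                break  # timed out; remaining children are left running
            for fd, _mask in events:
                # Steps 3 and 4: readable means terminated; collect status.
                info = os.waitid(os.P_PIDFD, fd, os.WEXITED | os.WNOHANG)
                if info is None:
                    continue  # nothing to collect yet
                results[watched.pop(fd)] = (info.si_code, info.si_status)
                # Step 5: release the handle only after state is recorded.
                ep.unregister(fd)
                os.close(fd)
    finally:
        for fd in watched:
            os.close(fd)
        ep.close()
    return results
```

Calling supervise with a few short-lived commands (each an argv list whose first element is an absolute executable path) returns one entry per exited child. A production loop would also need a policy for children that outlive the timeout, which this sketch deliberately leaves out.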

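Notes from man-pages behavior: a pidfd becomes readable when the process terminates, but nothing can be read from it (read(2) fails with EINVAL), so the exit status still has to be collected with waitid().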


5) Migration strategy from PID integers to pidfd handles

Phase A — dual-path compatibility
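A dual-path wrapper might look like the following Python sketch: it prefers a pidfd when one can be opened and otherwise falls back to the legacy PID call, recording which path was taken. The class name ProcessRef and the control_path_counts counter are hypothetical stand-ins for whatever handle type and metrics system a real supervisor uses.

```python
import os
import signal
from collections import Counter

# Hypothetical in-process metric, labelled by control path.
control_path_counts = Counter()


class ProcessRef:
    """Process handle that prefers a pidfd and falls back to the raw PID."""

    def __init__(self, pid):
        self.pid = pid
        try:
            # Only race-free if the caller knows `pid` is still the intended
            # process, e.g. its own unreaped child.
            self.pidfd = os.pidfd_open(pid)
        except (AttributeError, OSError):
            self.pidfd = None  # old kernel, old Python, or call denied

    def send_signal(self, sig):
        """Send `sig` and return the control path used: 'pidfd' or 'pid'."""
        if self.pidfd is not None:
            signal.pidfd_send_signal(self.pidfd, sig)
            path = "pidfd"
        else:
            os.kill(self.pid, sig)  # legacy path, still exposed to PID reuse
            path = "pid"
        control_path_counts[path] += 1
        return path

    def close(self):
        if self.pidfd is not None:
            os.close(self.pidfd)
            self.pidfd = None
```

Counting calls per path gives a direct measure of how much of the fleet still runs on the legacy path, which helps decide when it is safe to move to the next phase.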

Phase B — API boundary shift

Phase C — policy hardening


6) Failure modes and how to design for them

Operationally, label metrics by control path (for example pidfd versus legacy PID), so that fallbacks and failures are visible per path.


7) Language/runtime reality check


8) Rollout checklist


9) Practical takeaway

pidfd is not just “new syscall trivia”; it is a reliability primitive for process orchestration. The biggest gain is correctness under churn: when processes die/restart quickly, pidfd-based control makes your lifecycle automation deterministic where PID-only flows remain probabilistic.


References

  1. pidfd_open(2) — Linux man-pages
    https://man7.org/linux/man-pages/man2/pidfd_open.2.html

  2. pidfd_send_signal(2) — Linux man-pages
    https://man7.org/linux/man-pages/man2/pidfd_send_signal.2.html

  3. pidfd_getfd(2) — Linux man-pages
    https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html

  4. wait(2) / waitid() (P_PIDFD) — Linux man-pages
    https://man7.org/linux/man-pages/man2/waitid.2.html

  5. clone(2) (CLONE_PIDFD, clone3) — Linux man-pages
    https://man7.org/linux/man-pages/man2/clone.2.html

  6. Go source (os/pidfd_linux.go) capability checks and version notes
    https://go.dev/src/os/pidfd_linux.go?m=text