User-Space Dynamic Tracing Playbook (USDT + Uprobes + eBPF)
Date: 2026-03-25
Category: knowledge
Scope: Practical, production-safe tracing for user-space services without redeploying application code.
1) Why this matters
When incidents happen in production, teams often face a bad tradeoff:
- add logs and redeploy (slow, risky), or
- guess from incomplete metrics (fast, wrong).
USDT and uprobes provide a third path: attach observability at runtime, directly to user-space execution points, with bounded blast radius.
Core principle: treat dynamic tracing as an operational control surface, not an emergency-only hack.
2) Mental model: USDT vs uprobe
A) Uprobe / uretprobe
- What it is: dynamic probe on a user-space function entry/return.
- Best for: third-party binaries, libc/runtime hotspots, no source changes.
- Risk profile: flexible but ABI/symbol fragile if binaries change.
B) USDT (User Statically Defined Tracing)
- What it is: probe points intentionally compiled into code (often via SDT/DTrace macros).
- Best for: stable semantic events (state transitions, queue checkpoints, protocol boundaries).
- Risk profile: low overhead when disabled; higher semantic reliability once event contracts are curated.
Rule of thumb:
- Need fast forensic visibility on existing binaries → start with uprobes.
- Need durable, low-ambiguity long-term signals → invest in USDT in critical code paths.
3) Tooling stack by lifecycle stage
- bpftrace (first response): one-liners, rapid exploration, low setup friction.
- BCC (structured scripts): reusable scripts and richer helper ecosystem.
- libbpf + CO-RE (productionized): ship stable daemons/agents with explicit contracts and CI.
Brendan Gregg’s practical guidance is still useful: start with bpftrace for short scripts, move to heavier tooling only when complexity demands it.
4) Prerequisites and environment checks
Before attaching probes in production:
- kernel/BPF feature compatibility validated (bpftrace --info),
- privileges/capabilities validated,
- target process identity and binary path pinned,
- canary host selected.
For modern eBPF auto-instrumentation stacks (for example OTel OBI), baseline requirements commonly include Linux 5.8+ (or a distro kernel with the relevant backports), root or an equivalent capability set, and a supported architecture (x86_64/arm64).
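The checks above can be scripted so they run the same way on every canary host. The following is a minimal sketch, not a complete validator: the function names are illustrative, the 5.8 floor simply mirrors the baseline quoted above, and a distro kernel with backports may need a different threshold.

```shell
#!/bin/sh
# Preflight sketch. Function names are illustrative; the 5.8 floor is the
# baseline quoted in the text and may not fit kernels with backports.

# kernel_at_least MIN_MAJOR.MIN_MINOR RELEASE -> exit 0 if RELEASE >= MIN
kernel_at_least() {
    min_major=${1%%.*}
    min_minor=${1#*.}
    rel=${2%%-*}              # drop a distro suffix such as "-91-generic"
    major=${rel%%.*}
    rest=${rel#*.}
    minor=${rest%%.*}
    [ "$major" -gt "$min_major" ] ||
        { [ "$major" -eq "$min_major" ] && [ "$minor" -ge "$min_minor" ]; }
}

# arch_supported MACHINE -> exit 0 for the architectures named in the text
arch_supported() {
    case "$1" in
        x86_64|aarch64|arm64) return 0 ;;
        *) return 1 ;;
    esac
}

# preflight -> run all checks against the current host
preflight() {
    kernel_at_least 5.8 "$(uname -r)" || { echo "kernel too old: $(uname -r)"; return 1; }
    arch_supported "$(uname -m)"      || { echo "unsupported arch: $(uname -m)"; return 1; }
    command -v bpftrace >/dev/null    || { echo "bpftrace not found"; return 1; }
    echo "preflight ok"
}
```

Calling preflight on the canary host before any attach gives a single pass/fail line that can be pasted into the incident log.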
5) Safe attach workflow (operator runbook)
Step 1 — Enumerate probes first
# list USDT probes in a binary
bpftrace -l "usdt:/path/to/bin:*"
# list probes for a running PID (includes shared-lib USDT)
bpftrace -lp <PID> "usdt:*"
For SDT-enabled binaries, perf list, perf probe, and BCC tplist are useful cross-checks.
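When a binary exposes many probes, a per-provider summary of the listing is easier to review than the raw output. The helper below is a small sketch that assumes the usual usdt:<binary>:<provider>:<probe> line format in the listing; the function name is illustrative.

```shell
#!/bin/sh
# usdt_provider_counts: read `bpftrace -l 'usdt:...'` output on stdin and
# print "<count> <provider>" for each provider, sorted by provider name.
# Assumes lines shaped like usdt:<binary>:<provider>:<probe>.
usdt_provider_counts() {
    awk -F: '$1 == "usdt" && NF >= 4 { n[$(NF-1)]++ }
             END { for (p in n) print n[p], p }' | sort -k2
}
```

Typical use is to pipe the listing through it, for example: bpftrace -l "usdt:/path/to/bin:*" | usdt_provider_counts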
Step 2 — Start with count-only probes
Avoid printf storms at first; validate event rates cheaply.
bpftrace -e 'usdt:/path/to/bin:main:run_start { @hits = count(); }'
Step 3 — Scope to one process
Use PID scoping aggressively during incident triage:
bpftrace -p <PID> -e 'uprobe:/lib/x86_64-linux-gnu/libc.so.6:malloc { @m = count(); }'
The -p semantics matter: for USDT probes, uprobes, and uretprobes, bpftrace attaches only to the given process, which makes PID scoping a major safety lever.
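PID scoping is only as good as the PID and binary path fed into it. One possible way to pin both before attaching is to resolve the executable through /proc, sketched below. The function name is illustrative and the approach is Linux-specific.

```shell
#!/bin/sh
# resolve_target PID -> prints "<pid> <absolute exe path>".
# Fails if the PID does not exist or /proc/<pid>/exe cannot be resolved.
resolve_target() {
    exe=$(readlink -f "/proc/$1/exe") || return 1
    [ -n "$exe" ] || return 1
    printf '%s %s\n' "$1" "$exe"
}
```

The resolved path can then be used in the probe specification, so the uprobe or USDT target matches the binary the process is actually running rather than whatever a package path currently points at.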
Step 4 — Handle short-lived process races
For child-process tracing, prefer launching the target with -c when possible:
bpftrace -c './target-binary --args' -e 'usdt:./target-binary:* { @x = count(); }'
bpftrace stops the child after execve and resumes it once the USDT probes are attached, which reduces event loss from startup races.
Step 5 — Promote only after canary validation
Promotion gates:
- event drop rate acceptable,
- CPU overhead within budget,
- no meaningful p99/p999 regression,
- clear rollback command and owner.
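The numeric gates lend themselves to an automated check. The sketch below compares canary measurements against thresholds; the threshold values are placeholders rather than recommendations, and the function and variable names are hypothetical. It assumes a non-zero baseline p99.

```shell
#!/bin/sh
# Placeholder thresholds (assumptions, override via environment).
MAX_DROP_PCT=${MAX_DROP_PCT:-0.1}
MAX_CPU_PCT=${MAX_CPU_PCT:-2}
MAX_P99_REGRESSION_PCT=${MAX_P99_REGRESSION_PCT:-5}

# gate_ok DROP_PCT CPU_PCT P99_BASE_MS P99_CANARY_MS
# Prints a one-line verdict and exits 0 only if every gate passes.
gate_ok() {
    awk -v drop="$1" -v cpu="$2" -v base="$3" -v canary="$4" \
        -v max_drop="$MAX_DROP_PCT" -v max_cpu="$MAX_CPU_PCT" \
        -v max_reg="$MAX_P99_REGRESSION_PCT" 'BEGIN {
            reg = (canary - base) / base * 100
            ok = (drop <= max_drop) && (cpu <= max_cpu) && (reg <= max_reg)
            printf "drop=%.2f%% cpu=%.2f%% p99_regression=%.2f%% -> %s\n",
                   drop, cpu, reg, (ok ? "promote" : "hold")
            exit !ok
        }'
}
```

The rollback command and owner are not something a script can verify, so they remain a manual sign-off alongside this check.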
6) Overhead budgeting (the part people skip)
Disabled USDT probes are designed for minimal overhead: the probe site is typically a single nop instruction, so the cost is close to zero in practice. Once enabled, cost scales with:
- probe hit frequency,
- payload extraction complexity,
- stack collection/unwinding,
- user-space printing/export volume.
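A first-order budget follows from the first factor alone: CPU spent in a probe is roughly hit rate times per-hit cost. The helper below is a back-of-the-envelope sketch. The per-hit cost must be measured on the canary, and the 2000 ns used in the usage note is a placeholder, not a measurement.

```shell
#!/bin/sh
# probe_cpu_pct HITS_PER_SEC NS_PER_HIT
# Prints the estimated share of one CPU core spent in the probe, in percent:
#   hits/s * ns/hit / 1e9 ns/s * 100
probe_cpu_pct() {
    awk -v rate="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", rate * ns / 1e9 * 100 }'
}
```

For example, probe_cpu_pct 100000 2000 gives 20.00, meaning a probe hit 100,000 times per second at an assumed 2000 ns per hit would consume about a fifth of a core, while the same probe at 500 hits per second stays near 0.10 percent. This is why hit frequency is worth measuring with a count-only probe before anything heavier is attached.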
Practical order of operations:
- count,
- histogram/sampled aggregation,
- selective argument capture,
- stack traces only when required.
Avoid raw event streaming as default; aggregate in-kernel whenever possible.
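As an illustration of in-kernel aggregation, the sketch below generates a bpftrace program that records function latency into a histogram map instead of printing one line per call. The helper name and the handle_request function in the usage note are hypothetical; substitute a symbol that exists in the target binary.

```shell
#!/bin/sh
# latency_hist_prog BINARY FUNCTION
# Prints a bpftrace program that timestamps function entry per thread and,
# on return, adds the elapsed microseconds to a power-of-two histogram map.
latency_hist_prog() {
    cat <<EOF
uprobe:$1:$2 { @start[tid] = nsecs; }
uretprobe:$1:$2 /@start[tid]/ {
    @latency_us = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}
EOF
}
```

It could be run PID-scoped as: bpftrace -p <PID> -e "$(latency_hist_prog /path/to/bin handle_request)". Only the histogram is printed when bpftrace exits, so export volume stays constant regardless of call rate.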
7) Failure modes and mitigations
A) Symbol drift / stripped binaries
- Pin exact build IDs.
- Validate symbols on deploy candidate, not just local dev.
- Keep a fallback offset/symbol map per release.
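Build-ID pinning can be checked mechanically before any attach. The sketch below reads the GNU build ID from the ELF notes with readelf (binutils); the function names are illustrative, and binaries built without a build-ID note will simply fail the check.

```shell
#!/bin/sh
# parse_build_id: read `readelf -n` output on stdin, print the build ID.
# Exits non-zero if no "Build ID:" line is present.
parse_build_id() {
    awk '/Build ID:/ { print $NF; found = 1 } END { exit !found }'
}

# build_id_of BINARY -> prints the GNU build ID of BINARY
build_id_of() {
    readelf -n "$1" 2>/dev/null | parse_build_id
}

# same_build BINARY EXPECTED_ID -> exit 0 only when the IDs match
same_build() {
    [ "$(build_id_of "$1")" = "$2" ]
}
```

A tracing script can call same_build with the ID recorded for the release and refuse to attach on mismatch, which turns symbol drift into an explicit failure instead of a silently empty trace.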
B) ABI drift in function arguments
- Treat probe argument decoding as versioned contract.
- Add smoke tests that fail if decoding assumptions break.
C) Lost events under burst
- Size ring/perf buffers for burst load and prefer in-kernel aggregate maps over per-event output.
- Track lost-event counters and never let observability fail silently.
D) Probe blast radius too broad
- Default to PID-scoped attachments.
- Add probe filters and duration TTLs.
- Encode strict “auto-detach” conditions in scripts.
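One simple way to enforce a duration TTL is to wrap the tracer in timeout, so the attachment ends even if the operator forgets it. This is a sketch under that assumption: the wrapper name and the BPFTRACE override variable are hypothetical conveniences, not bpftrace features.

```shell
#!/bin/sh
# Tracer binary, overridable for dry runs (hypothetical convenience variable).
BPFTRACE=${BPFTRACE:-bpftrace}

# attach_with_ttl SECONDS PID PROGRAM
# Runs a PID-scoped bpftrace program and sends SIGINT after SECONDS, which
# makes bpftrace detach its probes and print its maps before exiting.
attach_with_ttl() {
    timeout -s INT "$1" "$BPFTRACE" -p "$2" -e "$3"
}
```

For example, attach_with_ttl 60 <PID> 'uprobe:/lib/x86_64-linux-gnu/libc.so.6:malloc { @m = count(); }' caps the attachment at one minute. An in-program alternative is an interval probe that calls exit(), such as interval:s:60 { exit(); }, which keeps the TTL inside the script itself.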
E) Permission surprises in hardened hosts
- Pre-negotiate capability profile (e.g., CAP_BPF/CAP_PERFMON/CAP_SYS_PTRACE where required).
- Keep a documented “observability privilege profile” per environment.
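A privilege profile is easier to keep honest when it can be checked on the host. The sketch below decodes the effective capability mask from /proc; the bit numbers come from linux/capability.h, while the function names are illustrative. Whether a given set is sufficient depends on kernel version and hardening settings, so treat the report as a starting point.

```shell
#!/bin/sh
# Capability bit numbers from linux/capability.h
CAP_SYS_PTRACE=19
CAP_SYS_ADMIN=21
CAP_PERFMON=38
CAP_BPF=39

# has_cap HEXMASK BIT -> exit 0 if BIT is set in a CapEff-style hex mask
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# effective_caps [PID] -> prints the CapEff mask of PID (default: this shell)
effective_caps() {
    awk '/^CapEff:/ { print $2 }' "/proc/${1:-$$}/status"
}

# report_caps [PID] -> one "<name>: yes|no" line per capability of interest
report_caps() {
    mask=$(effective_caps "$@") || return 1
    for name in CAP_BPF CAP_PERFMON CAP_SYS_PTRACE CAP_SYS_ADMIN; do
        eval "bit=\$$name"
        if has_cap "$mask" "$bit"; then echo "$name: yes"; else echo "$name: no"; fi
    done
}
```

Running report_caps inside the container or service context that will host the tracer shows quickly whether the documented profile matches what the process actually holds.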
8) How this fits with OpenTelemetry eBPF auto-instrumentation
OTel eBPF instrumentation (OBI) gives broad zero-code visibility (HTTP/S, gRPC, RED metrics, multiple languages) and is excellent for fast baseline coverage.
But OBI has an intentional limit: it is generic by design. For domain-specific internals (matching engine state, custom queues, proprietary protocol phases), pair OBI with targeted USDT/uprobes.
Best pattern:
- OBI for broad service map and baseline spans,
- USDT/uprobes for precision diagnostics and app-specific semantics.
9) 30-day adoption plan
Week 1:
- inventory top 5 latency-critical services,
- define “must-answer” incident questions per service.
Week 2:
- add 3–5 high-value USDT probes in one service (state transitions, queue/loop boundaries),
- create two bpftrace canary scripts (count-only + latency histogram).
Week 3:
- build one libbpf/BCC reusable script with PID scoping + TTL + drop counters,
- wire dashboards for probe hit rate, drop rate, and host overhead.
Week 4:
- run a game day with tracing attach/detach,
- publish operational guardrails (who can attach, where, for how long, rollback rules).
10) One-line takeaway
USDT and uprobes are most valuable when treated as a governed production interface: narrow scope, explicit overhead budgets, and repeatable attach/rollback discipline.
References
- bpftrace documentation (options, probe behavior, PID/child semantics): https://bpftrace.org/docs/pre-release
- bpftrace man page (process-scoped attach behavior, safety options): https://man.archlinux.org/man/extra/bpftrace/bpftrace.8.en
- Brendan Gregg — Linux eBPF Tracing Tools overview: https://www.brendangregg.com/ebpf.html
- libbpf-bootstrap examples (uprobe/usdt, CO-RE scaffolding): https://github.com/libbpf/libbpf-bootstrap
- Open vSwitch USDT probes guide (compile/list/use patterns): https://docs.openvswitch.org/en/latest/topics/usdt-probes/
- OpenTelemetry eBPF Instrumentation (OBI) docs: https://opentelemetry.io/docs/zero-code/obi/
- OpenTelemetry blog (automatic instrumentation techniques incl. eBPF/uprobes): https://opentelemetry.io/blog/2025/demystifying-auto-instrumentation/