Linux Policy Routing Playbook (ip rule + fwmark for Deterministic Egress)

2026-03-22 · systems

Category: knowledge
Domain: systems / linux networking / low-latency operations

Why this matters

If you run multi-homed servers (multiple uplinks or VLANs), destination-only routing (the main table alone) is often not enough.

Typical pain:

  • Egress flows leave through the wrong uplink because only the destination prefix is considered.
  • Bulk transfers (backfill, sync) share an egress path with latency-critical traffic.
  • Management-plane traffic mixes with production traffic.

Policy routing lets you make egress decisions from intent signals (source subnet, interface, fwmark, uid range), not only the destination prefix.


1) Core mental model

Linux routing has two layers:

  1. RPDB (Routing Policy Database) via ip rule
    • Rules are checked by priority (smaller number = higher priority).
  2. Routing tables via ip route
    • A matched rule tells Linux which table to consult.

Default behavior is roughly three built-in rules:

  • priority 0: lookup local (local and broadcast addresses)
  • priority 32766: lookup main (your normal routes)
  • priority 32767: lookup default (usually empty)

Your job is to insert policy rules before the generic main lookup where needed.
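The two layers can be seen in a minimal sketch (table id, addresses, and interface name are illustrative):

```shell
# Layer 2: a custom routing table with one explicit default route.
ip route add default via 192.0.2.1 dev eth1 table 100

# Layer 1: an RPDB rule that sends matching traffic to that table,
# inserted well before the generic main lookup (priority 32766).
ip rule add from 192.0.2.0/24 lookup 100 priority 1000

# Inspect the RPDB; the smaller priority number wins.
ip rule show
```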


2) When to use policy routing

Use it when at least one is true:

  • Different traffic classes must exit via different uplinks or gateways.
  • Return traffic must leave on the interface it arrived on (source-based symmetry).
  • Per-service or per-user steering is required (fwmark, uidrange).

If all traffic can share one default route and one operational policy, keep it simple and avoid PBR complexity.


3) Practical design patterns

Pattern A — Source-subnet based egress

Use when each service binds to a dedicated source IP/subnet.

Best for static segmentation (e.g., market-data subnet vs execution subnet).
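A minimal sketch of Pattern A, assuming a market-data subnet 10.10.1.0/24 that should egress via gateway 10.10.1.1 on eth1, using a hypothetical table rt_md (id 101):

```shell
# Register the table name once per host; id 101 is an example choice.
echo "101 rt_md" >> /etc/iproute2/rt_tables

# Populate the table: connected route plus an explicit default.
ip route add 10.10.1.0/24 dev eth1 scope link table rt_md
ip route add default via 10.10.1.1 dev eth1 table rt_md

# Steer by source subnet; priority 1100 leaves gaps for later rules.
ip rule add from 10.10.1.0/24 lookup rt_md priority 1100
```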

Pattern B — fwmark-based egress classes

Use when policy depends on app intent, not just source IP.

Best for dynamic classes (critical/live, backfill, bulk sync, telemetry).
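Pattern B can be sketched with nftables as the mark producer (uid 1600, mark 0x2, and the rt_bulk table are illustrative):

```shell
# Mark traffic from a hypothetical bulk-sync service (uid 1600) in the
# output hook; "type route" re-runs the route lookup after marking.
nft add table inet mangle
nft add chain inet mangle output '{ type route hook output priority mangle; }'
nft add rule inet mangle output meta skuid 1600 meta mark set 0x2

# Route mark 0x2 through the bulk table (assumed registered in rt_tables).
ip route add default via 192.0.2.1 dev eth2 table rt_bulk
ip rule add fwmark 0x2 lookup rt_bulk priority 1200
```

Applications with CAP_NET_ADMIN can also set marks directly via setsockopt(SO_MARK), which removes the firewall rule from the path entirely.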

Pattern C — uidrange-based service steering

Use when each service runs under a dedicated Unix user.

Best for minimizing packet-marking complexity in simple hosts.
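Pattern C in its simplest form (uid and table name are illustrative; rt_exec is assumed to be registered in /etc/iproute2/rt_tables):

```shell
# Assume the execution service runs as uid 1500; steer everything it
# originates into rt_exec, no packet marking required.
ip route add default via 198.51.100.1 dev eth0 table rt_exec
ip rule add uidrange 1500-1500 lookup rt_exec priority 1300
```

uidrange rules keep the data path free of mark-setting firewall rules, but they require a stable, dedicated uid per service.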


4) Minimal production blueprint

  1. Define route tables in /etc/iproute2/rt_tables

    • Example names: rt_exec, rt_md, rt_mgmt.
  2. Populate each table with:

    • required connected routes,
    • explicit default route (or explicit non-default policy if intentionally isolated).
  3. Create RPDB rules with explicit priorities

    • Keep a visible gap strategy (e.g., 1000, 1100, 1200…) for maintainability.
  4. Add fallback semantics deliberately

    • If no policy rule matches, traffic should fall through to main by design, not by accident.
  5. Persist config through your network manager

    • systemd-networkd / NetworkManager / netplan / distro scripts.
    • Avoid “works until reboot.”
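The blueprint above can be sketched end to end (names, ids, and addresses are illustrative):

```shell
# 1) Table names; ids are arbitrary but must be unique on the host.
cat >> /etc/iproute2/rt_tables <<'EOF'
101 rt_exec
102 rt_md
103 rt_mgmt
EOF

# 2) Populate tables: connected routes first, then explicit defaults.
ip route add 198.51.100.0/24 dev eth0 scope link table rt_exec
ip route add default via 198.51.100.1 dev eth0 table rt_exec

# 3) Rules with a visible priority-gap strategy.
ip rule add from 198.51.100.0/24 lookup rt_exec priority 1000
ip rule add fwmark 0x2 lookup rt_md priority 1100

# 4) No catch-all rule: anything unmatched falls through to main (32766)
#    by design.

# 5) These commands do not survive reboot on their own; persistence
#    belongs in systemd-networkd / NetworkManager / netplan.
```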

5) Non-negotiable guardrails

  1. Rule priority hygiene

    • Overlapping selectors without explicit priority intent cause shadowing bugs.
    • Always document why each priority exists.
  2. Table completeness checks

    • Missing default routes in custom tables can blackhole marked traffic.
  3. Asymmetric path awareness

    • Reverse-path filtering can drop valid packets in asymmetric designs.
    • Validate rp_filter posture per interface for your threat model.
  4. Conntrack/mark consistency

    • If you rely on marks, ensure mark lifecycle is consistent across request/reply and NAT boundaries.
  5. Atomic rollout

    • Stage table routes first, then rules, then mark producers.
    • Reverse order on rollback.
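Guardrail 3 can be audited quickly; loose mode is the usual posture for multi-homed hosts (a sketch, interface names illustrative):

```shell
# Inspect rp_filter per interface (0 = off, 1 = strict, 2 = loose).
# The kernel applies max(all, per-interface).
sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.eth1.rp_filter

# Strict mode drops valid packets on asymmetric paths; loose mode only
# requires that some route back to the source exists.
sysctl -w net.ipv4.conf.eth1.rp_filter=2
```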

6) Validation runbook (must pass before cutover)

A. Structural checks

Confirm:

  • ip rule show lists every expected rule at its documented priority, with no unexplained entries.
  • ip route show table <name> is complete for each custom table: connected routes plus an explicit default, or a deliberate omission.
  • /etc/iproute2/rt_tables has no duplicate ids or names.

B. Path simulation checks

Use route queries that include policy context (source/mark/interface) to verify expected nexthop resolution before real traffic switch.
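ip route get accepts policy context directly (addresses and mark are illustrative; the from address must be local to the host):

```shell
# Which path would a packet sourced from the market-data address take?
ip route get 203.0.113.10 from 10.10.1.5

# Which path would a mark-0x2 packet take?
ip route get 203.0.113.10 mark 0x2

# Compare the answers against the full RPDB to see which rule matched.
ip rule show
```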

C. Live traffic checks

  • Capture per interface (e.g., tcpdump -i eth1) and confirm each class egresses where expected, with the intended source IP.
  • Watch conntrack for reply-path anomalies on marked flows.

D. Failure-injection checks

  • Down each uplink in turn and verify that affected classes fail over, or fail closed, exactly as designed.
  • Confirm clean recovery when the uplink returns.
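Failure injection can be sketched as follows (interface and mark values illustrative; rehearse outside production first):

```shell
# Take the bulk uplink down and confirm marked traffic either fails over
# or blackholes exactly as designed.
ip link set dev eth2 down
ip route get 203.0.113.10 mark 0x2   # expect fallback path or explicit unreachable
ip link set dev eth2 up
```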


7) Common failure modes

Failure mode 1: Rule shadowing by broad early match

Symptom:

  • Traffic that should hit a specific table takes the generic path (or vice versa); ip rule show reveals a broad selector at a lower priority number than the intended rule.

Fix:

  • Tighten the broad selector or move it to a higher priority number; document the intended match order and keep numeric gaps for future rules.

Failure mode 2: Marked traffic blackholes

Symptom:

  • Marked flows time out while unmarked flows work; the table referenced by the fwmark rule has no usable default route.

Fix:

  • Ensure every custom table is complete (connected routes plus an explicit default, or a deliberate unreachable route); stage table routes before creating the rule.

Failure mode 3: Intermittent one-way connectivity

Symptom:

  • Connections establish only sometimes, or only in one direction; packets arrive on one interface while policy sends replies out another, and strict rp_filter drops them.

Fix:

  • Set rp_filter to loose mode (2) on the affected interfaces, or restore path symmetry with matching ingress and egress policy.

Failure mode 4: Reboot regression

Symptom:

  • Policy routing works until the next reboot, after which rules or custom tables are silently missing.

Fix:

  • Persist rules and routes through the host's network manager (systemd-networkd, NetworkManager, netplan) and make a reboot test part of acceptance.
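As one persistence option, systemd-networkd can express both the table routes and the RPDB rule natively (a sketch; addresses, table id, and file name are illustrative):

```shell
# /etc/systemd/network/10-md.network (systemd-networkd)
cat > /etc/systemd/network/10-md.network <<'EOF'
[Match]
Name=eth1

[Network]
Address=10.10.1.5/24

[Route]
Gateway=10.10.1.1
Table=101

[RoutingPolicyRule]
From=10.10.1.0/24
Table=101
Priority=1100
EOF
networkctl reload
```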


8) Operational metrics worth tracking

Track at minimum:

  • Policy-hit counts per rule or class (e.g., nftables counters on mark-producing rules).
  • Per-class egress latency and drop/retransmit rates.
  • Custom-table completeness (alert on a missing default route).
  • Rule-set drift versus the documented priority plan.

If you cannot observe policy-hit counts and class latency together, policy routing incidents will remain opaque.
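Per-class hit counts can be exposed with nftables counters (a self-contained sketch; table, chain, and mark value are illustrative):

```shell
# Count outbound packets per mark class (0x2 = bulk in this illustration).
nft add table inet pbr_metrics
nft add chain inet pbr_metrics out '{ type filter hook output priority filter; }'
nft add rule inet pbr_metrics out meta mark 0x2 counter

# Scrape the counters for your metrics pipeline.
nft list chain inet pbr_metrics out
```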


9) Change-management checklist

Before change:

  • Snapshot current state: ip rule show and ip route show table all, saved to a file.
  • Confirm every custom table referenced by a planned rule exists and is complete.
  • Agree on the rollback trigger and who executes it.

During change:

  • Apply in staged order: table routes, then rules, then mark producers.
  • Run the validation runbook (section 6) after each stage.

After change:

  • Compare live state against the snapshot plus the intended diff.
  • Schedule a reboot test to confirm persistence.
  • Record the priority plan and its rationale in the runbook.


10) Recommended default policy

For most multi-homed low-latency hosts:

  • Keep main as the deliberate fallback; never add a catch-all policy rule.
  • Use source-subnet rules for static segmentation and fwmark rules for dynamic traffic classes.
  • Assign rule priorities with visible gaps and document each one.
  • Run rp_filter in loose mode on multi-homed interfaces unless strict mode is a stated requirement.
  • Persist everything through the network manager and reboot-test before declaring done.

Deterministic egress is not a “network nice-to-have.” It is often the difference between stable execution and random latency incidents.

