BGP Fast-Failover Operations Playbook: PIC + FRR + Detection Budgets
Date: 2026-03-26
Category: knowledge
Audience: Backbone/edge network engineers running large BGP fabrics
1) Why this matters
In large BGP networks, failure pain is rarely about whether an alternate path exists. It is mostly about how long forwarding takes to move after failure.
Without pre-computed repair, convergence time can scale with route volume and control-plane churn. With the right design, failover can be driven by local forwarding updates that are mostly independent of prefix count.
That is the operational value of combining:
- fast failure detection (BFD/physical signals),
- IGP fast reroute (LFA/RLFA/TI-LFA),
- BGP PIC (hierarchical FIB + pre-installed backup recursion).
2) Failure timeline mental model (where milliseconds are spent)
Treat failover latency as a budget with four buckets:
- (1) Detection: link down, BFD down, or adjacency expiry.
- (2) Local repair: immediate dataplane detour by the PLR (point of local repair).
- (3) FIB indirection switch: next-hop group pointer swap (PIC behavior).
- (4) Control-plane cleanup: protocol reconvergence and path re-selection afterward.
If you skip (2) and (3), all traffic waits for (4), and tail loss explodes.
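As a rough illustration of the budget view, the buckets can be added up for a protected and an unprotected case. This is a deliberately simplified sketch: the class name, the serial-sum model, and every millisecond value are assumptions chosen for illustration, not measurements from any platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverBudget:
    """Hypothetical per-bucket latencies in milliseconds."""
    detection_ms: float
    local_repair_ms: float
    fib_switch_ms: float
    control_plane_ms: float

    def loss_window_ms(self, protected: bool) -> float:
        # Protected: traffic recovers once buckets (1)-(3) finish;
        # bucket (4) then runs in the background.
        if protected:
            return self.detection_ms + self.local_repair_ms + self.fib_switch_ms
        # Unprotected: buckets (2) and (3) are skipped, so traffic waits
        # for detection plus full control-plane reconvergence.
        return self.detection_ms + self.control_plane_ms


# Illustrative numbers only.
budget = FailoverBudget(detection_ms=150, local_repair_ms=10,
                        fib_switch_ms=20, control_plane_ms=4000)
print(budget.loss_window_ms(protected=True))    # 180
print(budget.loss_window_ms(protected=False))   # 4150
```

In this model the detection bucket dominates the protected case, which is why detection tuning gets its own rollout phase.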
3) The three building blocks (and what each does)
A) IP FRR / TI-LFA: immediate local protection
- LFA (RFC 5286): simple local alternate next-hop when topology allows.
- Remote LFA (RFC 7490): extends coverage when local LFA is unavailable.
- TI-LFA with SR (RFC 9855, 2025): precomputed SR repair paths that aim for full coverage in two-connected topologies.
Role: keep packets moving during the first moments after a failure.
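The phrase "when topology allows" can be made concrete with the basic loop-free condition from RFC 5286: a neighbor N of router S is a loop-free alternate for destination D when Distance_opt(N, D) < Distance_opt(N, S) + Distance_opt(S, D). The sketch below checks that inequality on two small hypothetical topologies. The node names, link costs, and helper functions are assumptions for illustration; real implementations also evaluate node-protecting and downstream conditions.

```python
import heapq

Graph = dict[str, dict[str, int]]


def symmetric(links: list[tuple[str, str, int]]) -> Graph:
    """Build an undirected graph from (a, b, cost) tuples."""
    graph: Graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_distances(graph: Graph, source: str) -> dict[str, int]:
    """Plain Dijkstra: shortest-path cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, cost in graph[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def is_loop_free_alternate(graph: Graph, s: str, n: str, d: str) -> bool:
    """RFC 5286 basic loop-free condition for neighbor n of s toward d."""
    dist_n = shortest_distances(graph, n)
    dist_s = shortest_distances(graph, s)
    return dist_n[d] < dist_n[s] + dist_s[d]


# N reaches D directly at cost 2, so it never routes back through S.
covered = symmetric([("S", "E", 1), ("E", "D", 1), ("S", "N", 2), ("N", "D", 2)])
# N's best path to D goes back through S, so N is not a usable alternate.
gap = symmetric([("S", "E", 1), ("E", "D", 1), ("S", "N", 1), ("N", "D", 10)])

print(is_loop_free_alternate(covered, "S", "N", "D"))  # True
print(is_loop_free_alternate(gap, "S", "N", "D"))      # False
```

The second topology is the kind of coverage gap that Remote LFA and TI-LFA are meant to close.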
B) BGP PIC: prefix-independent forwarding change
The current IETF work (draft-ietf-rtgwg-bgp-pic) describes organizing forwarding hierarchically so many prefixes share forwarding objects.
Operationally:
- many prefixes point to shared recursion objects / next-hop groups,
- on failure, a pointer swap can redirect large route sets together,
- failover time is less sensitive to number of prefixes.
Two practical domains:
- PIC Core: core link/node failure while BGP next-hop remains reachable via alternate IGP path.
- PIC Edge: edge/egress failure requiring fast move to alternate BGP next-hop/egress.
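The shared-object behavior described above can be sketched in a few lines. This is a toy model, not any vendor's FIB: the class names, the single primary/backup pair, and the PE1/PE2 next-hops are all hypothetical. The point it illustrates is that prefixes hold a reference to one shared object, so a single write redirects all of them.

```python
from dataclasses import dataclass, field


@dataclass
class PathList:
    """Shared forwarding object: one primary and one pre-installed backup."""
    primary: str
    backup: str
    active: str = field(init=False)

    def __post_init__(self) -> None:
        self.active = self.primary

    def fail_over(self, failed_next_hop: str) -> None:
        # One write, regardless of how many prefixes reference this object.
        if self.active == failed_next_hop:
            self.active = self.backup


class HierarchicalFib:
    def __init__(self) -> None:
        self._prefixes: dict[str, PathList] = {}

    def install(self, prefix: str, path_list: PathList) -> None:
        # Store a reference to the shared object, not a per-prefix copy.
        self._prefixes[prefix] = path_list

    def lookup(self, prefix: str) -> str:
        return self._prefixes[prefix].active


fib = HierarchicalFib()
shared = PathList(primary="PE1", backup="PE2")
prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)]
for prefix in prefixes:
    fib.install(prefix, shared)

shared.fail_over("PE1")  # the "pointer swap"
print(all(fib.lookup(p) == "PE2" for p in prefixes))  # True
```

A flat FIB would instead need one update per prefix, which is where the dependence on route volume comes from.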
C) Fast detection (BFD, RFC 5880)
BFD gives protocol-independent liveness with low detection latency. But overly aggressive timers can create flap storms and false positives.
Role: trigger protection quickly, but with controlled stability margins.
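For budgeting purposes, the detection bucket in BFD asynchronous mode follows from the negotiated timers. Per RFC 5880, the local detection time is the remote Detect Mult multiplied by the agreed remote transmit interval, which is the larger of the local Required Min RX Interval and the remote Desired Min TX Interval. The function and parameter names below are hypothetical, and the timer values are examples rather than recommendations.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Asynchronous-mode detection time as seen by the local system."""
    # The remote end may not send faster than the local end is willing to receive.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


# Aggressive trunk profile (example values).
print(bfd_detection_time_ms(50, 50, 3))     # 150
# Conservative edge profile: the slower side sets the pace.
print(bfd_detection_time_ms(300, 100, 3))   # 900
```

Raising the multiplier rather than the interval is one way to add stability margin while keeping probes frequent.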
4) Preconditions before enabling PIC (non-negotiable)
Path diversity exists
- PIC helps only when alternate ECMP/backup paths exist.
Recursive forwarding is already healthy
- clean next-hop resolution in RIB/FIB,
- no fragile recursion chains.
IGP underlay supports fast repair
- LFA/RLFA/TI-LFA policy and coverage are validated.
Edge route visibility is sufficient
- where needed, use mechanisms like ADD-PATH (RFC 7911) so alternate paths are visible before failure.
Hardware/software FIB scale is understood
- backup/repair objects consume resources; validate headroom first.
5) Practical rollout sequence (low-risk path)
Phase 0: Baseline current failover
Measure before changes:
- failure-to-first-loss,
- loss-window duration,
- first-good-packet recovery,
- p50/p95/p99 convergence windows,
- update bursts and CPU peaks.
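A baseline is easier to compare across phases if the percentiles are always computed the same way. The sketch below uses the nearest-rank method on a list of measured loss windows. The function name and the sample values are made up for illustration; substitute results from your own failure drills.

```python
def percentile(samples_ms: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples_ms:
        raise ValueError("no samples")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples_ms)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical loss windows (ms) from five failure drills.
loss_windows_ms = [120, 180, 150, 900, 200]
print(percentile(loss_windows_ms, 50))   # 180
print(percentile(loss_windows_ms, 95))   # 900
```

With only a handful of drills, p95 and p99 collapse onto the worst sample, so the tail figures only become meaningful once enough events are collected.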
Phase 1: Stabilize detection budget
- Start from conservative BFD intervals/multipliers.
- Tune down carefully while watching false-positive rate.
- Keep separate profiles for backbone trunks vs noisier/edge segments.
Phase 2: Deploy IGP FRR first
- Enable LFA/RLFA/TI-LFA coverage.
- Validate link-fail and node-fail scenarios separately.
- Confirm repair path quality (latency/stretch) is acceptable.
Phase 3: Enable PIC Core
- Focus on core path failures where egress is unchanged.
- Validate pointer-swap behavior and recovery consistency under load.
- Watch for microburst side effects during reroute.
Phase 4: Enable PIC Edge
- Cover egress PE/edge failure behavior.
- Confirm alternate egress policies (communities, local-pref, next-hop reachability).
- Test single-failure and dual-stress cases (failure + high load).
Roll out in rings/cells; avoid one-shot network-wide activation.
6) Observability you need (or you are flying blind)
Dataplane
- Packet-loss window (ms)
- Reorder depth/ratio during switchover
- Jitter and queue growth on repair path
- Post-failover utilization hot spots
Control plane
- BFD session transitions and flap counts
- IGP SPF/repair invocation timing
- BGP update burst size and completion time
- RIB-to-FIB programming lag
Business/SLO layer
- Service p99 latency during failure drills
- Retransmit spike duration
- Error-budget burn per failover event
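Error-budget burn per event can be estimated with simple arithmetic. The sketch below assumes an availability-style SLO and treats the entire loss window as unavailability, which overstates the impact when only part of the traffic is affected. The function name, the 30-day window, and the 99.99% target are illustrative assumptions.

```python
def error_budget_burn(loss_window_ms: float,
                      slo_target: float,
                      period_days: int = 30) -> float:
    """Fraction of the period's error budget consumed by one failover event."""
    period_ms = period_days * 24 * 60 * 60 * 1000
    allowed_unavailability_ms = period_ms * (1.0 - slo_target)
    return loss_window_ms / allowed_unavailability_ms


# 99.99% over 30 days allows roughly 259,200 ms of unavailability.
burn = error_budget_burn(loss_window_ms=2592, slo_target=0.9999)
print(round(burn, 4))   # 0.01, i.e. about 1% of the monthly budget
```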
7) Common failure modes (seen in production)
"PIC enabled" but no real alternates
- zero benefit because topology is single-path at failure point.
BFD too hot
- false downs trigger oscillation and control-plane storms.
FRR coverage gaps
- LFA unavailable in some topologies; unprotected prefixes still wait for reconvergence.
Edge policy mismatch
- alternate egress exists physically but is rejected by policy.
Scale surprises in FIB objects
- backup groups consume TCAM/adjacency resources; partial install causes inconsistent behavior.
Testing only link failure
- node/SRLG-style events expose very different behavior.
8) Quick incident triage checklist (during real failure)
- Is failure detected by physical signal, BFD, or protocol timeout?
- Did local repair activate (LFA/RLFA/TI-LFA counters/tables)?
- Did next-hop group/recursion object switch immediately?
- Are alternates policy-eligible and installed in FIB?
- Is loss from dataplane congestion after reroute (not control-plane delay)?
- Are BFD flaps continuing after first event (instability loop)?
This order prevents wasting time blaming BGP when the real issue is underlay repair or path capacity.
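The checklist order can also be written down as a small decision helper, so that on-call tooling reports the first layer that looks wrong instead of jumping straight to BGP. This is a sketch only: the observation keys and finding strings are hypothetical, and the booleans would have to come from your own telemetry.

```python
# Checks in the same order as the checklist above.
TRIAGE_ORDER = [
    ("detected_fast", "detection: failure was found by protocol timeout"),
    ("local_repair_active", "underlay: LFA/RLFA/TI-LFA repair did not activate"),
    ("fib_switched", "PIC: next-hop group / recursion object did not switch"),
    ("alternates_installed", "policy/FIB: alternates not eligible or not installed"),
    ("repair_path_uncongested", "capacity: loss is congestion on the repair path"),
    ("bfd_stable", "stability: BFD is still flapping after the first event"),
]


def first_suspect(observations: dict[str, bool]) -> str:
    """Return the first checklist layer whose check failed or was not recorded."""
    for key, finding in TRIAGE_ORDER:
        if not observations.get(key, False):
            return finding
    return "no layer flagged: look at control-plane reconvergence next"


print(first_suspect({
    "detected_fast": True,
    "local_repair_active": True,
    "fib_switched": True,
    "alternates_installed": True,
    "repair_path_uncongested": False,
    "bfd_stable": True,
}))
# capacity: loss is congestion on the repair path
```

A missing observation is treated as a failed check on purpose, so an unverified layer is never silently skipped.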
9) Bottom line
Fast failover at BGP scale is not one feature toggle. It is a layered control system:
Detection budget (BFD/LOS) + local repair (FRR/TI-LFA) + prefix-independent forwarding structure (PIC) + controlled reconvergence cleanup.
If any one layer is weak, failures return to prefix-by-prefix pain.
References
- Internet-Draft: BGP Prefix Independent Convergence (draft-ietf-rtgwg-bgp-pic)
https://datatracker.ietf.org/doc/draft-ietf-rtgwg-bgp-pic/
- RFC 5714: IP Fast Reroute Framework
https://www.rfc-editor.org/rfc/rfc5714
- RFC 5286: Basic Specification for IP Fast Reroute: Loop-Free Alternates
https://www.rfc-editor.org/rfc/rfc5286
- RFC 7490: Remote Loop-Free Alternate (LFA) Fast Reroute (FRR)
https://www.rfc-editor.org/rfc/rfc7490
- RFC 9855: Topology Independent Fast Reroute Using Segment Routing
https://www.rfc-editor.org/rfc/rfc9855
- RFC 5880: Bidirectional Forwarding Detection (BFD)
https://www.rfc-editor.org/rfc/rfc5880
- RFC 7911: Advertisement of Multiple Paths in BGP (ADD-PATH)
https://www.rfc-editor.org/rfc/rfc7911