RPKI-First BGP Routing Hygiene Playbook (ROA/ROV/RTR + Route-Leak Guardrails)
Date: 2026-03-13
Category: knowledge
Scope: Practical deployment pattern for reducing origin hijacks without causing self-inflicted routing outages.
1) Why this matters
Classic BGP trust is too permissive: a wrong origin announcement can propagate globally.
RPKI-based origin validation (ROV) gives operators a cryptographic way to check whether the origin AS is authorized for a prefix. Done well, this cuts a major class of hijacks. Done carelessly, it can create self-inflicted outages via bad ROAs.
This playbook is about deploying safely and operationally, not just enabling a checkbox.
2) What RPKI/ROV actually guarantees (and what it does not)
What it gives you
- A signed authorization model (ROAs) mapping prefix resources to allowed origin ASNs.
- Validation states for BGP routes: Valid / NotFound / Invalid.
- A machine-consumable feed from validators/caches to routers via RTR (RPKI-to-Router protocol).
What it does not give you
- Full AS_PATH security by itself.
- Automatic route-leak prevention (customer/peer/provider propagation mistakes).
- Immunity from operator errors (bad maxLength, stale/missing ROAs).
Mental model: ROV is a high-value baseline control, not the whole routing-security stack.
3) Validation-state semantics you should internalize
Per origin validation logic (RFC 6811 family):
- Valid: at least one covering ROA authorizes the origin ASN, and the route's prefix length is within that ROA's maxLength.
- Invalid: covering authorization data exists, but none of it matches (wrong ASN and/or prefix length exceeds the allowed maxLength).
- NotFound: no covering authorization data exists.
Operationally, this means:
- Invalid is often strong evidence of bad origin data.
- NotFound is still common in partial-deployment reality and should not be treated as equivalent to Invalid.
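The three states above can be sketched in a few lines of Python. This is a minimal illustration of the RFC 6811 matching logic, not a replacement for a real validator: the `VRP` class and `validate_origin` function are hypothetical names introduced here, and the input is assumed to be an already-validated list of ROA payloads.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class VRP:
    """One validated ROA payload: prefix, maxLength, authorized origin ASN."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(route_prefix: str, origin_asn: int, vrps: list[VRP]) -> str:
    """Return "Valid", "Invalid", or "NotFound" for one route."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # A VRP covers the route if it is the same address family and the
        # route prefix is equal to or more specific than the VRP prefix.
        if vrp_net.version != route.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # A match needs the right ASN and a length within maxLength.
        # AS 0 is reserved and can never match a real origin.
        if vrp.asn != 0 and vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "Valid"
    return "Invalid" if covered else "NotFound"


vrps = [VRP("192.0.2.0/24", 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, vrps))     # Valid
print(validate_origin("192.0.2.0/25", 64500, vrps))     # Invalid (longer than maxLength)
print(validate_origin("192.0.2.0/24", 64501, vrps))     # Invalid (wrong origin)
print(validate_origin("198.51.100.0/24", 64500, vrps))  # NotFound
```

Note that a single matching VRP is enough for Valid, while Invalid requires that some VRP covers the route and none matches. That asymmetry is why a new supernet ROA can flip previously NotFound routes to Invalid.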
4) ROA authoring rules that prevent most self-outages
4.1 Prefer minimal ROAs
Authorizations should match actually-originated prefixes as tightly as possible.
4.2 Be conservative with maxLength
maxLength is operationally useful, but over-broad values expand forged-origin subprefix attack surface and increase error blast radius.
4.3 Cover all legitimate origin ASNs
If a prefix can be originated by multiple ASNs (multi-homing, migrations, mitigation providers), issue ROAs for all legitimate origin cases.
4.4 Supernet/subnet sequencing discipline
Before publishing a supernet ROA, verify sub-allocations announced by other ASNs are correctly represented; otherwise you can accidentally make legitimate downstream announcements Invalid.
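A pre-publish check along these lines can catch most of the mistakes in 4.1 to 4.4 before they reach production. The sketch below is illustrative only: `state` and `prepublish_report` are hypothetical helpers, VRPs are modeled as plain `(prefix, max_length, asn)` tuples, and the list of observed announcements is assumed to come from your own BGP data (for example a route collector or looking-glass export).

```python
from ipaddress import ip_network


def state(prefix, origin, vrps):
    """RFC 6811-style state; vrps is an iterable of (prefix, max_length, asn)."""
    route = ip_network(prefix)
    covered = False
    for vrp_prefix, max_len, asn in vrps:
        net = ip_network(vrp_prefix)
        if net.version != route.version or not route.subnet_of(net):
            continue
        covered = True
        if asn != 0 and asn == origin and route.prefixlen <= max_len:
            return "Valid"
    return "Invalid" if covered else "NotFound"


def prepublish_report(current, proposed, observed):
    """Compare the proposed VRP set against observed (prefix, origin) routes.

    Returns (newly_invalid, loose):
      newly_invalid: observed routes that are not Invalid today but would be
                     Invalid under the proposed set.
      loose:         proposed VRPs whose maxLength exceeds the prefix length,
                     which deserve an explicit justification.
    """
    newly_invalid = [
        (p, o) for p, o in observed
        if state(p, o, proposed) == "Invalid" and state(p, o, current) != "Invalid"
    ]
    loose = [v for v in proposed if v[1] > ip_network(v[0]).prefixlen]
    return newly_invalid, loose


# A supernet ROA published before the downstream's own ROA exists.
current = []
proposed = [("2001:db8::/32", 32, 64500)]
observed = [("2001:db8::/32", 64500), ("2001:db8:1::/48", 64501)]
newly_invalid, loose = prepublish_report(current, proposed, observed)
print(newly_invalid)  # [('2001:db8:1::/48', 64501)]
print(loose)          # []
```

In this example the downstream /48 originated by AS64501 would become Invalid, so its ROA should be published first (4.4).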
5) Safe policy rollout ladder (do this in stages)
Stage 0: Instrument only
- Compute and store validation state for all eligible routes.
- No enforcement yet.
Stage 1: Preferential routing
- Prefer Valid over NotFound.
- Strongly de-prefer Invalid.
Stage 2: Controlled rejection
- Reject Invalid at selected edges/peers first.
- Keep rollback path and exception list.
Stage 3: Broad invalid rejection
- Enforce at all relevant ingress points after stability evidence.
- Continue monitoring for self-invalid spikes.
This mirrors the BCP 185 guidance (RFC 7115): keep reachability safe during partial deployment, but move toward dropping Invalid once confidence is established.
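One way to keep the ladder explicit is to express it as a single decision function whose only inputs are the stage, the validation state, and the operator-maintained lists. The sketch below is a hypothetical model, not vendor policy syntax: the `Stage` enum, the local-preference deltas, and the `reject_sessions` and `exceptions` parameters are all assumptions chosen for illustration.

```python
from enum import IntEnum


class Stage(IntEnum):
    INSTRUMENT = 0         # compute state, never act on it
    PREFER = 1             # adjust preference only
    CONTROLLED_REJECT = 2  # reject Invalid on selected sessions
    BROAD_REJECT = 3       # reject Invalid everywhere


# Hypothetical local-preference adjustments; tune to your own policy.
VALID_BONUS = 10
INVALID_PENALTY = -50


def import_action(stage, state, *, session, prefix,
                  reject_sessions=frozenset(), exceptions=frozenset()):
    """Return ("accept" | "reject", local_pref_delta) for one received route."""
    if stage == Stage.INSTRUMENT:
        return ("accept", 0)
    if state == "Valid":
        return ("accept", VALID_BONUS)
    if state == "NotFound":
        return ("accept", 0)
    # From here on the route is Invalid.
    if prefix in exceptions:
        # Temporary, ticketed exception: keep the route but de-prefer it.
        return ("accept", INVALID_PENALTY)
    if stage == Stage.BROAD_REJECT:
        return ("reject", 0)
    if stage == Stage.CONTROLLED_REJECT and session in reject_sessions:
        return ("reject", 0)
    return ("accept", INVALID_PENALTY)


print(import_action(Stage.PREFER, "Invalid", session="peer-a", prefix="192.0.2.0/24"))
# ('accept', -50)
print(import_action(Stage.CONTROLLED_REJECT, "Invalid", session="peer-a",
                    prefix="192.0.2.0/24", reject_sessions={"peer-a"}))
# ('reject', 0)
```

Keeping rollback as a one-step change of `stage` and keeping exceptions in a separate, reviewable set makes it cheap to back out when a self-invalid spike appears.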
6) Router/validator architecture that survives bad days
6.1 Multiple caches, not one
Routers should maintain RTR sessions with more than one trusted cache/validator to avoid a single point of failure.
6.2 Place caches close to control plane
Reduce bootstrap and reachability dependencies; avoid circular dependencies where routing must converge before the router can reach validation data.
6.3 Protect RTR transport
Routers trust cache output; secure and harden that channel and avoid insecure inter-AS transport for router-cache sessions.
6.4 Keep caches fresh and observable
Track serial lag, cache freshness, and last successful update times. Stale-but-quiet is dangerous.
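A freshness check can be as small as the sketch below. It assumes you already export per-session data from routers and caches into some monitoring system; the `RtrSession` fields, the thresholds, and the `freshness_alerts` function are hypothetical. Serial lag is computed between a cache and the router it feeds, with modulo 2**32 arithmetic because RTR serial numbers wrap; serials from different caches are not compared with each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RtrSession:
    cache: str           # cache/validator name
    session_up: bool     # RTR session state as seen by the router
    cache_serial: int    # latest serial the cache has published
    router_serial: int   # serial the router has applied
    last_update: float   # epoch seconds of the cache's last successful refresh


def freshness_alerts(sessions, now, max_age_s=1800.0, max_serial_lag=2, min_healthy=2):
    """Return a list of human-readable alerts; an empty list means healthy."""
    alerts = []
    healthy = 0
    for s in sessions:
        problems = []
        if not s.session_up:
            problems.append("session down")
        age = now - s.last_update
        if age > max_age_s:
            problems.append(f"stale: last update {age:.0f}s ago")
        # Serial numbers are 32-bit and wrap, so take the difference modulo 2**32.
        lag = (s.cache_serial - s.router_serial) % 2**32
        if lag > max_serial_lag:
            problems.append(f"router {lag} serials behind")
        if problems:
            alerts.append(f"{s.cache}: " + "; ".join(problems))
        else:
            healthy += 1
    if healthy < min_healthy:
        alerts.append(f"only {healthy} healthy cache session(s), want >= {min_healthy}")
    return alerts


sessions = [
    RtrSession("cache-a", True, 120, 120, 9900.0),
    RtrSession("cache-b", True, 120, 110, 1000.0),
]
for line in freshness_alerts(sessions, now=10000.0):
    print(line)
```

The final check matters as much as the per-session ones: a router with one quiet, stale cache and one healthy cache is a single failure away from losing validation data.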
7) Two commonly missed implementation details
7.1 "Set state, don't auto-act"
Validation state should be computed broadly, but policy actions must be an explicit operator choice (see RFC 8481). Implicit vendor defaults are an outage risk.
7.2 Validate egress with the effective origin AS
On export, policy and AS_PATH manipulation (for example private-AS stripping or AS migration) can change the origin AS that a neighbor actually sees. Egress validation should therefore use the post-policy effective origin AS (RFC 8893).
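The effect of export policy on the origin can be shown with a toy model. This is a deliberately simplified sketch: it assumes an eBGP export where the local AS is prepended, a flat AS_PATH with no AS_SET segments or confederations, and a private-AS removal that strips every private ASN (real implementations differ in which ones they strip). `is_private_asn` and `effective_origin` are hypothetical names.

```python
def is_private_asn(asn: int) -> bool:
    """Private-use ASN ranges from RFC 6996."""
    return 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294


def effective_origin(as_path: list[int], local_as: int, remove_private: bool = False) -> int:
    """Origin AS a neighbor will see after export policy is applied.

    as_path is the path as held in the local RIB (leftmost = most recent AS,
    rightmost = origin); an empty path means the route is locally originated.
    """
    path = [a for a in as_path if not (remove_private and is_private_asn(a))]
    exported = [local_as] + path  # the local AS is prepended on eBGP export
    return exported[-1]


# Customer route learned from private AS 65001, exported by AS 64500.
print(effective_origin([65001], 64500))                       # 65001
print(effective_origin([65001], 64500, remove_private=True))  # 64500
```

In the second case the ROA that matters on egress is the one authorizing AS 64500, not the customer's private ASN, which is exactly the mismatch that an ingress-only view misses.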
8) ROV is not route-leak defense: add BGP roles/OTC
ROV primarily addresses origin legitimacy.
Route leaks (relationship-violating propagation) require complementary controls. BGP Roles and OTC signaling (RFC 9234) add in-band relationship-aware safeguards for leak prevention/detection.
Practical stack:
- RPKI/ROV for origin hygiene.
- BGP Roles/OTC for propagation-hygiene.
- Classic import/export filters and IRR/RPKI sanity checks.
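To show how OTC complements ROV, the sketch below models the ingress and egress procedures for the Only to Customer attribute as described in RFC 9234. It is an illustration under simplifying assumptions: roles are plain strings naming the neighbor's role as seen from the local AS, role negotiation is assumed to have already succeeded, and `otc_ingress` and `otc_egress` are hypothetical function names.

```python
from typing import Optional, Tuple

# Role of the neighbor, as seen from the local AS.
PROVIDER, CUSTOMER, PEER, RS, RS_CLIENT = "provider", "customer", "peer", "rs", "rs-client"


def otc_ingress(neighbor_role: str, neighbor_as: int,
                otc: Optional[int]) -> Tuple[bool, Optional[int]]:
    """Return (is_leak, otc_after_ingress) for a received route."""
    if otc is not None:
        # A customer or RS-client must never send a route that carries OTC.
        if neighbor_role in (CUSTOMER, RS_CLIENT):
            return True, otc
        # From a peer, OTC must equal the peer's own AS number.
        if neighbor_role == PEER and otc != neighbor_as:
            return True, otc
        return False, otc
    # No OTC yet: routes from a provider, peer, or RS get it added on ingress.
    if neighbor_role in (PROVIDER, PEER, RS):
        return False, neighbor_as
    return False, None


def otc_egress(neighbor_role: str, local_as: int,
               otc: Optional[int]) -> Tuple[bool, Optional[int]]:
    """Return (may_advertise, otc_to_send) for a route about to be exported."""
    # A route that already carries OTC must not go to providers, peers, or RSes.
    if otc is not None and neighbor_role in (PROVIDER, PEER, RS):
        return False, otc
    # Routes sent to a customer, peer, or RS-client get OTC added if missing.
    if otc is None and neighbor_role in (CUSTOMER, PEER, RS_CLIENT):
        return True, local_as
    return True, otc


# Route learned from provider AS 64496 by local AS 64500.
is_leak, otc = otc_ingress(PROVIDER, 64496, None)
print(is_leak, otc)                        # False 64496
print(otc_egress(PROVIDER, 64500, otc))    # (False, 64496)
print(otc_egress(CUSTOMER, 64500, otc))    # (True, 64496)
```

The example is the classic leak: a route learned from one provider is blocked from being re-advertised to another provider, regardless of whether its origin is RPKI Valid.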
9) Incident playbook for "Why did this become Invalid?"
- Identify affected prefix + observed origin ASN.
- Compare current VRP/ROA set versus previous snapshot.
- Check maxLength mismatch first (very common).
- Check AS migration/private-AS stripping/policy rewrites (effective-origin mismatch).
- Validate cache freshness and RTR session health.
- Apply temporary exception only with expiry and ticketed follow-up.
- Post-incident: fix ROA model, add pre-change validation tests.
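The snapshot comparison and the maxLength check in the steps above can be scripted. The sketch below assumes VRP snapshots are available as sets of `(prefix, max_length, asn)` tuples, for example parsed from a validator's export; the snapshot format and the `covering` and `explain_invalid` helpers are hypothetical.

```python
from ipaddress import ip_network


def covering(prefix, vrps):
    """Subset of VRPs (prefix, max_length, asn) whose prefix covers the route."""
    route = ip_network(prefix)
    out = set()
    for v in vrps:
        net = ip_network(v[0])
        if net.version == route.version and route.subnet_of(net):
            out.add(v)
    return out


def explain_invalid(prefix, origin, before, after):
    """Diff the covering VRPs between two snapshots and suggest likely causes."""
    old, new = covering(prefix, before), covering(prefix, after)
    route_len = ip_network(prefix).prefixlen
    hints = []
    for p, max_len, asn in sorted(new):
        # Right ASN, but the announced prefix is more specific than allowed.
        if asn == origin and route_len > max_len:
            hints.append(f"maxLength too short: {p} max /{max_len} < /{route_len}")
    if new and not any(asn == origin for _, _, asn in new):
        hints.append(f"no covering VRP lists AS{origin}")
    return {"added": new - old, "removed": old - new, "hints": hints}


before = {("192.0.2.0/24", 25, 64500)}
after = {("192.0.2.0/24", 24, 64500)}
report = explain_invalid("192.0.2.128/25", 64500, before, after)
print(report["removed"])  # {('192.0.2.0/24', 25, 64500)}
print(report["added"])    # {('192.0.2.0/24', 24, 64500)}
print(report["hints"])    # ['maxLength too short: 192.0.2.0/24 max /24 < /25']
```

If both diffs are empty and no hint fires, the ROA data probably did not change, and the effective-origin and cache-freshness steps are the next places to look.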
10) Operator checklist (short form)
- Minimal ROAs by default; broad maxLength only with explicit justification.
- All legitimate origin ASNs covered (including mitigation/failover paths).
- At least two validator/cache paths per critical router set.
- RTR transport and trust boundaries hardened.
- Validation state computed for all relevant routes.
- Explicit policy for Valid/NotFound/Invalid (no hidden defaults).
- Egress validation uses effective origin AS semantics.
- Route-leak controls (Roles/OTC + filtering) deployed alongside ROV.
- Continuous monitoring for invalid spikes, cache lag, and ROA expiry.
References
RFC 6480: An Infrastructure to Support Secure Internet Routing
https://www.rfc-editor.org/rfc/rfc6480
RFC 6811: BGP Prefix Origin Validation
https://www.rfc-editor.org/rfc/rfc6811
RFC 7115 (BCP 185): Origin Validation Operation Based on the RPKI
https://www.rfc-editor.org/rfc/rfc7115
RFC 8210: RPKI to Router Protocol, Version 1
https://www.rfc-editor.org/rfc/rfc8210
RFC 8481: Clarifications to BGP Origin Validation
https://www.rfc-editor.org/rfc/rfc8481
RFC 8893: RPKI Origin Validation for BGP Export
https://www.rfc-editor.org/rfc/rfc8893
RFC 9234: Route Leak Prevention and Detection Using Roles
https://www.rfc-editor.org/rfc/rfc9234
RFC 9319 (BCP): The Use of maxLength in the RPKI
https://www.rfc-editor.org/rfc/rfc9319
One-line takeaway
Treat RPKI as a production control system: precise ROAs + staged policy + resilient validator architecture + leak-specific controls beats checkbox deployment every time.