RPKI-First BGP Routing Hygiene Playbook (ROA/ROV/RTR + Route-Leak Guardrails)

Date: 2026-03-13
Category: knowledge
Scope: Practical deployment pattern for reducing origin hijacks without causing self-inflicted routing outages.

1) Why this matters

Classic BGP has no built-in way to verify that an AS is entitled to originate a prefix, so a wrong origin announcement can propagate globally.

RPKI-based origin validation (ROV) gives operators a cryptographic way to check whether the origin AS is authorized for a prefix. Done well, this cuts a major class of hijacks. Done carelessly, it can create self-inflicted outages via bad ROAs.

This playbook is about deploying safely and operationally, not just enabling a checkbox.

2) What RPKI/ROV actually guarantees (and what it does not)

What it gives you

A signed authorization model (ROAs) mapping prefix resources to allowed origin ASNs.
Validation states for BGP routes: Valid / NotFound / Invalid.
A machine-consumable feed from validators/caches to routers via RTR (RPKI-to-Router protocol).

What it does not give you

Full AS_PATH security by itself.
Automatic route-leak prevention (customer/peer/provider propagation mistakes).
Immunity from operator errors (bad maxLength, stale/missing ROAs).

Mental model: ROV is a high-value baseline control, not the whole routing-security stack.

3) Validation-state semantics you should internalize

Per origin validation logic (RFC 6811 family):

Valid: at least one covering authorization matches the origin ASN, and the announced prefix length does not exceed that authorization's maxLength.
Invalid: a covering authorization exists but does not match (wrong ASN and/or prefix length exceeds allowed maxLength).
NotFound: no covering authorization data exists.

Operationally, this means:

Invalid is strong evidence that something is wrong: either the announcement has an unauthorized origin, or the ROA data itself is mistaken.
NotFound is still common in partial-deployment reality and should not be treated as equivalent to Invalid.
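The three states above can be sketched in a few lines of Python. This is a minimal illustration of the RFC 6811 matching logic, not a validator: the VRP class and the validate function are names chosen for this example, and special cases such as AS_SET origins are left out.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class VRP:
    """Validated ROA payload: one (prefix, maxLength, origin ASN) tuple."""
    prefix: str
    max_length: int
    asn: int


def validate(route_prefix: str, origin_asn: int, vrps: list[VRP]) -> str:
    """Return "Valid", "Invalid", or "NotFound" for one route."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # A VRP covers the route if it is the same prefix or a less-specific one.
        if vrp_net.version != route.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # A match needs the same origin ASN and a length within maxLength.
        if vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "Valid"
    return "Invalid" if covered else "NotFound"


vrps = [VRP("192.0.2.0/24", 24, 64500)]
print(validate("192.0.2.0/24", 64500, vrps))     # Valid
print(validate("192.0.2.0/25", 64500, vrps))     # Invalid: /25 exceeds maxLength 24
print(validate("192.0.2.0/24", 64501, vrps))     # Invalid: wrong origin ASN
print(validate("198.51.100.0/24", 64500, vrps))  # NotFound: nothing covers it
```

Note that a single matching VRP is enough for Valid, while Invalid requires that a covering VRP exists and none of them match.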

4) ROA authoring rules that prevent most self-outages

4.1 Prefer minimal ROAs

Authorizations should match actually-originated prefixes as tightly as possible.

4.2 Be conservative with `maxLength`

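maxLength is operationally useful, but a ROA authorizes every prefix length from the ROA prefix up to maxLength. Any more-specific prefix in that range that you do not actually announce can be announced by an attacker who forges your ASN as origin, and it will still validate as Valid (the forged-origin sub-prefix hijack discussed in RFC 9319). Over-broad values therefore expand the attack surface and increase the blast radius of mistakes.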
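To make the size of that surface concrete, the sketch below counts how many distinct prefixes a single ROA entry authorizes, and how many of them are not actually announced. Both function names are hypothetical helpers for this note, and the prefixes are examples only.

```python
from ipaddress import ip_network


def authorized_prefix_count(roa_prefix: str, max_length: int) -> int:
    """Number of distinct prefixes one ROA entry authorizes."""
    net = ip_network(roa_prefix)
    if not net.prefixlen <= max_length <= net.max_prefixlen:
        raise ValueError("maxLength must lie between the prefix length and the address size")
    # There are 2^(L - p) prefixes of length L inside a prefix of length p.
    return sum(2 ** (length - net.prefixlen)
               for length in range(net.prefixlen, max_length + 1))


def unused_authorizations(roa_prefix: str, max_length: int, announced: set[str]) -> int:
    """Authorized prefixes that are not announced: the forgeable remainder."""
    net = ip_network(roa_prefix)
    in_scope = {
        p for p in map(ip_network, announced)
        if p.version == net.version and p.subnet_of(net) and p.prefixlen <= max_length
    }
    return authorized_prefix_count(roa_prefix, max_length) - len(in_scope)


print(authorized_prefix_count("10.0.0.0/16", 16))  # 1
print(authorized_prefix_count("10.0.0.0/16", 24))  # 511
print(unused_authorizations("10.0.0.0/16", 24, {"10.0.0.0/16"}))  # 510
```

A /16 with maxLength 24 authorizes 511 prefixes; if only the /16 is announced, the other 510 are more-specifics that would win on longest-prefix match if someone else announced them with the authorized origin.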

4.3 Cover all legitimate origin ASNs

If a prefix can be originated by multiple ASNs (multi-homing, migrations, mitigation providers), issue ROAs for all legitimate origin cases.

4.4 Supernet/subnet sequencing discipline

Before publishing a supernet ROA, verify sub-allocations announced by other ASNs are correctly represented; otherwise you can accidentally make legitimate downstream announcements Invalid.
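A pre-publication check for this can be quite small. The sketch below replays observed announcements against the current VRP set plus the proposed ROA entries and reports routes that would newly become Invalid. The tuple layouts and the names state and newly_invalid are assumptions for this example; a real check would read announcements from your own BGP data source.

```python
from ipaddress import ip_network

# Assumed layouts: a VRP is (prefix, max_length, asn); a route is (prefix, origin_asn).


def state(route, vrps):
    """Origin validation state of one route against a VRP list."""
    prefix, origin = route
    net = ip_network(prefix)
    covering = [
        v for v in vrps
        if ip_network(v[0]).version == net.version and net.subnet_of(ip_network(v[0]))
    ]
    if not covering:
        return "NotFound"
    matched = any(asn == origin and net.prefixlen <= max_len
                  for _, max_len, asn in covering)
    return "Valid" if matched else "Invalid"


def newly_invalid(routes, current_vrps, proposed_vrps):
    """Routes that are not Invalid today but would be after publishing."""
    after = list(current_vrps) + list(proposed_vrps)
    return [r for r in routes
            if state(r, current_vrps) != "Invalid" and state(r, after) == "Invalid"]


routes = [("203.0.113.0/24", 64500), ("203.0.113.128/25", 64510)]
supernet_only = [("203.0.113.0/24", 24, 64500)]
print(newly_invalid(routes, [], supernet_only))
# [('203.0.113.128/25', 64510)]

with_customer = supernet_only + [("203.0.113.128/25", 25, 64510)]
print(newly_invalid(routes, [], with_customer))
# []
```

The first run shows the failure mode: the downstream /25 from AS64510 was NotFound before and becomes Invalid once the supernet ROA exists. Publishing the customer's entry first, or together with the supernet, avoids it. The same check also catches a missing ROA for a second legitimate origin ASN.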

5) Safe policy rollout ladder (do this in stages)

Stage 0 — Instrument only

Compute and store validation state for all eligible routes.
No enforcement yet.

Stage 1 — Preferential routing

Prefer Valid over NotFound.
Strongly de-prefer Invalid.

Stage 2 — Controlled rejection

Reject Invalid at selected edges/peers first.
Keep rollback path and exception list.

Stage 3 — Broad invalid rejection

Enforce at all relevant ingress points after stability evidence.
Continue monitoring for self-invalid spikes.

This mirrors the guidance in RFC 7115 (BCP 185): keep reachability safe during partial deployment by preferring Valid over NotFound, and move toward dropping Invalid once confidence is established.
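The ladder can be written down as one explicit decision function, which also makes it easy to review and test before it is translated into router policy. This is a sketch under stated assumptions: the local-preference values, the edge_enforced flag, and the exempt flag are illustrative choices, not values taken from any RFC or vendor.

```python
from typing import Optional

BASE_LOCAL_PREF = 100  # assumed baseline; use whatever your network already uses


def import_policy(stage: int, state: str, edge_enforced: bool = False,
                  exempt: bool = False) -> tuple[str, Optional[int]]:
    """Return (decision, local_pref) for a route at a given rollout stage."""
    if stage not in (0, 1, 2, 3):
        raise ValueError(f"unknown stage: {stage}")
    if state not in ("Valid", "NotFound", "Invalid"):
        raise ValueError(f"unknown validation state: {state}")
    if stage == 0:
        # Instrument only: state is recorded elsewhere, routing is unchanged.
        return ("accept", BASE_LOCAL_PREF)
    # Stage 2 rejects only on selected edges; stage 3 rejects everywhere.
    rejecting = stage == 3 or (stage == 2 and edge_enforced)
    if state == "Invalid" and rejecting and not exempt:
        return ("reject", None)
    # Stages 1+ prefer Valid over NotFound and strongly de-prefer Invalid.
    local_pref = {
        "Valid": BASE_LOCAL_PREF + 10,
        "NotFound": BASE_LOCAL_PREF,
        "Invalid": BASE_LOCAL_PREF - 50,
    }[state]
    return ("accept", local_pref)


print(import_policy(0, "Invalid"))                      # ('accept', 100)
print(import_policy(1, "Invalid"))                      # ('accept', 50)
print(import_policy(2, "Invalid", edge_enforced=True))  # ('reject', None)
print(import_policy(3, "Invalid", exempt=True))         # ('accept', 50)
```

Keeping the exception path as an explicit parameter means every exempted route is visible in the policy rather than hidden in a one-off filter. Rolling back is a change of the stage value, not a rewrite.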

6) Router/validator architecture that survives bad days

6.1 Multiple caches, not one

Routers should peer with more than one trusted cache/validator to avoid single-point failure.

6.2 Place caches close to control plane

Reduce bootstrap and reachability dependencies; avoid circular dependencies where routing must converge before the router can reach validation data.

6.3 Protect RTR transport

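Routers accept cache output without re-verifying any signatures, so the router-cache channel is part of your trust boundary. Secure and harden it with an authenticated transport, and avoid running router-cache sessions over insecure inter-AS paths.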

6.4 Keep caches fresh and observable

Track serial lag, cache freshness, and last successful update times. A cache that silently stops updating is dangerous because routers keep acting on stale data without any visible error.
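A freshness check along these lines can run in whatever monitoring system you already have. The sketch below assumes you can collect, per router-cache session, the cache's current serial, the serial the router last synchronized, and the time of the last successful update. The RtrSession fields and both thresholds are hypothetical, and serial wraparound is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class RtrSession:
    cache: str           # cache/validator name
    cache_serial: int    # serial the cache currently advertises
    router_serial: int   # serial the router last synchronized to
    last_success: float  # epoch seconds of last successful update
    up: bool             # transport session state


def session_alerts(sessions: list[RtrSession], now: float,
                   max_age_s: float = 1800.0, max_lag: int = 5) -> list[str]:
    """Return human-readable alerts; an empty list means healthy."""
    alerts = []
    healthy = 0
    for s in sessions:
        age = now - s.last_success
        lag = s.cache_serial - s.router_serial
        if not s.up:
            alerts.append(f"{s.cache}: RTR session down")
        if age > max_age_s:
            alerts.append(f"{s.cache}: no successful update for {int(age)}s")
        if lag > max_lag:
            alerts.append(f"{s.cache}: router is {lag} serials behind")
        if s.up and age <= max_age_s and lag <= max_lag:
            healthy += 1
    # Section 6.1: a single healthy cache is a single point of failure.
    if healthy < 2:
        alerts.append(f"only {healthy} healthy cache session(s)")
    return alerts


sessions = [
    RtrSession("cache-a", 120, 120, 9_900.0, True),
    RtrSession("cache-b", 120, 100, 5_000.0, True),
]
for line in session_alerts(sessions, now=10_000.0):
    print(line)
```

In this example cache-b is up but quiet: its session looks fine, yet the update age and serial lag both trip, and the redundancy alert fires because only one session is actually healthy.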

7) Two commonly missed implementation details

7.1 “Set state, don’t auto-act”

Validation state should be computed for all routes, but any policy action based on that state must be an explicit operator choice (RFC 8481). Implicit vendor defaults that act on validation state are an outage risk.

7.2 Validate egress with the effective origin AS

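When exporting, policy and AS_PATH manipulations (for example private-AS removal, AS migration, or confederation handling) can change which AS appears as the origin to the neighbor. Egress validation should therefore use the effective origin AS as it will appear after export policy is applied (RFC 8893), not the origin seen on import.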
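A small example shows how the effective origin can differ from the received one. The sketch models AS_PATH as a flat list of ASNs (AS_SEQUENCE only, origin rightmost) and uses a deliberately simplified private-AS removal; real implementations have vendor-specific rules, and AS_SET or confederation segments are not modeled here. All function names are illustrative.

```python
def is_private_asn(asn: int) -> bool:
    """Private-use ASN ranges (16-bit and 32-bit)."""
    return 64512 <= asn <= 65534 or 4200000000 <= asn <= 4294967294


def export_path(received_path: list[int], local_asn: int,
                strip_private: bool = False) -> list[int]:
    """AS_PATH as the eBGP neighbor will see it (simplified)."""
    kept = [a for a in received_path if not (strip_private and is_private_asn(a))]
    # On eBGP export the local AS is prepended (leftmost position).
    return [local_asn] + kept


def effective_origin(received_path: list[int], local_asn: int,
                     strip_private: bool = False) -> int:
    """Origin AS after export policy: the rightmost ASN of the exported path."""
    return export_path(received_path, local_asn, strip_private)[-1]


# A customer using private AS 65001 originates a prefix; we are AS 64500.
print(effective_origin([65001], 64500))                      # 65001
print(effective_origin([65001], 64500, strip_private=True))  # 64500
print(effective_origin([], 64500))                           # 64500 (locally originated)
```

With private-AS removal enabled, the neighbor sees AS64500 as the origin, so the ROA must authorize AS64500 for that prefix. Validating against the import-side origin (65001) would give the wrong answer in either direction.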

8) ROV is not route-leak defense: add BGP roles/OTC

ROV primarily addresses origin legitimacy.

Route leaks (relationship-violating propagation) require complementary controls. BGP Roles and OTC signaling (RFC 9234) add in-band relationship-aware safeguards for leak prevention/detection.

Practical stack:

RPKI/ROV for origin hygiene.
BGP Roles/OTC for propagation hygiene.
Classic import/export filters and IRR/RPKI sanity checks.
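To show what the OTC layer adds on top of ROV, here is a sketch of the Only to Customer handling described in RFC 9234. The role argument is the role of the remote neighbor as seen from the local AS, and otc is the attribute value or None when absent. Function names and the string role labels are choices made for this example, not part of any router API.

```python
from typing import Optional

ROLES = {"provider", "customer", "peer", "rs", "rs-client"}


def otc_ingress(neighbor_role: str, neighbor_asn: int,
                otc: Optional[int]) -> tuple[bool, Optional[int]]:
    """Return (eligible, otc_after) for a route received from a neighbor."""
    if neighbor_role not in ROLES:
        raise ValueError(f"unknown role: {neighbor_role}")
    if otc is not None:
        # OTC set but arriving from below: the route already went "down" once.
        if neighbor_role in ("customer", "rs-client"):
            return (False, otc)
        # From a peer, OTC must have been set by that peer itself.
        if neighbor_role == "peer" and otc != neighbor_asn:
            return (False, otc)
        return (True, otc)
    # No OTC yet: mark routes learned from provider, peer, or route server.
    if neighbor_role in ("provider", "peer", "rs"):
        return (True, neighbor_asn)
    return (True, None)


def otc_egress(neighbor_role: str, local_asn: int,
               otc: Optional[int]) -> tuple[bool, Optional[int]]:
    """Return (may_send, otc_after) for a route about to be advertised."""
    if neighbor_role not in ROLES:
        raise ValueError(f"unknown role: {neighbor_role}")
    # A route carrying OTC must not go to providers, peers, or route servers.
    if otc is not None and neighbor_role in ("provider", "peer", "rs"):
        return (False, otc)
    # Sending to customer, peer, or RS-client: set OTC if it is missing.
    if otc is None and neighbor_role in ("customer", "peer", "rs-client"):
        return (True, local_asn)
    return (True, otc)


# Route learned from provider AS64496, local AS is 64500.
ok, otc = otc_ingress("provider", 64496, None)
print(ok, otc)                              # True 64496
print(otc_egress("provider", 64500, otc))   # (False, 64496): would be a leak
print(otc_egress("customer", 64500, otc))   # (True, 64496)
```

The example is the classic leak: a route learned from one provider must not be re-advertised to another provider. The origin may be perfectly Valid under ROV, which is why this check is complementary rather than redundant.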

9) Incident playbook for “Why did this become Invalid?”

Identify affected prefix + observed origin ASN.
Compare current VRP/ROA set versus previous snapshot.
Check maxLength mismatch first (very common).
Check AS migration/private-AS stripping/policy rewrites (effective-origin mismatch).
Validate cache freshness and RTR session health.
Apply temporary exception only with expiry and ticketed follow-up.
Post-incident: fix ROA model, add pre-change validation tests.
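The snapshot comparison and the first two checks in this playbook can be partly automated. The sketch below assumes VRP snapshots are available as sets of (prefix, max_length, asn) tuples; how you export them from your validator is left open, and the function names are hypothetical.

```python
from ipaddress import ip_network


def covering(net, vrps):
    """VRPs whose prefix equals or is less specific than the route prefix."""
    return {
        v for v in vrps
        if ip_network(v[0]).version == net.version and net.subnet_of(ip_network(v[0]))
    }


def explain_invalid(prefix, origin_asn, before, after):
    """List what changed between two VRP snapshots and the likely cause."""
    net = ip_network(prefix)
    prev, cur = covering(net, before), covering(net, after)
    findings = [f"added covering VRP {v}" for v in sorted(cur - prev)]
    findings += [f"removed covering VRP {v}" for v in sorted(prev - cur)]
    same_asn = [v for v in cur if v[2] == origin_asn]
    if not cur:
        findings.append("no covering VRP now: expect NotFound, check cache freshness")
    elif not same_asn:
        findings.append(f"AS{origin_asn} is not authorized by any covering VRP")
    elif all(net.prefixlen > v[1] for v in same_asn):
        findings.append(f"/{net.prefixlen} exceeds maxLength of every VRP for AS{origin_asn}")
    else:
        findings.append("a matching VRP exists: check effective origin and RTR sync")
    return findings


before = {("203.0.113.0/24", 25, 64500)}
after = {("203.0.113.0/24", 24, 64500)}
for line in explain_invalid("203.0.113.0/25", 64500, before, after):
    print(line)
```

Here the ROA was tightened from maxLength 25 to 24 while a /25 was still being announced, which is the maxLength mismatch case the playbook says to check first. The final branch points at the remaining steps (effective-origin mismatch, cache or RTR health) when the VRP data itself looks correct.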

10) Operator checklist (short form)

Minimal ROAs by default; broad maxLength only with explicit justification.
All legitimate origin ASNs covered (including mitigation/failover paths).
At least two validator/cache paths per critical router set.
RTR transport and trust boundaries hardened.
Validation state computed for all relevant routes.
Explicit policy for Valid/NotFound/Invalid (no hidden defaults).
Egress validation uses effective origin AS semantics.
Route-leak controls (Roles/OTC + filtering) deployed alongside ROV.
Continuous monitoring for invalid spikes, cache lag, and ROA expiry.

References

RFC 6480 — An Infrastructure to Support Secure Internet Routing
https://www.rfc-editor.org/rfc/rfc6480
RFC 6811 — BGP Prefix Origin Validation
https://www.rfc-editor.org/rfc/rfc6811
RFC 7115 (BCP 185) — Origin Validation Operation Based on the RPKI
https://www.rfc-editor.org/rfc/rfc7115
RFC 8210 — RPKI to Router Protocol, Version 1
https://www.rfc-editor.org/rfc/rfc8210
RFC 8481 — Clarifications to BGP Origin Validation
https://www.rfc-editor.org/rfc/rfc8481
RFC 8893 — RPKI Origin Validation for BGP Export
https://www.rfc-editor.org/rfc/rfc8893
RFC 9234 — Route Leak Prevention and Detection Using Roles
https://www.rfc-editor.org/rfc/rfc9234
RFC 9319 (BCP) — The Use of maxLength in the RPKI
https://www.rfc-editor.org/rfc/rfc9319

One-line takeaway

Treat RPKI as a production control system: precise ROAs + staged policy + resilient validator architecture + leak-specific controls beats checkbox deployment every time.
