BGP FlowSpec DDoS Mitigation Operations Playbook

2026-03-30 · systems

How to use FlowSpec as a precise mitigation tool without turning your control plane into a self-inflicted outage.

Why this matters

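For volumetric attacks, teams usually start with remotely triggered black hole (RTBH) routing of the attacked destination. It is simple and fast, but coarse: legitimate traffic to that destination is dropped along with the attack.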

FlowSpec gives you L3/L4 selective filtering (prefix, protocol, ports, flags, packet length, fragments, DSCP), so you can often keep legitimate traffic alive while suppressing attack flows.
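The match dimensions listed above correspond to numbered NLRI component types in RFC 8955. As a minimal sketch, the type codes can be modelled as an enum. The `describe` helper and the example rule are illustrative assumptions, not part of any particular controller.

```python
from enum import IntEnum


class FlowSpecComponent(IntEnum):
    """IPv4 FlowSpec NLRI component type codes from RFC 8955."""
    DEST_PREFIX = 1
    SOURCE_PREFIX = 2
    IP_PROTOCOL = 3
    PORT = 4
    DEST_PORT = 5
    SOURCE_PORT = 6
    ICMP_TYPE = 7
    ICMP_CODE = 8
    TCP_FLAGS = 9
    PACKET_LENGTH = 10
    DSCP = 11
    FRAGMENT = 12


def describe(rule):
    """Render a rule with components in ascending type order (hypothetical helper)."""
    return ", ".join(f"{c.name.lower()}={v}" for c, v in sorted(rule.items()))


# Hypothetical rule: large UDP packets from source port 123 toward one host.
example_rule = {
    FlowSpecComponent.PACKET_LENGTH: ">400",
    FlowSpecComponent.DEST_PREFIX: "203.0.113.10/32",
    FlowSpecComponent.IP_PROTOCOL: 17,
    FlowSpecComponent.SOURCE_PORT: 123,
}
print(describe(example_rule))
```

Keeping the components ordered by type code mirrors how the NLRI itself is encoded, which makes rules easier to compare and deduplicate.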

But FlowSpec is dangerous when run without guardrails:

The goal is not just “can we push rules?” but “can we push rules safely, repeatedly, and under stress?”


1) Mental model: RTBH first-aid, FlowSpec surgery

Use both, with clear intent:

Practical policy:

  1. Start with a pre-approved coarse control if a link is saturating.
  2. Move to narrower FlowSpec rules once signature confidence rises.
  3. Remove coarse controls quickly after precision rules stabilize.
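The three-step policy above can be sketched as a small decision function. The thresholds, parameter names, and action strings are assumptions chosen for illustration; real values would come from your own policy.

```python
SATURATION_THRESHOLD = 0.9   # assumed: link utilization that counts as "melting"
CONFIDENCE_THRESHOLD = 0.8   # assumed: signature confidence needed for FlowSpec


def next_step(link_utilization, signature_confidence,
              rtbh_active, flowspec_active, flowspec_stable):
    """Return the next mitigation action for one attacked destination."""
    # Step 3: precise rules are holding, so drop the coarse control quickly.
    if rtbh_active and flowspec_active and flowspec_stable:
        return "withdraw_rtbh"
    # Step 2: the signature is trustworthy enough for a narrow rule.
    if signature_confidence >= CONFIDENCE_THRESHOLD and not flowspec_active:
        return "announce_flowspec"
    # Step 1: no usable signature yet and the link is saturating.
    if (link_utilization >= SATURATION_THRESHOLD
            and not rtbh_active and not flowspec_active):
        return "announce_rtbh"
    return "hold"
```

For example, `next_step(0.97, 0.2, False, False, False)` selects the coarse control, and the same call with confidence `0.9` and `rtbh_active=True` moves on to FlowSpec.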

2) Baseline architecture (minimal safe design)

Use a 3-plane layout:

  1. Detection plane
    • telemetry + anomaly detector generates candidate signatures
  2. Policy/controller plane
    • normalizes rules, enforces validation/risk checks, rate-limits announcements
  3. Enforcement plane
    • routers/switches receiving FlowSpec and applying hardware/software filters

Critical design point: treat the controller as a policy compiler, not a dumb relay.
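A minimal sketch of the "policy compiler" idea follows: a candidate is normalized, validated, and rate-limited before anything is announced. The class name, rule fields, allowlist, and limits are all hypothetical, and the actual BGP announcement is left out.

```python
import ipaddress
import time
from collections import deque

ALLOWED_ACTIONS = {"discard", "rate-limit"}  # assumed allowlist


def normalize(candidate):
    """Canonicalize the destination prefix and lower-case the action."""
    net = ipaddress.ip_network(candidate["dst"], strict=False)
    return {**candidate, "dst": str(net), "action": candidate["action"].lower()}


def validate(rule, protected):
    """Return a list of policy violations (empty means acceptable)."""
    errors = []
    dst = ipaddress.ip_network(rule["dst"])
    if not any(dst.version == p.version and dst.subnet_of(p) for p in protected):
        errors.append("destination outside protected address space")
    if rule["action"] not in ALLOWED_ACTIONS:
        errors.append("action not in allowlist")
    return errors


class PolicyCompiler:
    def __init__(self, protected, max_per_minute=5, clock=time.monotonic):
        self.protected = [ipaddress.ip_network(p) for p in protected]
        self.max_per_minute = max_per_minute
        self.clock = clock
        self.sent = deque()  # timestamps of recent announcements

    def submit(self, candidate):
        rule = normalize(candidate)
        errors = validate(rule, self.protected)
        if errors:
            return ("rejected", errors)
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()  # forget announcements older than one minute
        if len(self.sent) >= self.max_per_minute:
            return ("deferred", ["announcement rate limit reached"])
        self.sent.append(now)
        return ("announce", rule)
```

The point of the design is that the detector never talks to routers directly: every candidate either comes out as a normalized, checked rule or is rejected or deferred with a reason.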


3) Validation and trust model (non-negotiable)

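RFC 8955 requires FlowSpec feasibility checks against unicast reachability (a destination prefix component must be present, the originator must match that of the best-match unicast route, and no more-specific unicast routes may come from a different neighboring AS), with revalidation whenever the best unicast path changes.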

Operationally:

RFC 9117 relaxes parts of validation for practical topologies (for example, centralized controllers inside the same local domain and route-server realities), but relaxation should be explicitly scoped and policy-guarded, not globally permissive.
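The baseline RFC 8955 feasibility check can be sketched as follows. This is a simplified IPv4-only illustration: the `UnicastRoute` type, its fields, and the idea of passing in a flat list of best unicast routes are assumptions, and the RFC 9117 relaxations are not modelled.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class UnicastRoute:
    """One best unicast route (hypothetical, simplified view of the RIB)."""
    prefix: ipaddress.IPv4Network
    originator: str   # identifier of the route's originator
    neighbor_as: int  # neighboring AS the route was received from


def is_feasible(flow_dst, flow_originator, routes):
    """Apply the three RFC 8955 conditions to one FlowSpec rule."""
    # (a) A destination prefix component must be present.
    if flow_dst is None:
        return False
    dst = ipaddress.ip_network(flow_dst)
    # (b) The originator must match the best-match (longest) unicast route.
    covering = [r for r in routes if dst.subnet_of(r.prefix)]
    if not covering:
        return False
    best = max(covering, key=lambda r: r.prefix.prefixlen)
    if best.originator != flow_originator:
        return False
    # (c) No more-specific unicast route from a different neighboring AS.
    for r in routes:
        more_specific = (r.prefix.prefixlen > dst.prefixlen
                         and r.prefix.subnet_of(dst))
        if more_specific and r.neighbor_as != best.neighbor_as:
            return False
    return True
```

Revalidation then amounts to re-running `is_feasible` for every installed rule whose destination is affected by a best-path change, and withdrawing rules that no longer pass.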


4) Rule authoring guardrails (what prevents disasters)

A) Match specificity floor

Never allow first-push rules that are too broad.

Examples of safer defaults:
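One way to enforce such a floor in the controller is sketched below. The thresholds and field names are hypothetical and IPv4-oriented; they only illustrate the shape of the check.

```python
import ipaddress

MIN_DST_PREFIXLEN = 24  # assumed floor: nothing broader than a /24 on first push
NARROWING_FIELDS = ("protocol", "src_port", "dst_port", "tcp_flags", "packet_length")


def first_push_violations(rule):
    """Return reasons a first-push rule is too broad (empty list means OK)."""
    violations = []
    dst = rule.get("dst")
    if dst is None:
        violations.append("missing destination prefix")
    else:
        prefixlen = ipaddress.ip_network(dst).prefixlen
        if prefixlen < MIN_DST_PREFIXLEN:
            violations.append(
                f"destination prefix /{prefixlen} broader than /{MIN_DST_PREFIXLEN}")
    # Require at least one L4 discriminator besides the destination.
    if not any(field in rule for field in NARROWING_FIELDS):
        violations.append("no protocol, port, flag, or length discriminator")
    return violations
```

A rule such as `{"dst": "203.0.113.10/32", "protocol": 17, "src_port": 123}` passes, while a bare `/16` destination match is rejected with both reasons spelled out.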

B) Action allowlist by trust tier

C) TTL + auto-expiry

Every rule gets a hard expiry (for example, 10–60 minutes unless renewed by fresh evidence).

If your detector dies, stale mitigation must not live forever.
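A minimal sketch of hard expiry with evidence-driven renewal, assuming a hypothetical in-memory `RuleTable`. Withdrawal itself is left to the caller, and the 600-second default is just one value inside the range mentioned above.

```python
import time


class RuleTable:
    """Tracks active rule IDs and their hard expiry times (illustrative only)."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.rules = {}  # rule_id -> expiry timestamp

    def install(self, rule_id):
        self.rules[rule_id] = self.clock() + self.ttl_seconds

    def renew(self, rule_id):
        """Extend a rule's life; call only when the detector has fresh evidence."""
        if rule_id in self.rules:
            self.rules[rule_id] = self.clock() + self.ttl_seconds

    def sweep(self):
        """Remove expired rules and return their IDs so they can be withdrawn."""
        now = self.clock()
        expired = [rid for rid, exp in self.rules.items() if exp <= now]
        for rid in expired:
            del self.rules[rid]
        return expired
```

Because renewal is the only way to extend a rule, a dead detector simply stops renewing and every mitigation ages out on the next `sweep()`.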

D) Dry-run / shadow evaluation

Before activating, run candidate rules against sampled flow logs/pcaps to estimate:

Promote only if collateral estimate is within policy budget.
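A shadow evaluation can be sketched as below. It assumes sampled flows arrive as dictionaries carrying a byte count and a `suspected_attack` label from the detector, and it uses exact-match semantics; the field names, the two metrics, and the budget value are illustrative assumptions.

```python
def matches(rule, flow):
    """Exact-match every field in the rule against the flow record."""
    return all(flow.get(field) == value for field, value in rule.items())


def shadow_evaluate(rule, flows):
    """Return (attack coverage, collateral share), both as byte fractions."""
    total = {"attack": 0, "legit": 0}
    hit = {"attack": 0, "legit": 0}
    for flow in flows:
        kind = "attack" if flow["suspected_attack"] else "legit"
        total[kind] += flow["bytes"]
        if matches(rule, flow):
            hit[kind] += flow["bytes"]
    coverage = hit["attack"] / total["attack"] if total["attack"] else 0.0
    collateral = hit["legit"] / total["legit"] if total["legit"] else 0.0
    return coverage, collateral


def should_promote(rule, flows, collateral_budget=0.02):
    """Promote only when estimated collateral stays within the policy budget."""
    _, collateral = shadow_evaluate(rule, flows)
    return collateral <= collateral_budget
```

The estimate is only as good as the sample and the labels, so it is best treated as a gate against obviously harmful rules rather than a guarantee.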


5) Capacity engineering: rule budgets are part of DDoS defense

RFC 8955 security considerations explicitly call out device/event capacity limits.

Operate with hard budgets:

If budget would be exceeded, degrade gracefully:

  1. collapse low-value specific rules into a coarser temporary control,
  2. prioritize top-impact signatures,
  3. shed non-critical actions first.
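The degradation order above can be sketched as a budget-fitting function. The rule fields, the `impact_bps` ranking, and the set of "non-critical" actions are assumptions; building the actual coarser rule from the collapsed set is left out.

```python
NON_CRITICAL_ACTIONS = {"mark-dscp", "sample"}  # assumed: safe to drop first


def fit_to_budget(rules, budget):
    """Split rules into (keep, shed, collapse) so the result fits `budget` slots.

    `budget` must be at least 1. When collapsing is needed, one slot is
    reserved for the single coarser rule that replaces the collapsed set.
    """
    keep = sorted(rules, key=lambda r: r["impact_bps"], reverse=True)
    shed = []
    # Shed non-critical actions first, lowest impact first.
    for rule in reversed(list(keep)):
        if len(keep) <= budget:
            break
        if rule["action"] in NON_CRITICAL_ACTIONS:
            keep.remove(rule)
            shed.append(rule)
    collapse = []
    if len(keep) > budget:
        # Keep the top-impact rules; the remainder share one coarse rule.
        collapse = keep[budget - 1:]
        keep = keep[:budget - 1]
    return keep, shed, collapse
```

With a budget of 2 and five candidate rules, for instance, this keeps the single highest-impact filtering rule, sheds the marking and sampling rules, and hands the rest to whatever logic builds the temporary coarse control.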

6) Vendor heterogeneity strategy

Real deployments are not semantically identical across vendors.

Observed ecosystem reality (see APNIC operational report):

So the controller must compile rules per device profile:

Do not assume “accepted by BGP” means “enforced as intended”.
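Per-profile compilation can be sketched as a capability check. The profile names and their supported components and actions are invented for illustration and do not describe any real vendor.

```python
# Hypothetical capability profiles; real ones would come from lab testing.
DEVICE_PROFILES = {
    "edge-type-a": {
        "match": {"dst", "src", "protocol", "dst_port", "src_port", "packet_length"},
        "actions": {"discard", "rate-limit"},
    },
    "edge-type-b": {
        "match": {"dst", "protocol", "dst_port"},
        "actions": {"discard"},
    },
}


def compile_for_device(rule, profile_name):
    """Return (rule, "ok") if the device can enforce it, else (None, reason)."""
    profile = DEVICE_PROFILES[profile_name]
    unsupported = sorted(set(rule["match"]) - profile["match"])
    if unsupported:
        # Refuse rather than silently drop a component: dropping a match
        # field would broaden the rule and increase collateral damage.
        return None, f"unsupported match components: {', '.join(unsupported)}"
    if rule["action"] not in profile["actions"]:
        return None, f"unsupported action: {rule['action']}"
    return dict(rule), "ok"
```

Rejecting instead of degrading keeps the decision explicit: the controller can then choose a different rule for that device class rather than discover the mismatch during an incident.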


7) Observability: metrics that actually matter

Minimum dashboards:

Key alerts:


8) Safe incident workflow (runbook)

  1. Classify attack mode (volumetric flood vs protocol-specific abuse).
  2. Pick initial control
    • RTBH for immediate containment if links saturate,
    • direct FlowSpec if signature confidence is already high.
  3. Compile candidate FlowSpec with policy checks.
  4. Shadow evaluate against recent telemetry.
  5. Canary announce to limited edge scope.
  6. Observe for 1–3 minutes (effectiveness + collateral).
  7. Progressive rollout to full scope.
  8. Auto-expire + review after attack decay.
  9. Post-incident cleanup: withdraw stale rules, archive metrics, update templates.
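Steps 5 to 7 of the runbook (canary, observe, progressive rollout) can be sketched as a staged loop with rollback. The `announce`, `withdraw`, and `healthy` callbacks are hypothetical hooks; the observation window of a few minutes would live inside `healthy`.

```python
def staged_rollout(rule, stages, announce, withdraw, healthy):
    """Announce `rule` to each scope in order; roll back everything on failure.

    `stages` is an ordered list of scopes, canary first. `healthy(rule, scope)`
    should block for the observation window and then report whether the rule
    was effective without unacceptable collateral.
    """
    announced = []
    for scope in stages:
        announce(rule, scope)
        announced.append(scope)
        if not healthy(rule, scope):
            # Undo in reverse order so the widest scope is withdrawn first.
            for done in reversed(announced):
                withdraw(rule, done)
            return False
    return True
```

A call such as `staged_rollout(rule, ["canary-edge", "region-1", "all-edges"], announce, withdraw, healthy)` stops at the first unhealthy stage and leaves no partial deployment behind.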

9) Failure lessons to institutionalize

A well-known historical outage showed that a bad FlowSpec-style filter can trigger network-wide router instability when distributed broadly.

Actionable lesson:


10) Compact operator checklist


References