Kubernetes API Priority and Fairness (APF) Overload Protection Playbook

2026-03-25 · software


Category: knowledge
Scope: Practical APF design/tuning guide for protecting critical Kubernetes API traffic during overload.


1) Why APF matters (even if your cluster is “normally fine”)

During API server pressure, global inflight limits alone (--max-requests-inflight, --max-mutating-requests-inflight) can keep the control plane alive but still let noisy clients starve critical calls.

APF (API Priority and Fairness) adds:

  • classification of every request via FlowSchemas, so traffic has an identity;
  • per-priority-level concurrency budgets ("seats") instead of one undifferentiated pool;
  • fair queuing (shuffle sharding) across flows within a level, so one noisy flow cannot monopolize a lane;
  • an explicit queue-or-reject policy per level, so load shedding is deliberate rather than arbitrary.

This is the difference between “apiserver survives” and “cluster remains operable” under stress.


2) Mental model in 60 seconds

Request path with APF enabled:

  1. Request matches a FlowSchema.
  2. FlowSchema maps to one PriorityLevelConfiguration.
  3. Priority level dispatches immediately (if seats available) or queues/rejects by policy.
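The three-step path can be sketched as a toy dispatcher. The names and data structures below are illustrative only, not the apiserver's internals:

```python
# Toy model of the APF request path: classify by the first matching FlowSchema
# (ascending matchingPrecedence), map to a priority level, then execute, queue,
# or shed based on available seats. Illustrative sketch, not real apiserver code.
from dataclasses import dataclass, field

@dataclass
class PriorityLevel:
    name: str
    seats: int                      # concurrency limit for this level
    executing: int = 0
    queue: list = field(default_factory=list)
    queue_limit: int = 50

@dataclass
class FlowSchema:
    name: str
    precedence: int                 # lower number = matched first
    match: callable                 # request -> bool
    level: PriorityLevel

def dispatch(request, schemas):
    # 1. Classify: check FlowSchemas in ascending matchingPrecedence.
    for fs in sorted(schemas, key=lambda s: s.precedence):
        if fs.match(request):
            level = fs.level
            break
    else:
        return "reject (no FlowSchema matched)"
    # 2./3. Dispatch immediately if a seat is free, else queue or shed.
    if level.executing < level.seats:
        level.executing += 1
        return f"execute on {level.name}"
    if len(level.queue) < level.queue_limit:
        level.queue.append(request)
        return f"queued on {level.name}"
    return f"429 from {level.name} (queue full)"
```

A flow that finds no free seat waits in a queue until the level's limit allows it, or is shed once the queue itself fills.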

Key details worth remembering:

  • Exempt priority levels bypass queuing and rejection entirely (e.g. system:masters traffic).
  • APF accounts in seats, not requests: an expensive LIST can occupy multiple seats, and a WATCH briefly takes a seat during initialization.
  • Classification picks the matching FlowSchema with the lowest matchingPrecedence.
  • Within a Limited level, flows are spread across queues by shuffle sharding (each flow hashes into handSize of the level's queues).


3) APF objects and knobs that actually matter

3.1 FlowSchema (classification)

A FlowSchema matches request attributes:

  • subject: the user, group, or service account making the request;
  • verb: get, list, watch, create, update, patch, delete, and so on;
  • target: apiGroups, resources, and namespaces for resource requests, or nonResourceURLs (e.g. /healthz) otherwise.

And then routes matching requests to a priority level.

Operationally important fields:

  • matchingPrecedence: lower numbers are evaluated first; keep values unique to avoid ambiguous classification.
  • distinguisherMethod (ByUser or ByNamespace): defines what counts as one "flow" for fairness purposes.
  • priorityLevelConfiguration.name: the lane this schema feeds.
  • rules: the subject plus resource/non-resource match list.

3.2 PriorityLevelConfiguration (handling)

Two types:

  • Exempt: requests are dispatched immediately, never queued or rejected (reserved for traffic like system:masters).
  • Limited: requests compete for a bounded number of seats and are queued or rejected by policy.

For Limited, these are the high-leverage controls:

  • nominalConcurrencyShares (NCS): this level's share of the server-wide seat budget.
  • lendablePercent: how much of this level's nominal seats other levels may borrow while it is idle.
  • borrowingLimitPercent: a cap on how many extra seats this level may borrow from others.
  • limitResponse.type: Queue (buffer bursts) or Reject (immediate 429).
  • limitResponse.queuing: queues, handSize, queueLengthLimit.


4) Seat math you can use in production

Let ServerCL be the total APF concurrency budget: in practice, the sum of --max-requests-inflight and --max-mutating-requests-inflight.

For limited level i:

  NominalCL(i) = ceil( ServerCL * NCS(i) / sum_j NCS(j) )

Interpretation:

  • NCS values are proportional shares, not absolute seat counts: raising one level's NCS dilutes every other level.
  • lendablePercent and borrowingLimitPercent define a floor and a ceiling around NominalCL(i) that APF can move between at runtime.
  • Raising the apiserver inflight limits raises ServerCL, and with it every lane's seats at once.
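A minimal sketch of the nominal-seat computation. The ServerCL of 600 and the share values are illustrative assumptions, not recommendations:

```python
# NominalCL(i) = ceil(ServerCL * NCS(i) / sum of all NCS) -- the per-level
# seat allocation APF derives from nominalConcurrencyShares.
import math

def nominal_seats(server_cl: int, shares: dict[str, int]) -> dict[str, int]:
    """server_cl: total APF seat budget; shares: NCS per limited level."""
    total = sum(shares.values())
    return {name: math.ceil(server_cl * ncs / total) for name, ncs in shares.items()}

# Example: a 600-seat budget split across three limited levels.
print(nominal_seats(600, {"workload-low": 100,
                          "workload-high": 40,
                          "cicd-critical": 40}))
```

Note how ceil rounding means the per-level seats can sum to slightly more than ServerCL; the proportions are what matter.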


5) Queue tuning: practical defaults and trade-offs

APF queue parameters (for limitResponse.type: Queue):

  • queues: number of queues for this level (default 64).
  • handSize: how many queues a given flow may hash into (default 8).
  • queueLengthLimit: max waiting requests per queue before 429s (default 50).

Tuning guidance:

  • More queues → better isolation between flows, at the cost of memory and scheduling overhead.
  • Larger handSize → better seat utilization but weaker isolation (a heavy flow overlaps more hands).
  • Larger queueLengthLimit → absorbs bursts but converts rejections into added queue latency.

Simple starting point for busy multi-tenant clusters: keep the defaults (queues: 64, handSize: 8, queueLengthLimit: 50), move seat allocation (NCS) first, and touch queue shape only when metrics show queue-full rejections alongside idle seats.

Then tune from metrics, not intuition.
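The isolation argument behind the queues/handSize pairing can be made concrete with shuffle-sharding math. Treating each flow's hand as a uniformly random handSize-subset of the queues (an approximation of the real dealing algorithm), the chance that k heavy flows fully cover another flow's hand is bounded by:

```python
# Upper bound on the probability that k heavy flows cover every queue in a
# victim flow's hand:  P <= C(k*handSize, handSize) / C(queues, handSize).
# Approximation: hands modeled as uniformly random subsets.
import math

def collision_bound(queues: int, hand: int, heavy_flows: int) -> float:
    covered = min(heavy_flows * hand, queues)
    return math.comb(covered, hand) / math.comb(queues, hand)

# One heavy flow against the defaults: full overlap is astronomically unlikely.
print(collision_bound(64, 8, 1))   # 1 / C(64, 8), on the order of 1e-10
```

With the defaults, a single heavy flow overlaps a victim's entire hand with probability around 2×10⁻¹⁰; only once heavy flows can cover all 64 queues does the bound reach 1.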


6) Safe rollout pattern

Phase A — Snapshot and baseline

kubectl get flowschemas,prioritylevelconfigurations -o yaml > apf-baseline.yaml
kubectl get --raw /metrics | grep apiserver_flowcontrol > apf-metrics-baseline.txt
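Once you have the snapshot, a short script can summarize rejections per lane. The metric and label names below are the real APF ones; the parsing itself is a simplified sketch, not a full Prometheus exposition parser:

```python
# Sum apiserver_flowcontrol_rejected_requests_total by priority_level from a
# scraped metrics snapshot (e.g. apf-metrics-baseline.txt).
import re
from collections import defaultdict

LINE = re.compile(
    r'apiserver_flowcontrol_rejected_requests_total\{([^}]*)\}\s+([0-9.eE+]+)'
)

def rejected_by_level(metrics_text: str) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for labels, value in LINE.findall(metrics_text):
        m = re.search(r'priority_level="([^"]+)"', labels)
        if m:
            totals[m.group(1)] += float(value)
    return dict(totals)

sample = '''
apiserver_flowcontrol_rejected_requests_total{flow_schema="cicd-critical",priority_level="cicd-critical",reason="queue-full"} 12
apiserver_flowcontrol_rejected_requests_total{flow_schema="service-accounts",priority_level="workload-low",reason="time-out"} 3
'''
print(rejected_by_level(sample))
```

The same pattern works for the wait-duration and inqueue metrics; compare per-lane totals before and after each change.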

Track before-change rates for:

  • apiserver_flowcontrol_rejected_requests_total (by flow_schema, priority_level, reason)
  • apiserver_flowcontrol_request_wait_duration_seconds (queue latency)
  • apiserver_flowcontrol_current_inqueue_requests
  • apiserver_flowcontrol_dispatched_requests_total

Phase B — Create one explicit critical lane

Create a dedicated Limited level for one critical client group (e.g., deployment controller, GitOps reconciler, platform automation SA).

Do not start by editing the many suggested defaults. Add one lane and observe.

Phase C — Attach a single FlowSchema with clear precedence

Use a unique precedence (avoid ties), and verify only intended traffic matches.

Phase D — Observe under load

Success criteria:

  • near-zero 429s and low queue wait for the critical lane under load;
  • no starvation of system lanes (leader election, node heartbeats) caused by the new lane;
  • rejected traffic, if any, is the traffic you intended to shed.

Phase E — Iterate gradually

Adjust one knob at a time:

  • NCS first (seat share),
  • then lendablePercent / borrowingLimitPercent (elasticity),
  • then queueLengthLimit (burst absorption),
  • queues / handSize last, and only with metric evidence.


7) Example: protect a CI/CD deploy service account

apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: cicd-critical
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 40
    lendablePercent: 0
    borrowingLimitPercent: 50
    limitResponse:
      type: Queue
      queuing:
        queues: 64
        handSize: 8
        queueLengthLimit: 60
---
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: cicd-critical
spec:
  matchingPrecedence: 500
  distinguisherMethod:
    type: ByUser
  priorityLevelConfiguration:
    name: cicd-critical
  rules:
    - subjects:
        - kind: ServiceAccount
          serviceAccount:
            namespace: platform
            name: deployer
      resourceRules:
        - verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
          apiGroups: ["*"]
          resources: ["*"]
          namespaces: ["*"]
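To sanity-check what this lane would get in seats, the v1 arithmetic can be replayed offline. The ServerCL of 600 and the 300 total shares below are assumptions for illustration, not values the manifest pins down:

```python
# Replay the seat arithmetic for the cicd-critical lane above (sketch of the
# v1 semantics; server_cl and total_shares are assumed example values).
import math

server_cl = 600          # assumed total APF seat budget
total_shares = 300       # assumed sum of nominalConcurrencyShares across levels
ncs, lendable_pct, borrow_pct = 40, 0, 50   # values from the manifest

nominal = math.ceil(server_cl * ncs / total_shares)          # nominal seats
floor_seats = nominal - round(nominal * lendable_pct / 100)  # lendable=0: nothing lent
ceiling = nominal + round(nominal * borrow_pct / 100)        # borrowing capped at +50%
print(nominal, floor_seats, ceiling)
```

Under these assumptions the lane holds 80 seats it never lends and can briefly borrow up to 40 more, which is exactly the "protected floor, bounded elasticity" shape you want for a critical lane.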

Note:

  • distinguisherMethod: ByUser makes each requesting user its own flow; with a single service account this mostly matters if the lane later absorbs more subjects.
  • lendablePercent: 0 keeps all of this lane's nominal seats reserved even when idle; deliberate for a critical lane, but it is capacity other lanes can never reuse.
  • The resourceRules above match namespaced requests; cluster-scoped requests additionally need clusterScope: true in the rule.
  • Verify that matchingPrecedence: 500 wins over your cluster's existing FlowSchemas for this service account, and for nothing else.


8) High-value operational checks

  1. FlowSchema correctness

    • Ensure status.conditions on each FlowSchema does not report a Dangling condition (i.e. the referenced PriorityLevelConfiguration exists).
  2. Seat pressure by lane

    • Watch per-priority executing seats vs queue growth.
  3. 429 reason mix

    • queue-full vs concurrency-limit vs time-out tells you whether to tune queue depth, seat allocation, or request behavior.
  4. Request-shape hygiene

    • Large list storms are expensive; pagination/watch patterns often reduce seat pressure.
  5. Bootstrap object ownership

    • Suggested bootstrap objects are auto-updated by the apiserver unless annotated with apf.kubernetes.io/autoupdate-spec: "false"; unannotated manual edits can be silently reverted.

9) Common failure modes

  • An overly broad FlowSchema (wildcard subjects/resources) silently swallows traffic meant for other lanes.
  • matchingPrecedence ties make classification hard to reason about; keep precedences unique.
  • High lendablePercent on a critical lane can leave its seats lent out at the start of a surge.
  • Oversized queueLengthLimit turns overload into latency instead of fast, actionable 429s.
  • Editing suggested bootstrap objects without the autoupdate annotation, then watching the changes get reverted.
  • Ignoring request shape: unpaginated LISTs consume many seats and dominate a lane regardless of tuning.


10) Incident playbook (APF-related API outage)

When 429s spike or control-plane latency jumps:

  1. Identify top rejected flow_schema / priority_level from metrics.
  2. Separate “critical” vs “noisy” traffic.
  3. If critical is starved:
    • increase its NCS,
    • reduce lendability of critical lane (lendablePercent),
    • cap borrower (borrowingLimitPercent) on noisy lane.
  4. If queue timeout dominates:
    • either add seats (NCS or control-plane capacity),
    • or reduce expensive request patterns.
  5. Validate stabilization, then codify permanent APF policy.
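Steps 3 and 4 can be encoded as a toy triage helper. The thresholds and input fields are illustrative assumptions, not apiserver output:

```python
# Toy first-move recommender for an APF incident: given per-lane observations,
# suggest which knob to reach for first. Field names are made up for this sketch.
def triage(lane: dict) -> str:
    if lane["critical"] and lane["rejects_429"] > 0:
        if lane["reject_reason"] == "queue-full":
            return "raise queueLengthLimit or NCS on this lane"
        if lane["reject_reason"] == "time-out":
            return "add seats (raise NCS) or cut expensive list/watch patterns"
        return "raise NCS; set lendablePercent=0 on this lane"
    if not lane["critical"] and lane["borrowed_seats"] > 0:
        return "cap borrowingLimitPercent on this noisy lane"
    return "no APF change; keep observing"

print(triage({"critical": True, "rejects_429": 50,
              "reject_reason": "time-out", "borrowed_seats": 0}))
```

The point is not the exact thresholds but the ordering: classify the starved lane first, then pick the knob that matches the dominant 429 reason.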

11) Bottom line

APF is most effective when treated as control-plane QoS engineering:

  • classify deliberately (FlowSchemas with unique precedence),
  • budget seats deliberately (NCS as a share of ServerCL, not a magic number),
  • control elasticity (lendablePercent / borrowingLimitPercent),
  • and verify every change against the flowcontrol metrics.

If you only tweak queue length, you’ll postpone pain. If you design priority lanes + seat economics, your cluster keeps functioning during bad days.

