Kubernetes API Priority and Fairness (APF) Overload Protection Playbook
Date: 2026-03-25
Category: knowledge
Scope: Practical APF design/tuning guide for protecting critical Kubernetes API traffic during overload.
1) Why APF matters (even if your cluster is “normally fine”)
During API server pressure, global inflight limits alone (--max-requests-inflight, --max-mutating-requests-inflight) can keep the control plane alive but still let noisy clients starve critical calls.
APF (API Priority and Fairness) adds:
- classification (who/what gets which priority),
- isolation (separate concurrency budgets),
- fair queuing (one bad flow cannot monopolize a priority level).
This is the difference between “apiserver survives” and “cluster remains operable” under stress.
2) Mental model in 60 seconds
Request path with APF enabled:
- Request matches a FlowSchema.
- FlowSchema maps to one PriorityLevelConfiguration.
- Priority level dispatches immediately (if seats available) or queues/rejects by policy.
Key details worth remembering:
- First-match wins among FlowSchemas, ordered by lowest numeric matchingPrecedence.
- A flow is effectively FlowSchema + distinguisher (ByUser, ByNamespace, or none).
- APF uses seats (concurrency units), not just raw request counts.
- Some operations are "heavier" and may consume multiple seats (notably expensive lists).
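The first-match-wins rule can be sketched as a tiny selection function. This is an illustrative model, not the apiserver's implementation; the schema names and the request/predicate shape are hypothetical:

```python
# Sketch of APF FlowSchema selection: among all schemas whose predicate
# matches the request, the one with the lowest matchingPrecedence wins.
# Schema names and request shape are illustrative, not real API objects.

def select_flow_schema(schemas, request):
    matching = [s for s in schemas if s["matches"](request)]
    if not matching:
        return None
    return min(matching, key=lambda s: s["matchingPrecedence"])

schemas = [
    {"name": "catch-all", "matchingPrecedence": 10000,
     "matches": lambda r: True},
    {"name": "cicd-critical", "matchingPrecedence": 500,
     "matches": lambda r: r["user"] == "system:serviceaccount:platform:deployer"},
]

chosen = select_flow_schema(schemas, {"user": "system:serviceaccount:platform:deployer"})
print(chosen["name"])  # cicd-critical: precedence 500 beats the 10000 catch-all
```

Note that if two schemas both match with the same precedence, the choice is effectively arbitrary, which is why the rollout guidance below insists on unique precedence values.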
3) APF objects and knobs that actually matter
3.1 FlowSchema (classification)
A FlowSchema matches request attributes:
- subjects (User / Group / ServiceAccount)
- resource rules (verbs, API groups, resources, namespaces/cluster scope)
- non-resource URLs
And then routes matching requests to a priority level.
Operationally important fields:
- matchingPrecedence (selection order)
- distinguisherMethod.type (ByUser or ByNamespace)
- priorityLevelConfiguration.name (target level)
3.2 PriorityLevelConfiguration (handling)
Two types:
- Exempt: not limited/queued like normal levels (guardrail/special-case use)
- Limited: normal APF-controlled level
For Limited, these are the high-leverage controls:
- nominalConcurrencyShares (NCS)
- lendablePercent
- borrowingLimitPercent
- limitResponse.type (Queue or Reject)
- queue tuning (queues, handSize, queueLengthLimit)
4) Seat math you can use in production
Let ServerCL be the total APF concurrency budget, derived from the API server inflight limits (effectively the sum of --max-requests-inflight and --max-mutating-requests-inflight).
For limited level i:
- NominalCL(i) = ceil(ServerCL * NCS(i) / sum(NCS))
- LendableCL(i) = round(NominalCL(i) * lendablePercent(i) / 100)
- BorrowingCL(i) = round(NominalCL(i) * borrowingLimitPercent(i) / 100) (if configured)
Interpretation:
- Increase NCS to reserve more baseline seats.
- Increase lendablePercent to let others borrow from this level when idle.
- Cap borrowingLimitPercent if a level is over-borrowing and starving neighbors.
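The seat math above is easy to script when planning a change. A minimal sketch, with hypothetical level names and an assumed default of borrowingLimitPercent = 100 when unset (in the real API an unset value means unlimited borrowing):

```python
import math

def apf_seat_math(server_cl, levels):
    """Compute per-level seat budgets from nominalConcurrencyShares (NCS),
    lendablePercent and borrowingLimitPercent, using the formulas above.
    `levels` maps level name -> dict of knobs. Simplification: an unset
    borrowingLimitPercent is treated as 100 here, not as unlimited."""
    total_ncs = sum(l["ncs"] for l in levels.values())
    out = {}
    for name, l in levels.items():
        nominal = math.ceil(server_cl * l["ncs"] / total_ncs)
        out[name] = {
            "nominal": nominal,
            "lendable": round(nominal * l.get("lendablePercent", 0) / 100),
            "borrowing_cap": round(nominal * l.get("borrowingLimitPercent", 100) / 100),
        }
    return out

# Example: 600 total seats split across three hypothetical limited levels.
budgets = apf_seat_math(600, {
    "cicd-critical":  {"ncs": 40, "lendablePercent": 0, "borrowingLimitPercent": 50},
    "workload-low":   {"ncs": 100, "lendablePercent": 90},
    "global-default": {"ncs": 20, "lendablePercent": 50},
})
print(budgets["cicd-critical"])  # {'nominal': 150, 'lendable': 0, 'borrowing_cap': 75}
```

Running the math before applying a change makes it obvious when a new lane would starve the remaining levels' nominal seats.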
5) Queue tuning: practical defaults and trade-offs
APF queue parameters (for limitResponse.type: Queue):
- queues (default 64)
- handSize (default 8)
- queueLengthLimit (default 50)
Tuning guidance:
- More queues → fewer flow collisions, higher memory cost.
- Larger handSize → better anti-collision for individual flows, but can let a few heavy flows dominate more queues; can also increase tail latency.
- Larger queueLengthLimit → absorbs bursts, but increases waiting latency and memory.
Simple starting point for busy multi-tenant clusters:
- queues: 64
- handSize: 6-8
- queueLengthLimit: 50-100
Then tune from metrics, not intuition.
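To build intuition for the queues/handSize trade-off: APF deals each flow a random "hand" of handSize queues out of queues (shuffle sharding), and a well-behaved flow only suffers when every queue in its hand is also occupied by heavy flows. A rough Monte Carlo sketch of that collision probability (not kube-apiserver's actual dealing algorithm):

```python
import random

def p_hand_covered(queues, hand_size, heavy_flows, trials=20000, seed=42):
    """Estimate the probability that a victim flow's entire hand of queues
    is covered by the union of `heavy_flows` randomly dealt heavy hands.
    Rough Monte Carlo sketch, not the real APF shuffle-sharding dealer."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        heavy = set()
        for _ in range(heavy_flows):
            heavy.update(rng.sample(range(queues), hand_size))
        victim = rng.sample(range(queues), hand_size)
        if all(q in heavy for q in victim):
            covered += 1
    return covered / trials

# Doubling queues (same handSize) makes full collisions much rarer.
print(p_hand_covered(64, 8, heavy_flows=4))
print(p_hand_covered(128, 8, heavy_flows=4))
```

This also illustrates why larger handSize cuts both ways: it makes a single unlucky overlap less total, but lets each heavy flow occupy more queues.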
6) Safe rollout pattern
Phase A — Snapshot and baseline
kubectl get flowschemas,prioritylevelconfigurations -o yaml > apf-baseline.yaml
kubectl get --raw /metrics | grep apiserver_flowcontrol > apf-metrics-baseline.txt
Track before-change rates for:
- apiserver_flowcontrol_rejected_requests_total
- apiserver_flowcontrol_current_inqueue_requests
- apiserver_flowcontrol_request_wait_duration_seconds
- apiserver_flowcontrol_current_executing_seats
Phase B — Create one explicit critical lane
Create a dedicated Limited level for one critical client group (e.g., deployment controller, GitOps reconciler, platform automation SA).
Do not start by editing many of the suggested defaults. Add one lane and observe.
Phase C — Attach a single FlowSchema with clear precedence
Use a unique precedence (avoid ties), and verify only intended traffic matches.
Phase D — Observe under load
Success criteria:
- Critical lane has low queue wait and near-zero rejects.
- No catastrophic rise in reject/wait for neighbor lanes.
- Cluster-level controller health remains stable during synthetic or real bursts.
Phase E — Iterate gradually
Adjust one knob at a time:
- first nominalConcurrencyShares,
- then borrowing/lending,
- then queue parameters.
7) Example: protect a CI/CD deploy service account
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: cicd-critical
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 40
    lendablePercent: 0
    borrowingLimitPercent: 50
    limitResponse:
      type: Queue
      queuing:
        queues: 64
        handSize: 8
        queueLengthLimit: 60
---
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: cicd-critical
spec:
  matchingPrecedence: 500
  distinguisherMethod:
    type: ByUser
  priorityLevelConfiguration:
    name: cicd-critical
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        namespace: platform
        name: deployer
    resourceRules:
    - verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]
Note:
- Start narrow if possible (specific resources/verbs) before broad wildcards.
- Keep matchingPrecedence explicitly separated from existing schemas.
8) High-value operational checks
FlowSchema correctness
- Ensure status.conditions does not show a dangling priority-level reference.
Seat pressure by lane
- Watch per-priority executing seats vs queue growth.
429 reason mix
The mix of queue-full vs concurrency-limit vs time-out tells you whether to tune queue depth, seat allocation, or request behavior.
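The reason mix can be pulled out of a metrics dump (e.g. the output of kubectl get --raw /metrics) with a few lines of Prometheus text-format parsing. The metric and label names match the APF metrics listed earlier; the sample lines are illustrative, not real cluster output:

```python
import re
from collections import Counter

def rejects_by_reason(metrics_text):
    """Sum apiserver_flowcontrol_rejected_requests_total by its `reason`
    label from a Prometheus text-format dump. Simple line-based parse;
    assumes no commas inside label values."""
    totals = Counter()
    pattern = re.compile(
        r'^apiserver_flowcontrol_rejected_requests_total\{([^}]*)\}\s+([0-9.eE+]+)')
    for line in metrics_text.splitlines():
        m = pattern.match(line)
        if not m:
            continue
        labels = dict(kv.split("=", 1) for kv in m.group(1).split(","))
        totals[labels.get("reason", "").strip('"')] += float(m.group(2))
    return totals

# Illustrative sample lines, not real cluster output:
sample = '''apiserver_flowcontrol_rejected_requests_total{flow_schema="catch-all",priority_level="global-default",reason="queue-full"} 120
apiserver_flowcontrol_rejected_requests_total{flow_schema="cicd-critical",priority_level="cicd-critical",reason="time-out"} 3
'''
print(rejects_by_reason(sample))  # Counter({'queue-full': 120.0, 'time-out': 3.0})
```

The same parse, grouped by flow_schema instead of reason, identifies which lane is absorbing the rejects.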
Request-shape hygiene
- Large list storms are expensive; pagination/watch patterns often reduce seat pressure.
Bootstrap object ownership
- Suggested objects can be auto-updated by the apiserver unless controlled via the apf.kubernetes.io/autoupdate-spec annotation.
9) Common failure modes
- Precedence collisions: multiple schemas matching same traffic unintentionally.
- One giant catch-all: all traffic in one level defeats APF isolation.
- Over-borrowing: one high-load lane consumes neighbors’ headroom.
- Queue-only thinking: increasing queue length hides the symptom while latency explodes.
- Ignoring recursive server patterns: admission webhooks / aggregated APIs can create priority inversion or deadlock risks if classified poorly.
10) Incident playbook (APF-related API outage)
When 429s spike or control-plane latency jumps:
- Identify the top rejected flow_schema/priority_level from metrics.
- Separate "critical" vs "noisy" traffic.
- If critical is starved:
  - increase its NCS,
  - reduce lendability of the critical lane (lendablePercent),
  - cap the borrower (borrowingLimitPercent) on the noisy lane.
- If queue timeout dominates:
- either add seats (NCS or control-plane capacity),
- or reduce expensive request patterns.
- Validate stabilization, then codify permanent APF policy.
11) Bottom line
APF is most effective when treated as control-plane QoS engineering:
- classify traffic intentionally,
- reserve seats for critical workflows,
- constrain noisy clients,
- and tune from metrics under real pressure.
If you only tweak queue length, you’ll postpone pain. If you design priority lanes + seat economics, your cluster keeps functioning during bad days.
References
- Kubernetes docs — API Priority and Fairness: https://kubernetes.io/docs/concepts/cluster-administration/flow-control/
- kube-apiserver options (feature flag context): https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
- Kubernetes Flow Control API types (v1): https://github.com/kubernetes/api/blob/master/flowcontrol/v1/types.go
- KEP-1040 (design background): https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness