Kubernetes API Priority and Fairness (APF) Overload Protection Playbook
Date: 2026-03-25
Category: knowledge
Scope: Practical APF design/tuning guide for protecting critical Kubernetes API traffic during overload.
1) Why APF matters (even if your cluster is “normally fine”)
During API server pressure, global inflight limits alone (--max-requests-inflight, --max-mutating-requests-inflight) can keep the control plane alive but still let noisy clients starve critical calls.
APF (API Priority and Fairness) adds:
- classification (who/what gets which priority),
- isolation (separate concurrency budgets),
- fair queuing (one bad flow cannot monopolize a priority level).
This is the difference between “apiserver survives” and “cluster remains operable” under stress.
2) Mental model in 60 seconds
Request path with APF enabled:
- Request matches a FlowSchema.
- FlowSchema maps to one PriorityLevelConfiguration.
- Priority level dispatches immediately (if seats available) or queues/rejects by policy.
Key details worth remembering:
- First-match wins among FlowSchemas, ordered by lowest numeric matchingPrecedence.
- A flow is effectively FlowSchema + distinguisher (ByUser, ByNamespace, or none).
- APF uses seats (concurrency units), not just raw request counts.
- Some operations are "heavier" and may consume multiple seats (notably expensive lists).
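The first-match-wins rule can be sketched as a tiny selection function. This is an illustrative model, not the apiserver's implementation; the schema names and the request/predicate shape are hypothetical:

```python
# Sketch of APF FlowSchema selection: among all schemas whose predicate
# matches the request, the one with the lowest matchingPrecedence wins.
# Schema names and request shape are illustrative, not real API objects.

def select_flow_schema(schemas, request):
    matching = [s for s in schemas if s["matches"](request)]
    if not matching:
        return None
    return min(matching, key=lambda s: s["matchingPrecedence"])

schemas = [
    {"name": "catch-all", "matchingPrecedence": 10000,
     "matches": lambda r: True},
    {"name": "cicd-critical", "matchingPrecedence": 500,
     "matches": lambda r: r["user"] == "system:serviceaccount:platform:deployer"},
]

chosen = select_flow_schema(schemas, {"user": "system:serviceaccount:platform:deployer"})
print(chosen["name"])  # cicd-critical: precedence 500 beats the 10000 catch-all
```

Note that if two schemas both match with the same precedence, the choice is effectively arbitrary, which is why the rollout guidance below insists on unique precedence values.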
3) APF objects and knobs that actually matter
3.1 FlowSchema (classification)
A FlowSchema matches request attributes:
- subjects (User / Group / ServiceAccount)
- resource rules (verbs, API groups, resources, namespaces/cluster scope)
- non-resource URLs
And then routes matching requests to a priority level.
Operationally important fields:
- matchingPrecedence (selection order)
- distinguisherMethod.type (ByUser or ByNamespace)
- priorityLevelConfiguration.name (target level)
3.2 PriorityLevelConfiguration (handling)
Two types:
- Exempt: not limited/queued like normal levels (guardrail/special-case use)
- Limited: normal APF-controlled level
For Limited, these are the high-leverage controls:
- nominalConcurrencyShares (NCS)
- lendablePercent
- borrowingLimitPercent
- limitResponse.type (Queue or Reject)
- queue tuning (queues, handSize, queueLengthLimit)
4) Seat math you can use in production
Let ServerCL be the total APF concurrency budget, derived from the API server inflight limits (effectively the sum of --max-requests-inflight and --max-mutating-requests-inflight).
For limited level i:
- NominalCL(i) = ceil(ServerCL * NCS(i) / sum(NCS))
- LendableCL(i) = round(NominalCL(i) * lendablePercent(i) / 100)
- BorrowingCL(i) = round(NominalCL(i) * borrowingLimitPercent(i) / 100) (if configured)
Interpretation:
- Increase NCS to reserve more baseline seats.
- Increase lendablePercent to let others borrow from this level when idle.
- Cap borrowingLimitPercent if a level is over-borrowing and starving neighbors.
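The seat math above is easy to script when planning a change. A minimal sketch, with hypothetical level names and an assumed default of borrowingLimitPercent = 100 when unset (in the real API an unset value means unlimited borrowing):

```python
import math

def apf_seat_math(server_cl, levels):
    """Compute per-level seat budgets from nominalConcurrencyShares (NCS),
    lendablePercent and borrowingLimitPercent, using the formulas above.
    `levels` maps level name -> dict of knobs. Simplification: an unset
    borrowingLimitPercent is treated as 100 here, not as unlimited."""
    total_ncs = sum(l["ncs"] for l in levels.values())
    out = {}
    for name, l in levels.items():
        nominal = math.ceil(server_cl * l["ncs"] / total_ncs)
        out[name] = {
            "nominal": nominal,
            "lendable": round(nominal * l.get("lendablePercent", 0) / 100),
            "borrowing_cap": round(nominal * l.get("borrowingLimitPercent", 100) / 100),
        }
    return out

# Example: 600 total seats split across three hypothetical limited levels.
budgets = apf_seat_math(600, {
    "cicd-critical":  {"ncs": 40, "lendablePercent": 0, "borrowingLimitPercent": 50},
    "workload-low":   {"ncs": 100, "lendablePercent": 90},
    "global-default": {"ncs": 20, "lendablePercent": 50},
})
print(budgets["cicd-critical"])  # {'nominal': 150, 'lendable': 0, 'borrowing_cap': 75}
```

Running the math before applying a change makes it obvious when a new lane would starve the remaining levels' nominal seats.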
5) Queue tuning: practical defaults and trade-offs
APF queue parameters (for limitResponse.type: Queue):
- queues (default 64)
- handSize (default 8)
- queueLengthLimit (default 50)
Tuning guidance:
- More queues → fewer flow collisions, higher memory cost.
- Larger handSize → better anti-collision for individual flows, but can let a few heavy flows dominate more queues; can also increase tail latency.
- Larger queueLengthLimit → absorbs bursts, but increases waiting latency and memory.
Simple starting point for busy multi-tenant clusters:
- queues: 64
- handSize: 6-8
- queueLengthLimit: 50-100
Then tune from metrics, not intuition.
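To build intuition for the queues/handSize trade-off: APF deals each flow a random "hand" of handSize queues out of queues (shuffle sharding), and a well-behaved flow only suffers when every queue in its hand is also occupied by heavy flows. A rough Monte Carlo sketch of that collision probability (not kube-apiserver's actual dealing algorithm):

```python
import random

def p_hand_covered(queues, hand_size, heavy_flows, trials=20000, seed=42):
    """Estimate the probability that a victim flow's entire hand of queues
    is covered by the union of `heavy_flows` randomly dealt heavy hands.
    Rough Monte Carlo sketch, not the real APF shuffle-sharding dealer."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        heavy = set()
        for _ in range(heavy_flows):
            heavy.update(rng.sample(range(queues), hand_size))
        victim = rng.sample(range(queues), hand_size)
        if all(q in heavy for q in victim):
            covered += 1
    return covered / trials

# Doubling queues (same handSize) makes full collisions much rarer.
print(p_hand_covered(64, 8, heavy_flows=4))
print(p_hand_covered(128, 8, heavy_flows=4))
```

This also illustrates why larger handSize cuts both ways: it makes a single unlucky overlap less total, but lets each heavy flow occupy more queues.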
6) Safe rollout pattern
Phase A — Snapshot and baseline
kubectl get flowschemas,prioritylevelconfigurations -o yaml > apf-baseline.yaml
kubectl get --raw /metrics | grep apiserver_flowcontrol > apf-metrics-baseline.txt
Track before-change rates for:
- apiserver_flowcontrol_rejected_requests_total
- apiserver_flowcontrol_current_inqueue_requests
- apiserver_flowcontrol_request_wait_duration_seconds
- apiserver_flowcontrol_current_executing_seats
Phase B — Create one explicit critical lane
Create a dedicated Limited level for one critical client group (e.g., deployment controller, GitOps reconciler, platform automation SA).
Do not start by editing many of the suggested defaults. Add one lane and observe.
Phase C — Attach a single FlowSchema with clear precedence
Use a unique precedence (avoid ties), and verify only intended traffic matches.
Phase D — Observe under load
Success criteria:
- Critical lane has low queue wait and near-zero rejects.
- No catastrophic rise in reject/wait for neighbor lanes.
- Cluster-level controller health remains stable during synthetic or real bursts.
Phase E — Iterate gradually
Adjust one knob at a time:
- first nominalConcurrencyShares,
- then borrowing/lending,
- then queue parameters.
7) Example: protect a CI/CD deploy service account
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: cicd-critical
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 40
    lendablePercent: 0
    borrowingLimitPercent: 50
    limitResponse:
      type: Queue
      queuing:
        queues: 64
        handSize: 8
        queueLengthLimit: 60
---
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: cicd-critical
spec:
  matchingPrecedence: 500
  distinguisherMethod:
    type: ByUser
  priorityLevelConfiguration:
    name: cicd-critical
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        namespace: platform
        name: deployer
    resourceRules:
    - verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]
Note:
- Start narrow if possible (specific resources/verbs) before broad wildcards.
- Keep matchingPrecedence explicitly separated from existing schemas.
8) High-value operational checks
FlowSchema correctness
- Ensure status.conditions does not show a dangling priority-level reference.
Seat pressure by lane
- Watch per-priority executing seats vs queue growth.
429 reason mix
The mix of queue-full vs concurrency-limit vs time-out tells you whether to tune queue depth, seat allocation, or request behavior.
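The reason mix can be pulled out of a metrics dump (e.g. the output of kubectl get --raw /metrics) with a few lines of Prometheus text-format parsing. The metric and label names match the APF metrics listed earlier; the sample lines are illustrative, not real cluster output:

```python
import re
from collections import Counter

def rejects_by_reason(metrics_text):
    """Sum apiserver_flowcontrol_rejected_requests_total by its `reason`
    label from a Prometheus text-format dump. Simple line-based parse;
    assumes no commas inside label values."""
    totals = Counter()
    pattern = re.compile(
        r'^apiserver_flowcontrol_rejected_requests_total\{([^}]*)\}\s+([0-9.eE+]+)')
    for line in metrics_text.splitlines():
        m = pattern.match(line)
        if not m:
            continue
        labels = dict(kv.split("=", 1) for kv in m.group(1).split(","))
        totals[labels.get("reason", "").strip('"')] += float(m.group(2))
    return totals

# Illustrative sample lines, not real cluster output:
sample = '''apiserver_flowcontrol_rejected_requests_total{flow_schema="catch-all",priority_level="global-default",reason="queue-full"} 120
apiserver_flowcontrol_rejected_requests_total{flow_schema="cicd-critical",priority_level="cicd-critical",reason="time-out"} 3
'''
print(rejects_by_reason(sample))  # Counter({'queue-full': 120.0, 'time-out': 3.0})
```

The same parse, grouped by flow_schema instead of reason, identifies which lane is absorbing the rejects.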
Request-shape hygiene
- Large list storms are expensive; pagination/watch patterns often reduce seat pressure.
Bootstrap object ownership
- Suggested objects can be auto-updated by the apiserver unless controlled via the apf.kubernetes.io/autoupdate-spec annotation.
9) Common failure modes
- Precedence collisions: multiple schemas matching same traffic unintentionally.
- One giant catch-all: all traffic in one level defeats APF isolation.
- Over-borrowing: one high-load lane consumes neighbors’ headroom.
- Queue-only thinking: increasing queue length hides the symptom while latency explodes.
- Ignoring recursive server patterns: admission webhooks / aggregated APIs can create priority inversion or deadlock risks if classified poorly.
10) Incident playbook (APF-related API outage)
When 429s spike or control-plane latency jumps:
- Identify the top rejected flow_schema/priority_level from metrics.
- Separate "critical" vs "noisy" traffic.
- If critical is starved:
  - increase its NCS,
  - reduce lendability of the critical lane (lendablePercent),
  - cap the borrower (borrowingLimitPercent) on the noisy lane.
- If queue timeout dominates:
- either add seats (NCS or control-plane capacity),
- or reduce expensive request patterns.
- Validate stabilization, then codify permanent APF policy.
11) Bottom line
APF is most effective when treated as control-plane QoS engineering:
- classify traffic intentionally,
- reserve seats for critical workflows,
- constrain noisy clients,
- and tune from metrics under real pressure.
If you only tweak queue length, you’ll postpone pain. If you design priority lanes + seat economics, your cluster keeps functioning during bad days.
References
- Kubernetes docs — API Priority and Fairness: https://kubernetes.io/docs/concepts/cluster-administration/flow-control/
- kube-apiserver options (feature flag context): https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
- Kubernetes Flow Control API types (v1): https://github.com/kubernetes/api/blob/master/flowcontrol/v1/types.go
- KEP-1040 (design background): https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness