Linux MGLRU Playbook (Multi-Gen LRU for Memory-Reclaim Stability)
Date: 2026-03-18
Category: knowledge
Why this matters
Under memory pressure, many latency incidents are not CPU-bound compute issues. They are reclaim-path issues:
- kswapd spikes,
- direct reclaim stalls in request paths,
- cache thrash that repeatedly evicts and re-faults hot pages,
- tail-latency blowups while average metrics still look “fine”.
MGLRU (Multi-Gen LRU) improves reclaim decisions by aging pages across multiple generations instead of only active/inactive lists. In practice, this often means better cold-page targeting and less reclaim chaos during pressure.
1) Mental model: reclaim quality > reclaim volume
Traditional reclaim tuning often focuses on "how much" to reclaim. MGLRU shifts focus to "which pages" to reclaim first.
Goal:
- preserve true working set longer,
- evict genuinely cold pages earlier,
- reduce reclaim-induced jitter and refault churn.
If reclaim quality improves, you usually get:
- lower direct-reclaim incidence,
- lower memory-pressure tail latency,
- fewer user-visible stalls/janks under stress.
2) Core controls you should know
/sys/kernel/mm/lru_gen/enabled
- Main runtime switch and feature bitmask.
- Writing y enables all supported components; writing n disables them.
/sys/kernel/mm/lru_gen/min_ttl_ms
- Thrash-prevention guardrail.
- Protects a recent working set window from eviction (time-based).
- If memory cannot satisfy that protection, OOM kill can happen earlier (by design).
From kernel docs:
- min_ttl_ms=1000 is a common baseline for reducing visible jank on desktop-ish workloads.
- Larger values can smooth UX further but increase premature-OOM risk.
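Because enabled is a bitmask, its value can be decoded bit by bit. A minimal sketch, assuming the bit meanings documented in the kernel's multi-gen LRU admin guide (exact semantics can vary by kernel version):

```shell
#!/bin/sh
# Decode the lru_gen "enabled" bitmask. Bit meanings follow the kernel's
# multigen_lru admin guide; they can differ across kernel versions.
decode_lru_gen() {
  val=$(( $1 ))                                    # accepts 0x0007-style input
  [ $(( val & 0x0001 )) -ne 0 ] && echo "0x0001: MGLRU core"
  [ $(( val & 0x0002 )) -ne 0 ] && echo "0x0002: page-table walks (HW accessed bit)"
  [ $(( val & 0x0004 )) -ne 0 ] && echo "0x0004: non-leaf PTE accessed-bit clearing"
  [ "$val" -eq 0 ] && echo "MGLRU disabled"
  return 0
}

# Falls back to a sample value when the sysfs file is absent (e.g. no MGLRU).
decode_lru_gen "$(cat /sys/kernel/mm/lru_gen/enabled 2>/dev/null || echo 0x0007)"
```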
3) Quick verification checklist
Check kernel support
grep -E 'CONFIG_LRU_GEN|CONFIG_LRU_GEN_ENABLED' /boot/config-$(uname -r)
Check runtime interface
ls /sys/kernel/mm/lru_gen
cat /sys/kernel/mm/lru_gen/enabled
cat /sys/kernel/mm/lru_gen/min_ttl_ms
Enable all supported components
echo y | sudo tee /sys/kernel/mm/lru_gen/enabled
cat /sys/kernel/mm/lru_gen/enabled
Expected common output after full enable:
0x0007 (platform-dependent).
4) Practical rollout profiles
Profile A: Server/API (conservative)
- Enable MGLRU
- Keep min_ttl_ms=0 initially
- Measure reclaim/latency changes first
echo y | sudo tee /sys/kernel/mm/lru_gen/enabled
echo 0 | sudo tee /sys/kernel/mm/lru_gen/min_ttl_ms
Why: server workloads often prefer avoiding surprise OOM behavior before confidence is built.
Profile B: Interactive desktop/workstation
- Enable MGLRU
- Start with min_ttl_ms=1000
echo y | sudo tee /sys/kernel/mm/lru_gen/enabled
echo 1000 | sudo tee /sys/kernel/mm/lru_gen/min_ttl_ms
Why: prioritizes responsiveness under pressure.
5) Observability: what to track during canary
Pair reclaim metrics with user-facing latency.
Reclaim pressure / stalls
cat /proc/pressure/memory
vmstat 1
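PSI output is easy to post-process in a canary loop. A minimal sketch that flags elevated avg10 (the 2% threshold is an illustrative assumption, not a kernel-defined limit):

```shell
#!/bin/sh
# Flag sustained memory pressure from PSI "some"/"full" lines.
psi_flag() {
  awk '{
    split($2, a, "=")                             # $2 looks like avg10=<percent>
    suffix = (a[2] + 0 > 2.0) ? " <- investigate" : " ok"
    printf "%s avg10=%s%%%s\n", $1, a[2], suffix
  }'
}

# Demo with a captured sample; on a live host: psi_flag < /proc/pressure/memory
psi_flag <<'EOF'
some avg10=3.10 avg60=1.20 avg300=0.40 total=123456
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
EOF
```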
Fault/refault health (proxy)
grep -E 'pgfault|pgmajfault|pgscan|pgsteal|workingset_refault' /proc/vmstat
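These counters are cumulative, so what matters during a canary is their delta over an interval. A small sketch, assuming two snapshots captured a few seconds apart:

```shell
#!/bin/sh
# Diff two "name value" snapshots of selected /proc/vmstat counters.
snap() {
  grep -E '^(pgmajfault|pgscan_kswapd|pgscan_direct|pgsteal_kswapd|pgsteal_direct) ' /proc/vmstat
}

vmstat_delta() {   # $1 = earlier snapshot file, $2 = later snapshot file
  awk 'NR==FNR { before[$1] = $2; next }
       { printf "%s +%d\n", $1, $2 - before[$1] }' "$1" "$2"
}

# Demo with captured samples; live use: snap >/tmp/a; sleep 10; snap >/tmp/b
printf 'pgscan_direct 100\npgsteal_direct 90\n'  > /tmp/a
printf 'pgscan_direct 400\npgsteal_direct 120\n' > /tmp/b
vmstat_delta /tmp/a /tmp/b    # pgsteal/pgscan ratio ~ reclaim efficiency
```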
kswapd + direct reclaim symptoms
- kswapd CPU usage bursts
- elevated allocstall / direct-reclaim indicators
- request latency outliers correlated with memory-pressure windows
SLO view
- p95/p99 latency during memory stress
- throughput under constrained memory
- OOM frequency / kill targets
6) Tuning guidance (safe order)
- Enable MGLRU only (enabled=y, min_ttl_ms=0)
- Compare canary vs baseline on pressure scenarios
- If interactive stalls remain, test a min_ttl_ms ladder: 300 → 1000 → 2000 (only if needed)
- Stop increasing once:
- tail-latency benefit plateaus, or
- OOM risk rises meaningfully
- Persist chosen values with distro-appropriate boot/service config
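On systemd distros, the persistence step can be handled with systemd-tmpfiles. A sketch, assuming a tmpfiles.d-capable distro; the file name is illustrative:

```
# /etc/tmpfiles.d/mglru.conf  (illustrative name; any tmpfiles.d file works)
# "w" writes the value once at boot, after sysfs is available.
w /sys/kernel/mm/lru_gen/enabled    - - - - y
w /sys/kernel/mm/lru_gen/min_ttl_ms - - - - 0
```

Apply without rebooting via systemd-tmpfiles --create; non-systemd distros can do the same writes from a boot-time rc script.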
Rule of thumb:
- lower min_ttl_ms = safer memory headroom,
- higher min_ttl_ms = stronger thrash protection but more aggressive OOM behavior.
7) Common footguns
Turning on a high min_ttl_ms globally without a canary
- can trade jank for unexpected OOM kills.
Reading average latency only
- reclaim problems are mostly tail problems.
Ignoring memcg policy interactions
- memory.high / memory.max and reclaim tuning can dominate the outcome.
No pressure replay in tests
- idle benchmarks won’t show reclaim-path improvements.
Assuming every kernel/distro default is identical
- support and default enablement vary.
8) Suggested pressure test recipe
Run the same workload with fixed memory limits in two configs:
- Baseline (old/default reclaim behavior)
- MGLRU enabled (and optionally one min_ttl_ms candidate)
Collect per run:
- p50/p95/p99 latency,
- major fault rate,
- PSI memory pressure,
- kswapd CPU,
- OOM incidents.
Promote only if p99 improves without unacceptable OOM/throughput regressions.
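The promotion decision above can be encoded as a gate script run after both canary rounds. A sketch with illustrative thresholds (at least a 5% p99 win, no new OOMs, at most 2% throughput loss; tune these to your SLOs):

```shell
#!/bin/sh
# Gate: promote the MGLRU config only if p99 improves enough without
# OOM or throughput regressions. All thresholds are illustrative assumptions.
promote() {  # base_p99 cand_p99 base_ooms cand_ooms base_tput cand_tput
  awk -v bp="$1" -v cp="$2" -v bo="$3" -v co="$4" -v bt="$5" -v ct="$6" '
    BEGIN {
      p99_win = (bp - cp) / bp >= 0.05          # >=5% tail-latency improvement
      no_oom  = co <= bo                        # no new OOM incidents
      tput_ok = (bt - ct) / bt <= 0.02          # <=2% throughput loss
      print ((p99_win && no_oom && tput_ok) ? "promote" : "reject")
    }'
}

promote 120 95 0 0 1000 990    # example numbers from two canary runs
```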
Closing
MGLRU is not a magic "faster kernel" switch. It is a reclaim-quality upgrade that can materially reduce memory-pressure tail pain when rolled out with metrics discipline.
Treat it as a controlled change:
- enable,
- observe under stress,
- tune min_ttl_ms carefully,
- keep rollback easy.
That approach captures most of the upside without turning memory pressure into an OOM lottery.