Linux DAMON Proactive Reclamation and Access-Aware Memory Operations Playbook
Date: 2026-04-10
Category: knowledge
Domain: software / linux / memory management
Why this matters
Linux memory tuning often swings between two extremes:
- too reactive: you only notice trouble once reclaim, swap, or OOM behavior is already hurting tail latency;
- too blunt: you apply global knobs that know almost nothing about which memory is actually hot or cold.
DAMON matters because it gives Linux a middle path:
- low-overhead data access monitoring,
- region-based estimates of hotness and age,
- and policy hooks for access-aware actions.
The useful mental model is:
DAMON is Linux’s “watch memory access patterns first, act second” subsystem.
That makes it interesting for:
- memory-dense hosts with bursty pressure,
- caches whose cold pages linger too long,
- systems where page reclaim starts too late and gets too violent,
- NUMA or tiered-memory experiments,
- and operators who want something more informed than generic background reclaim.
It is not a magic replacement for the VM subsystem. It is a way to inject better recency/frequency information into memory decisions.
1) Quick mental model
DAMON stands for Data Access MONitor. The kernel docs describe the subsystem as "Data Access MONitoring and Access-aware System Operations."
At a high level, it does four things:
- Observes a target address space.
- Approximates access behavior by tracking regions instead of every page all the time.
- Maintains age + access-frequency signals for those regions.
- Feeds that information into actions such as page reclamation, LRU reprioritization, or other DAMOS schemes.
The key design idea is that full page-by-page observation is too expensive at scale. So DAMON uses:
- sampling intervals,
- aggregation intervals,
- bounded region counts,
- and adaptive split/merge of regions
so overhead stays controlled while the picture remains useful.
That trade-off is the whole point.
2) What DAMON is good at
Strong fits
A) Proactive reclaim before the cliff
If a machine alternates between calm periods and sudden pressure spikes, classic reclaim can wait too long and then become noisy.
DAMON is good when you want to identify memory that has been cold for a meaningful time window and reclaim it gently before the cliff.
B) Making LRU behavior more reality-based
LRU is a useful approximation, but real workloads are messy. Pages that look inactive from one signal may still matter; pages that linger on active lists may be dead weight.
DAMON can improve this by identifying hot/cold regions and using that information to promote or demote pages on LRU lists more intelligently.
C) Observation-led memory tuning
Before changing reclaim strategy, swap behavior, cache sizes, or tiering policy, DAMON can help answer:
- what is actually hot?
- what stays cold for minutes rather than milliseconds?
- how large is the real working set?
- how stable is that working set over time?
D) Research and platform engineering
DAMON is attractive for platform teams because it is both:
- an operator tool, and
- a kernel subsystem for building more specialized access-aware policies.
Weak fits
A) Instant, per-page, zero-error truth
DAMON is approximate by design. If you need exact accounting of every access, this is the wrong tool.
B) Tiny systems with simple pressure profiles
If your box is small and its behavior is obvious, the complexity may not buy much. Traditional VM tuning may be enough.
C) Severe pressure emergencies
DAMON-based reclaim is explicitly not meant to replace regular page-granularity reclaim under real pressure. It is for selective, lightweight, proactive work.
D) One-knob miracle tuning
DAMON gives you better information and better policy hooks. It does not spare you from threshold tuning, validation, or workload-specific judgment.
3) The architecture that actually matters to operators
The official docs describe three layers:
- Operations set layer
- Core logic
- Modules
The operator version is simpler:
A) Operations sets: what address space are you watching?
DAMON supports multiple monitoring targets, including:
- vaddr: virtual address spaces of specific processes
- fvaddr: fixed virtual address ranges
- paddr: system physical memory
If you care about whole-host proactive reclaim behavior, paddr is usually the most relevant mental model. If you care about understanding one workload’s memory pattern, vaddr is often the starting point.
B) kdamond: who does the work?
Each DAMON context is executed by a kernel thread named kdamond. Think of it as the monitoring/control worker for one configured monitoring context.
C) DAMOS: what do you do with the signal?
DAMON-based Operation Schemes (DAMOS) let you define:
- target access pattern,
- quotas,
- watermarks,
- filters,
- and an action.
This is where DAMON turns from “interesting observability” into “memory policy.”
4) How DAMON keeps overhead under control
This is the most important design point.
DAMON does not monitor every page equally forever. It uses region-based sampling.
Core knobs
- sampling interval: how often access is sampled
- aggregation interval: how long samples are accumulated into one snapshot
- update interval: how often dynamic target-space changes are reflected
- minimum / maximum number of regions: the bounds on monitoring granularity and overhead
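A minimal sketch of the arithmetic behind these knobs. It assumes 5 ms sampling and 100 ms aggregation (the 1/20 ratio discussed later), and uses illustrative region bounds of 10 and 1000. The class and method names are hypothetical, not a kernel API, and the overhead bound is simplified to one access check per region per sampling interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringAttrs:
    """Illustrative container; fields mirror the knobs above, not a kernel API."""
    sample_us: int = 5_000        # sampling interval (assumed: 5 ms)
    aggr_us: int = 100_000        # aggregation interval (assumed: 100 ms)
    min_nr_regions: int = 10      # illustrative lower bound
    max_nr_regions: int = 1_000   # illustrative upper bound

    def max_nr_accesses(self) -> int:
        # Each sample records at most one access per region, so the
        # per-window access count is capped by samples per aggregation.
        return self.aggr_us // self.sample_us

    def max_checks_per_second(self) -> float:
        # Simplified upper bound: one check per region per sampling interval.
        return self.max_nr_regions * (1_000_000 / self.sample_us)


attrs = MonitoringAttrs()
print(attrs.max_nr_accesses(), attrs.max_checks_per_second())
```

The point of the sketch is that overhead scales with the region cap and the sampling rate, not with the size of the monitored address space.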
Why regions matter
Adjacent pages are grouped into regions that are assumed to have similar behavior. DAMON then checks representative pages and adaptively splits or merges regions based on observed access differences.
That means:
- more regions -> potentially better fidelity, more overhead
- fewer regions -> cheaper monitoring, more approximation
The clever bit is that DAMON tries to ride that trade-off automatically within the bounds you set.
Age is as important as frequency
Operators often focus only on “hotness,” but DAMON also tracks how long the current access pattern has persisted. That is valuable because:
- short-term quiet pages are different from truly abandoned pages,
- and low-frequency-but-stable memory can matter differently from newly cold memory.
In practice, cold age thresholds are usually one of the most important tuning levers.
5) The best way to think about monitoring output
A DAMON snapshot is basically telling you:
- where in the address space a region lives,
- how big it is,
- how frequently it has been accessed during the aggregation window,
- and how long that pattern has remained stable.
That is enough to estimate:
- working-set shape,
- hot/cold separation quality,
- recency distribution,
- and whether your thresholds are absurd or sane.
A good practical reading of DAMON data is not:
“this specific page is definitely cold.”
It is:
“this region has been consistently cold enough, long enough, that reclaiming or deprioritizing it is a defensible bet.”
That distinction matters.
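One way to make that reading concrete is a small filter over a snapshot. This is a sketch under assumptions: the Region class and cold_bytes function are hypothetical names, and age is taken to be counted in aggregation intervals, so it has to be multiplied by the aggregation interval to get wall-clock time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int        # first byte address of the region
    end: int          # one past the last byte
    nr_accesses: int  # access samples seen in the last aggregation window
    age: int          # aggregation intervals the current pattern has held

    @property
    def size(self) -> int:
        return self.end - self.start


def cold_bytes(regions, aggr_us, min_age_us, max_nr_accesses=0):
    """Total size of regions at or below max_nr_accesses for at least min_age_us."""
    total = 0
    for r in regions:
        if r.nr_accesses <= max_nr_accesses and r.age * aggr_us >= min_age_us:
            total += r.size
    return total


MIB = 1 << 20
snapshot = [
    Region(0 * MIB, 4 * MIB, nr_accesses=0, age=1500),   # idle for 150 s
    Region(4 * MIB, 6 * MIB, nr_accesses=12, age=40),    # clearly hot
    Region(6 * MIB, 16 * MIB, nr_accesses=0, age=30),    # idle for only 3 s
]
print(cold_bytes(snapshot, aggr_us=100_000, min_age_us=120_000_000) // MIB)
```

With a 120 s threshold, only the first region counts as cold. The third is quiet but has not been quiet for long enough to be a defensible bet.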
6) DAMON_RECLAIM: where proactive reclaim fits
DAMON_RECLAIM is the operator-facing static kernel module for proactive reclamation.
The docs are very explicit about its role:
- it is meant for proactive and lightweight reclamation,
- especially under light memory pressure,
- and it is not trying to replace normal LRU-list-based reclaim when pressure gets serious.
That is exactly how to use it.
When it shines
DAMON_RECLAIM is attractive when:
- free memory is drifting down but not yet catastrophic,
- cold cache or anonymous memory is accumulating,
- reclaim storms hurt latency when they happen all at once,
- and you want a small continuous tax instead of occasional violent stalls.
Why the watermarks are important
DAMON_RECLAIM uses watermarks so it can stay inactive when memory is plentiful, become active in a middle zone, and step back again when the system is under more serious pressure.
That last part is subtle and important.
The low watermark behavior means:
if the box is already too tight, let normal reclaim machinery take over instead of insisting that DAMON keep doing selective background cleanup.
That is good operational design.
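The three-zone behavior can be sketched as a small state function. The function name and the threshold values in the example are hypothetical. Keeping the current state between the mid and high watermarks is my reading of how DAMOS watermarks behave, so treat that detail as an assumption to verify against your kernel.

```python
def next_active(free_permil, high, mid, low, currently_active):
    """Decide whether the scheme should be active for a free-memory rate.

    All thresholds are per-thousand of total memory.
    """
    if free_permil > high or free_permil < low:
        return False   # plenty of memory, or already too tight: stand down
    if free_permil >= mid and not currently_active:
        return False   # between mid and high: an idle scheme stays idle
    return True        # between low and mid, or already running below high


# Hypothetical thresholds: high=500, mid=400, low=200 (per-thousand).
for free, active in [(600, False), (450, False), (300, False), (100, True)]:
    print(free, next_active(free, 500, 400, 200, active))
```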
Quotas are not optional in spirit
DAMON_RECLAIM provides:
- time quota (quota_ms)
- size quota (quota_sz)
- quota reset interval
- optional PSI-based memory-pressure targeting
- optional feedback-based auto-tuning
The right mental model is:
- quotas prevent background reclaim from becoming the new problem,
- PSI targeting helps shape effort based on observed stall pressure,
- and stats like nr_quota_exceeds tell you when your policy wants to do more than you are allowing.
If you enable DAMON_RECLAIM without thinking hard about quotas, you are skipping one of its main safety rails.
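To see why the exceed counter is a useful signal, here is a toy model of a size quota with a reset interval. It is a deliberately simplified sketch, not the kernel's accounting: the SizeQuota class is hypothetical, it ignores the time quota and region prioritization, and it only borrows the names quota_sz and nr_quota_exceeds from the module parameters.

```python
class SizeQuota:
    """Toy per-window byte budget; not the kernel's implementation."""

    def __init__(self, quota_sz, reset_interval_ms):
        self.quota_sz = quota_sz
        self.reset_interval_ms = reset_interval_ms
        self.window_start_ms = 0
        self.charged = 0
        self.nr_quota_exceeds = 0

    def try_charge(self, now_ms, want_bytes):
        """Return how many bytes may be reclaimed now, counting shortfalls."""
        if now_ms - self.window_start_ms >= self.reset_interval_ms:
            self.window_start_ms = now_ms   # new window: the budget refills
            self.charged = 0
        allowed = min(want_bytes, self.quota_sz - self.charged)
        if allowed < want_bytes:
            self.nr_quota_exceeds += 1      # the policy wanted more than allowed
        self.charged += allowed
        return allowed


MIB = 1 << 20
quota = SizeQuota(quota_sz=128 * MIB, reset_interval_ms=1000)
print(quota.try_charge(0, 100 * MIB) // MIB)     # fits in the budget
print(quota.try_charge(10, 100 * MIB) // MIB)    # only the remainder is allowed
print(quota.nr_quota_exceeds)
```

A counter that climbs steadily means the access pattern keeps nominating more memory than the quota lets through, which is a prompt to revisit either the thresholds or the budget.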
7) DAMON_LRU_SORT: fixing LRU ordering before reclaim happens
DAMON_LRU_SORT is different from proactive reclaim. It does not primarily reclaim pages itself. It tries to make the LRU lists better reflect real access behavior by:
- prioritizing hot regions, and
- deprioritizing cold regions.
This is interesting because reclaim quality depends heavily on whether the kernel’s page ordering is trustworthy.
If the access signal feeding the LRU lists is stale or misleading, reclaim decisions suffer. DAMON_LRU_SORT tries to improve the ordering upstream.
When I’d favor it
I would look at DAMON_LRU_SORT when:
- reclaim is happening, but it seems to pick victims badly,
- active/inactive balance looks wrong for the workload,
- or you want to improve reclaim readiness without immediately forcing more pageout.
Useful knobs
The kernel docs highlight knobs such as:
- hot_thres_access_freq
- cold_min_age
- active_mem_bp
- filter_young_pages
- quota and watermark settings
- optional auto-tuned monitoring intervals
The important operational lesson is that DAMON_LRU_SORT is a steering mechanism, not a brute-force broom.
8) Observation-first rollout beats direct action-first rollout
This is the biggest practical advice.
Do not start by enabling automatic reclaim on a production-critical box just because DAMON exists.
Start in this order:
Stage 0 — Verify kernel capabilities
Confirm the system actually has the needed pieces:
- CONFIG_DAMON_*
- CONFIG_DAMON_SYSFS if you want sysfs control
- and the specific static modules for DAMON_RECLAIM / DAMON_LRU_SORT if you plan to use them
Also confirm which operations sets are actually available on the running kernel.
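A quick capability probe can be sketched by checking for the directories these features expose. The function name is hypothetical, and the sys_root parameter exists only so the check can be pointed at a test directory. The sketch assumes the DAMON sysfs tree lives under /sys/kernel/mm/damon/admin and that the static modules expose their parameters under /sys/module.

```python
from pathlib import Path


def damon_capabilities(sys_root="/sys"):
    """Report which DAMON interfaces are visible under sys_root."""
    root = Path(sys_root)
    return {
        "sysfs": (root / "kernel/mm/damon/admin").is_dir(),
        "damon_reclaim": (root / "module/damon_reclaim/parameters").is_dir(),
        "damon_lru_sort": (root / "module/damon_lru_sort/parameters").is_dir(),
    }


for name, present in damon_capabilities().items():
    print(f"{name}: {'present' if present else 'missing'}")
```

For operations sets, the sysfs interface exposes an avail_operations file inside each context directory once a context has been created, so that check has to happen after the first setup step rather than here.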
Stage 1 — Observe only
Use DAMON or DAMO to inspect the workload without acting on it. You want to learn:
- working-set size percentiles,
- hot/cold region stability,
- and whether the default aggregation interval is wildly wrong for your workload.
The docs explicitly note that the default 100 ms aggregation interval is too short in many cases, especially on large systems. That is a very useful warning.
Stage 2 — Tune monitoring quality vs overhead
Adjust:
- aggregation interval,
- sampling interval,
- min/max regions,
- and target scope.
A good rule from the design docs is to set the sampling interval proportional to the aggregation interval; the default ratio is 1/20 (5 ms sampling against 100 ms aggregation), and that is still the recommended baseline.
Stage 3 — Define “cold enough to touch”
Before any action, define your operating meaning of cold:
- not accessed for 30s?
- 120s?
- 10 minutes?
- only file-backed pages?
- never anonymous pages?
This is workload-dependent and nontrivial.
Stage 4 — Add tiny quotas
When you first test DAMON_RECLAIM or DAMOS pageout-like actions, set small quotas and conservative watermarks. The goal is to learn behavior, not win benchmarks on day one.
Stage 5 — Compare system outcomes, not just DAMON stats
Judge success using:
- PSI memory pressure,
- tail latency,
- swapin/swapout behavior,
- major fault rates,
- refault behavior,
- and reclaim noise during bursts.
The kernel feature is only useful if host behavior improves.
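For the PSI part of that comparison, a small parser for /proc/pressure/memory is enough to capture before/after numbers. This is a sketch: the function names are hypothetical, and it assumes the usual two-line "some" / "full" format with avg10, avg60, avg300, and total fields.

```python
def parse_psi(text):
    """Parse /proc/pressure/<resource> text into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # "total" is cumulative stall time in microseconds; averages are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def read_memory_psi(path="/proc/pressure/memory"):
    """Read and parse the live memory pressure file (requires PSI support)."""
    with open(path) as f:
        return parse_psi(f.read())


sample = (
    "some avg10=0.12 avg60=0.05 avg300=0.01 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=7890\n"
)
print(parse_psi(sample)["some"]["avg10"])
```

Sampling this at a fixed cadence before and after a policy change gives a baseline that does not depend on DAMON's own statistics.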
9) DAMON sysfs is powerful, but the model matters more than the path names
The sysfs interface is rich and configurable, but it is easy to get lost in the tree.
The shape that matters is:
- create a kdamond
- create a context
- choose operations (vaddr, paddr, etc.)
- set monitoring attributes
- define targets
- optionally define schemes (DAMOS)
- start / commit / refresh state
Useful pieces to remember:
- sample_us, aggr_us, update_us
- nr_regions/min, nr_regions/max
- intervals_goal/* for interval auto-tuning
- schemes/* for action, access pattern, quotas, watermarks, filters, and stats
- refresh_ms for periodic updates of tuned parameters and scheme stats
If you need a human-friendly entry point, the official docs point to DAMO as the default userspace tool layered on top of sysfs.
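The shape above can be sketched as an ordered list of sysfs writes for an observe-only setup with no schemes. The helper names are hypothetical and the interval and region values are illustrative placeholders, not recommendations. The paths follow my understanding of the layout under /sys/kernel/mm/damon/admin, so check them against the usage document for your kernel version before applying anything.

```python
from pathlib import Path

DAMON_ADMIN = Path("/sys/kernel/mm/damon/admin")


def observe_only_plan(operations="paddr", sample_us=5_000, aggr_us=100_000,
                      update_us=1_000_000, min_regions=10, max_regions=1_000):
    """Return ordered (relative_path, value) writes for one kdamond, no schemes."""
    ctx = "kdamonds/0/contexts/0"
    attrs = f"{ctx}/monitoring_attrs"
    return [
        ("kdamonds/nr_kdamonds", "1"),                 # create a kdamond
        ("kdamonds/0/contexts/nr_contexts", "1"),      # create a context
        (f"{ctx}/operations", operations),             # choose operations set
        (f"{attrs}/intervals/sample_us", str(sample_us)),
        (f"{attrs}/intervals/aggr_us", str(aggr_us)),
        (f"{attrs}/intervals/update_us", str(update_us)),
        (f"{attrs}/nr_regions/min", str(min_regions)),
        (f"{attrs}/nr_regions/max", str(max_regions)),
        (f"{ctx}/targets/nr_targets", "1"),            # define one target
        ("kdamonds/0/state", "on"),                    # start monitoring
    ]


def apply_plan(plan, admin=DAMON_ADMIN):
    """Write each value in order; needs root and the DAMON sysfs interface."""
    for rel, value in plan:
        (Path(admin) / rel).write_text(value)


for rel, value in observe_only_plan():
    print(f"{value:>8} -> {rel}")
```

Building the plan separately from applying it makes it easy to review or log the exact writes before touching a live host.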
10) Auto-tuning exists, but it is not permission to stop thinking
DAMON can auto-tune monitoring intervals. The design docs describe this in terms of targeting a desired ratio of observed access events to the theoretical maximum, using knobs such as:
- access_bp
- aggrs
- min_sample_us
- max_sample_us
This is useful because interval tuning is otherwise tedious.
But the important operational framing is:
- auto-tuning helps find a better observation resolution,
- it does not discover your business SLOs,
- it does not know the cost of false-cold classification,
- and it does not validate whether your action policy is safe.
Treat it as a monitoring-parameter assistant, not an autopilot.
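To get a feel for what "ratio of observed access events to the theoretical maximum" means, the quantity can be approximated from one snapshot. This is a rough sketch under assumptions: the function name is hypothetical, events are weighted by region size (my reading of the design, not a confirmed formula), and it looks at a single aggregation window rather than the multi-window span the aggrs knob describes.

```python
def observed_access_bp(regions, sample_us, aggr_us):
    """Observed access events as basis points (1/10,000) of the maximum.

    regions: iterable of (size_bytes, nr_accesses) pairs from one snapshot.
    """
    regions = list(regions)
    max_nr_accesses = aggr_us // sample_us
    total_size = sum(size for size, _ in regions)
    if total_size == 0 or max_nr_accesses == 0:
        return 0
    observed = sum(size * nr for size, nr in regions)
    return observed * 10_000 // (total_size * max_nr_accesses)


# One tenth of the space is fully hot, the rest is idle.
print(observed_access_bp([(1000, 20), (9000, 0)], sample_us=5_000, aggr_us=100_000))
```

If this number sits near zero for most snapshots, the aggregation window is probably too short to capture meaningful access variation, which is the situation interval auto-tuning tries to correct.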
11) The most important caveats
A) Approximation error is inherent
DAMON relies on region assumptions and sampled access signals. That is fine, but it means bad thresholds can misclassify memory.
This is why “observe first” matters so much.
B) Accessed-bit interaction is real
The design docs note that DAMON uses PTE accessed-bit-based checks for virtual and physical address monitoring. That can interfere with other subsystems that care about the same signal.
The docs specifically say:
- DAMON does not avoid disturbing idle page tracking,
- so handling that interference is on the sysadmin,
- while conflict with reclaim logic is handled using PG_idle and PG_young, similar to idle page tracking.
Translation: know what else on the box relies on page-idle-style signals before you assume all observability sources remain independent.
C) The default intervals may be wrong by a lot
This is not a minor footnote. A 100 ms aggregation interval can be far too short for many real systems, which makes the output look flatter and less informative than it should.
D) Overenthusiastic reclaim can backfire
If you reclaim memory that is merely temporarily cold, you may buy lower free-memory anxiety at the cost of:
- more refaults,
- more major faults,
- more swap churn,
- and worse tail latency.
A reclaim policy that “looks active” is not necessarily helping.
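A simple guard against this failure mode is to diff a few /proc/vmstat counters across a test window. The sketch below uses hypothetical function names and assumes the counters pgmajfault, pswpin, pswpout, workingset_refault_anon, and workingset_refault_file are present, which may not hold on older kernels.

```python
WATCHED = ("pgmajfault", "pswpin", "pswpout",
           "workingset_refault_anon", "workingset_refault_file")


def parse_vmstat(text):
    """Extract the watched counters from /proc/vmstat-style text."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name in WATCHED:
            counters[name] = int(value)
    return counters


def vmstat_delta(before, after):
    """Counter growth between two parsed samples; missing counters count as 0."""
    return {name: after.get(name, 0) - before.get(name, 0) for name in WATCHED}


before = parse_vmstat("nr_free_pages 1000\npgmajfault 10\npswpin 5\npswpout 7\n"
                      "workingset_refault_anon 1\nworkingset_refault_file 2\n")
after = parse_vmstat("nr_free_pages 900\npgmajfault 25\npswpin 5\npswpout 9\n"
                     "workingset_refault_anon 4\nworkingset_refault_file 2\n")
print(vmstat_delta(before, after))
```

Rising refault and major-fault deltas after enabling a reclaim policy suggest the cold threshold is catching memory that was only temporarily quiet.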
E) Whole-host tuning can hide workload asymmetry
On mixed-use machines, a single physical-memory policy can accidentally optimize for one workload while penalizing another.
Per-process observation with vaddr can reveal this before paddr-level policy obscures it.
12) When to use DAMON_RECLAIM vs DAMON_LRU_SORT
Use DAMON_RECLAIM when your problem is mainly:
- too much cold memory persisting,
- reclaim arriving too late and too violently,
- or wanting controlled, proactive cleanup under moderate pressure.
Use DAMON_LRU_SORT when your problem is mainly:
- reclaim victim selection seems poor,
- active/inactive list ordering does not reflect reality,
- or you want to improve the quality of later reclaim decisions.
Use observation-only DAMON first when:
- you do not yet know whether the issue is reclaim timing, reclaim quality, working-set size, or workload-local memory behavior.
These are complementary, not mutually exclusive, but I would almost always start with observation and only then decide which action layer fits.
13) Practical rollout checklist
Before production use, answer these:
Monitoring design
- Are you observing vaddr, fvaddr, or paddr?
- Is aggregation interval long enough to capture meaningful variation?
- Are region bounds sensible for system size?
- Do the snapshots distinguish hot from cold, or is everything mush?
Policy design
- What exact age threshold defines “cold”?
- Are anonymous pages included or skipped?
- Which watermark band should activate the policy?
- What quotas cap CPU and I/O side effects?
Validation design
- Did PSI improve?
- Did refaults increase?
- Did swap traffic spike?
- Did tail latency improve during burst pressure?
- Did nr_quota_exceeds reveal an over-constrained or underpowered policy?
Safety design
- What is the rollback path?
- How quickly can you disable the module or set enabled=N?
- Are you monitoring host-level regressions outside the memory subsystem itself?
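The rollback path is worth scripting before it is needed. A minimal sketch, assuming the static modules expose an enabled parameter under /sys/module/<module>/parameters; the function name is hypothetical and sys_root exists only so the helper can be exercised against a scratch directory.

```python
from pathlib import Path


def set_damon_module_enabled(module, enabled, sys_root="/sys"):
    """Write Y or N to a DAMON static module's 'enabled' parameter."""
    if module not in ("damon_reclaim", "damon_lru_sort"):
        raise ValueError(f"unexpected module: {module}")
    param = Path(sys_root) / "module" / module / "parameters" / "enabled"
    param.write_text("Y" if enabled else "N")
    return param
```

On a live host this would be called as set_damon_module_enabled("damon_reclaim", False) with root privileges. Timing that call during a drill answers the "how quickly" question with a number instead of a guess.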
14) Where I would reach for DAMON first
I would reach for DAMON first when I have a Linux host where memory pain is pattern-shaped rather than simply “we need more RAM.”
Examples:
- cache-heavy services with long-lived cold tails,
- dev or CI hosts that quietly accumulate stale working sets,
- systems where reclaim storms are rare but brutal,
- and experiments with tiered or NUMA-aware placement that need better access signals.
I would not reach for it first when the real problem is obviously:
- gross underprovisioning,
- memory leaks,
- pathological allocator behavior,
- or application-local cache policy bugs.
In those cases, DAMON may still help diagnose, but it is not the first fix.
Bottom line
DAMON is one of the more interesting Linux memory-management tools because it is neither hand-wavy observability nor blind automation. It gives you a structured way to ask:
- what is actually hot,
- what is cold enough to touch,
- and how aggressively should the kernel act on that information?
Its power comes from three things used together:
- approximate but useful access monitoring,
- bounded-overhead policy control,
- and operator-visible safeguards like quotas and watermarks.
The right mindset is not “turn on magic reclaim.” It is:
measure access shape, define cold conservatively, then act with quotas.
Used that way, DAMON is not just a curiosity from kernel docs. It becomes a real playbook tool for reducing reclaim chaos and making Linux memory behavior more intentional.
References
- Linux kernel documentation: DAMON overview
https://docs.kernel.org/admin-guide/mm/damon/
- Linux kernel documentation: Getting started with DAMON / DAMO
https://docs.kernel.org/admin-guide/mm/damon/start.html
- Linux kernel documentation: Detailed DAMON usage / sysfs interface
https://docs.kernel.org/admin-guide/mm/damon/usage.html
- Linux kernel documentation: DAMON design
https://docs.kernel.org/mm/damon/design.html
- Linux kernel documentation: DAMON-based Reclamation
https://docs.kernel.org/admin-guide/mm/damon/reclaim.html
- Linux kernel documentation: DAMON-based LRU-lists Sorting
https://docs.kernel.org/admin-guide/mm/damon/lru_sort.html