Linux DAMON Proactive Reclamation and Access-Aware Memory Operations Playbook

2026-04-10 · software


Date: 2026-04-10
Category: knowledge
Domain: software / linux / memory management

Why this matters

Linux memory tuning often swings between two extremes:

DAMON matters because it gives Linux a middle path:

The useful mental model is:

DAMON is Linux’s “watch memory access patterns first, act second” subsystem.

That makes it interesting for:

It is not a magic replacement for the VM subsystem. It is a way to inject better recency/frequency information into memory decisions.


1) Quick mental model

DAMON stands for Data Access MONitor. The kernel documentation describes the subsystem as providing data access monitoring and access-aware system operations.

At a high level, it does four things:

  1. Observes a target address space.
  2. Approximates access behavior by tracking regions instead of every page all the time.
  3. Maintains age + access-frequency signals for those regions.
  4. Feeds that information into actions such as page reclamation, LRU reprioritization, or other DAMOS schemes.

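The key design idea is that full page-by-page observation is too expensive at scale. So DAMON uses region-based sampling and adaptive region adjustment (splitting and merging regions as access patterns diverge or converge), so overhead stays controlled while the picture remains useful.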

That trade-off is the whole point.


2) What DAMON is good at

Strong fits

A) Proactive reclaim before the cliff

If a machine alternates between calm periods and sudden pressure spikes, classic reclaim can wait too long and then become noisy.

DAMON is good when you want to identify memory that has been cold for a meaningful time window and reclaim it gently before the cliff.

B) Making LRU behavior more reality-based

LRU is a useful approximation, but real workloads are messy. Pages that look inactive from one signal may still matter; pages that linger on active lists may be dead weight.

DAMON can improve this by identifying hot/cold regions and using that information to promote or demote pages on LRU lists more intelligently.

C) Observation-led memory tuning

Before changing reclaim strategy, swap behavior, cache sizes, or tiering policy, DAMON can help answer:

D) Research and platform engineering

DAMON is attractive for platform teams because it is both:

Weak fits

A) Instant, per-page, zero-error truth

DAMON is approximate by design. If you need exact accounting of every access, this is the wrong tool.

B) Tiny systems with simple pressure profiles

If your box is small and its behavior is obvious, the complexity may not buy much. Traditional VM tuning may be enough.

C) Severe pressure emergencies

DAMON-based reclaim is explicitly not meant to replace regular page-granularity reclaim under real pressure. It is for selective, lightweight, proactive work.

D) One-knob miracle tuning

DAMON gives you better information and better policy hooks. It does not spare you from threshold tuning, validation, or workload-specific judgment.


3) The architecture that actually matters to operators

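The official docs describe three layers: the operations set (address-space-specific access checks), the core (monitoring logic, overhead control, and DAMOS), and the modules (kernel modules and user-space interfaces built on top of the core).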

The operator version is simpler:

A) Operations sets: what address space are you watching?

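DAMON supports multiple monitoring targets through its operations sets, including the virtual address spaces of specific processes (vaddr), fixed virtual address ranges (fvaddr), and the physical address space of the system (paddr).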

If you care about whole-host proactive reclaim behavior, paddr is usually the most relevant mental model. If you care about understanding one workload’s memory pattern, vaddr is often the starting point.

B) kdamond: who does the work?

Each DAMON context is executed by a kernel thread named kdamond. Think of it as the monitoring/control worker for one configured monitoring context.

C) DAMOS: what do you do with the signal?

DAMON-based Operation Schemes (DAMOS) let you define:

This is where DAMON turns from “interesting observability” into “memory policy.”


4) How DAMON keeps overhead under control

This is the most important design point.

DAMON does not monitor every page equally forever. It uses region-based sampling.

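Core knobs

The main monitoring attributes are the sampling interval, the aggregation interval, the operations update interval, and the minimum and maximum number of regions.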

Why regions matter

Adjacent pages are grouped into regions that are assumed to have similar behavior. DAMON then checks representative pages and adaptively splits or merges regions based on observed access differences.

That means:

The clever bit is that DAMON tries to ride that trade-off automatically within the bounds you set.
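To make the merge half of that trade-off concrete, here is a minimal toy model in Python. It is not the kernel's code: the Region type, the merge_similar helper, and the threshold value are all hypothetical names for illustration. It only shows the idea that adjacent regions with similar access counts collapse into one, with a size-weighted count.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int        # inclusive start address
    end: int          # exclusive end address
    nr_accesses: int  # sampled accesses seen in the last aggregation window

    @property
    def size(self) -> int:
        return self.end - self.start


def merge_similar(regions, threshold):
    """Merge adjacent regions whose access counts differ by at most threshold.

    Toy model only: regions must be sorted by address.
    """
    merged = []
    for r in regions:
        prev = merged[-1] if merged else None
        if (prev is not None and prev.end == r.start
                and abs(prev.nr_accesses - r.nr_accesses) <= threshold):
            total = prev.size + r.size
            # Size-weighted average keeps the merged count representative.
            avg = (prev.nr_accesses * prev.size + r.nr_accesses * r.size) // total
            merged[-1] = Region(prev.start, r.end, avg)
        else:
            merged.append(Region(r.start, r.end, r.nr_accesses))
    return merged
```

Fewer regions means fewer representative pages to check per sampling interval, which is where the overhead saving comes from. Splitting is the inverse step and is omitted here.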

Age is as important as frequency

Operators often focus only on “hotness,” but DAMON also tracks how long the current access pattern has persisted. That is valuable because:

In practice, cold age thresholds are usually one of the most important tuning levers.
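The age signal can be sketched as a counter that grows while a region's access frequency stays roughly the same and resets when it changes significantly. The snippet below is a simplified illustration, assuming a hypothetical change_threshold for what counts as "significant"; the kernel's real criteria also consider region size.

```python
def update_age(age, prev_nr_accesses, nr_accesses, change_threshold):
    """Return the new age after one aggregation interval.

    Age counts aggregation intervals during which the access frequency
    stayed within change_threshold of its previous value.
    """
    if abs(nr_accesses - prev_nr_accesses) > change_threshold:
        return 0  # pattern changed: start counting again
    return age + 1


def ages_over_time(history, change_threshold):
    """Replay a per-interval nr_accesses history and return the age after each step."""
    ages = []
    age = 0
    for prev, cur in zip(history, history[1:]):
        age = update_age(age, prev, cur, change_threshold)
        ages.append(age)
    return ages
```

A region that reads zero accesses for one interval has age 1; a region that has read zero for hundreds of intervals is a much safer reclaim candidate, which is why the age threshold carries so much weight.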


5) The best way to think about monitoring output

A DAMON snapshot is basically telling you:

That is enough to estimate:

A good practical reading of DAMON data is not:

“this specific page is definitely cold.”

It is:

“this region has been consistently cold enough, long enough, that reclaiming or deprioritizing it is a defensible bet.”

That distinction matters.
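One way to turn a snapshot into that kind of bet is to sum the bytes in regions that meet both a frequency and an age condition. The sketch below assumes a hypothetical snapshot format of (start, end, nr_accesses, age) tuples, with age counted in aggregation intervals; real tooling output will be shaped differently.

```python
def cold_bytes(snapshot, min_age, max_nr_accesses=0):
    """Sum the size of regions that are both rarely accessed and old enough.

    snapshot: iterable of (start, end, nr_accesses, age) tuples (assumed format).
    """
    return sum(
        end - start
        for start, end, nr_accesses, age in snapshot
        if nr_accesses <= max_nr_accesses and age >= min_age
    )


def cold_fraction(snapshot, min_age, max_nr_accesses=0):
    """Fraction of monitored bytes that qualify as cold under the given thresholds."""
    total = sum(end - start for start, end, _, _ in snapshot)
    if total == 0:
        return 0.0
    return cold_bytes(snapshot, min_age, max_nr_accesses) / total
```

Running this with several candidate min_age values shows how sensitive the reclaimable estimate is to the definition of cold before any action is enabled.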


6) DAMON_RECLAIM: where proactive reclaim fits

DAMON_RECLAIM is the operator-facing static kernel module for proactive reclamation.

The docs are very explicit about its role:

That is exactly how to use it.

When it shines

DAMON_RECLAIM is attractive when:

Why the watermarks are important

DAMON_RECLAIM uses watermarks so it can stay inactive when memory is plentiful, become active in a middle zone, and step back again when the system is under more serious pressure.

That last part is subtle and important.

The low watermark behavior means:

if the box is already too tight, let normal reclaim machinery take over instead of insisting that DAMON keep doing selective background cleanup.

That is good operational design.
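The three-zone behavior can be modeled as a small state function. This is a sketch of the logic as I understand it, not kernel code: next_state is a hypothetical helper, free_rate is the free memory rate in per-thousand, and the default thresholds are illustrative values that should be checked against your kernel's DAMON_RECLAIM documentation.

```python
def next_state(active, free_rate, high=500, mid=400, low=200):
    """Return whether proactive reclaim should be active.

    free_rate and the thresholds are free memory in per-thousand of total.
    """
    if free_rate > high or free_rate < low:
        # Plenty of memory, or too tight: stay out of the way.
        return False
    if free_rate >= mid and not active:
        # Between mid and high, an inactive scheme stays inactive.
        return False
    # Below mid (but above low), or already active below high: run.
    return True
```

Note the hysteresis: once active, the scheme keeps running until free memory rises above the high watermark or falls below the low one, which avoids flapping around the mid watermark.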

Quotas are not optional in spirit

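DAMON_RECLAIM provides quota parameters: a time quota (quota_ms), a size quota (quota_sz), and a quota reset interval (quota_reset_interval_ms).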

The right mental model is:

If you enable DAMON_RECLAIM without thinking hard about quotas, you are skipping one of its main safety rails.
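A size quota is essentially a budget that refills on a fixed interval. The class below is a minimal sketch of that arithmetic only; SizeQuota and try_charge are hypothetical names, and the real DAMOS quota logic also prioritizes which regions to spend the budget on.

```python
class SizeQuota:
    """Budget of bytes that may be acted on per reset interval (toy model)."""

    def __init__(self, quota_bytes, reset_interval_ms):
        self.quota_bytes = quota_bytes
        self.reset_interval_ms = reset_interval_ms
        self.window_start_ms = 0
        self.charged = 0

    def try_charge(self, now_ms, nbytes):
        """Return how many of nbytes fit in the current window's remaining budget."""
        if now_ms - self.window_start_ms >= self.reset_interval_ms:
            # New window: the budget refills.
            self.window_start_ms = now_ms
            self.charged = 0
        allowed = min(nbytes, self.quota_bytes - self.charged)
        self.charged += allowed
        return allowed
```

The useful property is the hard ceiling: no matter how much memory looks cold, the amount touched per window cannot exceed the budget.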


7) DAMON_LRU_SORT: fixing LRU ordering before reclaim happens

DAMON_LRU_SORT is different from proactive reclaim. It does not primarily reclaim pages itself. It tries to make the LRU lists better reflect real access behavior by:

This is interesting because reclaim quality depends heavily on whether the kernel’s page ordering is trustworthy.

If the access signal feeding the LRU lists is stale or misleading, reclaim decisions suffer. DAMON_LRU_SORT tries to improve the ordering upstream.

When I’d favor it

I would look at DAMON_LRU_SORT when:

Useful knobs

The kernel docs highlight knobs such as:

The important operational lesson is that DAMON_LRU_SORT is a steering mechanism, not a brute-force broom.
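The steering decision can be sketched as a two-threshold classifier. The function below is an illustration, not the module's implementation: lru_action is a hypothetical helper, the parameter names mirror the module's hot_thres_access_freq (per-thousand) and cold_min_age (microseconds) knobs, and the default values are assumptions to verify against your kernel's documentation.

```python
def lru_action(access_permil, age_us,
               hot_thres_permil=500, cold_min_age_us=120_000_000):
    """Pick an LRU steering action for one region, or None to leave it alone.

    access_permil: observed access frequency in per-thousand.
    age_us: how long the current access pattern has persisted, in microseconds.
    """
    if access_permil >= hot_thres_permil:
        return "lru_prio"      # hot: move toward the protected end
    if access_permil == 0 and age_us >= cold_min_age_us:
        return "lru_deprio"    # cold for long enough: make it an earlier reclaim candidate
    return None                # lukewarm or recently changed: do nothing
```

Most regions fall into the "do nothing" branch, which is the point: the module nudges the extremes and leaves the ambiguous middle to the normal LRU machinery.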


8) Observation-first rollout beats direct action-first rollout

This is the biggest practical advice.

Do not start by enabling automatic reclaim on a production-critical box just because DAMON exists.

Start in this order:

Stage 0 — Verify kernel capabilities

Confirm the system actually has the needed pieces:

Also confirm which operations sets are actually available on the running kernel.
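A quick probe for these pieces might look like the sketch below. It assumes the usual sysfs locations (/sys/kernel/mm/damon/admin for the DAMON sysfs interface, /sys/module/damon_reclaim and /sys/module/damon_lru_sort for the modules); probe_damon and available_operations are hypothetical helper names. The avail_operations file lives under a context directory, so it only exists after a kdamond and a context have been created.

```python
import os


def probe_damon(sys_root="/sys"):
    """Report which DAMON-related sysfs directories exist under sys_root."""
    paths = {
        "sysfs_interface": "kernel/mm/damon/admin",
        "damon_reclaim": "module/damon_reclaim/parameters",
        "damon_lru_sort": "module/damon_lru_sort/parameters",
    }
    return {
        name: os.path.isdir(os.path.join(sys_root, rel))
        for name, rel in paths.items()
    }


def available_operations(avail_operations_path):
    """Return the operations sets listed in an avail_operations file, or [] if missing."""
    try:
        with open(avail_operations_path) as f:
            return f.read().split()
    except FileNotFoundError:
        return []
```

The sys_root parameter exists so the probe can be pointed at a fake directory tree in tests instead of the live system.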

Stage 1 — Observe only

Use DAMON or DAMO to inspect the workload without acting on it. You want to learn:

The docs explicitly note that the default 100 ms aggregation interval is too short in many cases, especially on large systems. That is a very useful warning.

Stage 2 — Tune monitoring quality vs overhead

Adjust:

A good rule from the design docs is to set sampling interval roughly proportional to aggregation interval; the default ratio is around 1/20, and that is still the recommended baseline.
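As a worked example of that ratio, the helper below derives a sampling interval from a chosen aggregation interval. The function name is hypothetical; the 1/20 ratio is the baseline quoted above.

```python
def sampling_interval_us(aggr_us, ratio=20):
    """Derive a sampling interval from the aggregation interval using a 1/ratio rule."""
    return max(1, aggr_us // ratio)


def samples_per_aggregation(aggr_us, sample_us):
    """Upper bound on the access count a region can reach in one aggregation window."""
    return aggr_us // sample_us
```

With the default 100 ms aggregation interval this gives a 5 ms sampling interval, and stretching aggregation to 2 s gives 100 ms. Keeping the ratio fixed means the maximum per-window access count stays at 20, so frequency thresholds keep the same meaning as the intervals change.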

Stage 3 — Define “cold enough to touch”

Before any action, define your operating meaning of cold:

This is workload-dependent and nontrivial.

Stage 4 — Add tiny quotas

When you first test DAMON_RECLAIM or DAMOS pageout-like actions, set small quotas and conservative watermarks. The goal is to learn behavior, not win benchmarks on day one.

Stage 5 — Compare system outcomes, not just DAMON stats

Judge success using:

The kernel feature is only useful if host behavior improves.
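One simple way to compare host behavior is to diff /proc/vmstat counters across a test window. The sketch below is an assumption about method, not something the kernel docs prescribe: parse_vmstat and vmstat_delta are hypothetical helpers, and the choice of counters (for example pgscan_direct, pswpin, pswpout, pgmajfault) is mine and should be adapted to the workload.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def vmstat_delta(before, after, keys):
    """Return the increase of each selected counter between two parsed samples."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

Take one sample before and one after a comparable load period, with and without the DAMON policy enabled. If direct-reclaim scanning drops but major faults or swap-ins rise, the policy is probably reclaiming memory that was not really cold.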


9) DAMON sysfs is powerful, but the model matters more than the path names

The sysfs interface is rich and configurable, but it is easy to get lost in the tree.

The shape that matters is:

  1. create a kdamond
  2. create a context
  3. choose operations (vaddr, paddr, etc.)
  4. set monitoring attributes
  5. define targets
  6. optionally define schemes (DAMOS)
  7. start / commit / refresh state
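The steps above can be sketched as an ordered list of sysfs writes. This is a minimal observe-only example for one process under vaddr, assuming the standard /sys/kernel/mm/damon/admin layout; build_plan and apply_plan are hypothetical helpers, and the interval values are just the defaults discussed in this playbook.

```python
import os

ADMIN = "/sys/kernel/mm/damon/admin"


def build_plan(pid, sample_us=5000, aggr_us=100000):
    """Return ordered (relative_path, value) writes for a one-target vaddr context."""
    ctx = "kdamonds/0/contexts/0"
    return [
        ("kdamonds/nr_kdamonds", "1"),                 # 1. create a kdamond
        ("kdamonds/0/contexts/nr_contexts", "1"),      # 2. create a context
        (f"{ctx}/operations", "vaddr"),                # 3. choose operations
        (f"{ctx}/monitoring_attrs/intervals/sample_us", str(sample_us)),  # 4. attributes
        (f"{ctx}/monitoring_attrs/intervals/aggr_us", str(aggr_us)),
        (f"{ctx}/targets/nr_targets", "1"),            # 5. define targets
        (f"{ctx}/targets/0/pid_target", str(pid)),
        ("kdamonds/0/state", "on"),                    # 7. start (no schemes: observe only)
    ]


def apply_plan(plan, admin=ADMIN):
    """Write each value in order. Needs root and a kernel with the DAMON sysfs interface."""
    for rel, value in plan:
        with open(os.path.join(admin, rel), "w") as f:
            f.write(value)
```

Order matters because the directories for a kdamond, context, or target only appear after the corresponding nr_* file has been written. Building the plan separately from applying it also makes it easy to review or log the writes before touching a live host.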

Useful pieces to remember:

If you need a human-friendly entry point, the official docs point to DAMO as the default userspace tool layered on top of sysfs.


10) Auto-tuning exists, but it is not permission to stop thinking

DAMON can auto-tune monitoring intervals. The design docs describe this in terms of targeting a desired ratio of observed access events to the theoretical maximum, using knobs such as:

This is useful because interval tuning is otherwise tedious.

But the important operational framing is:

Treat it as a monitoring-parameter assistant, not an autopilot.
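To build intuition for the ratio being targeted, here is a rough sketch of how observed access events might be compared with the theoretical maximum. The size weighting and the helper name access_ratio_bp are assumptions made for illustration; the kernel's exact accounting may differ.

```python
def access_ratio_bp(regions, samples_per_aggr):
    """Observed access events as basis points (1/10000) of the theoretical maximum.

    regions: iterable of (size_bytes, nr_accesses) pairs for one aggregation window.
    samples_per_aggr: maximum nr_accesses a region can reach in that window.
    """
    regions = list(regions)
    total_size = sum(size for size, _ in regions)
    if total_size == 0 or samples_per_aggr == 0:
        return 0
    observed = sum(size * nr_accesses for size, nr_accesses in regions)
    return observed * 10_000 // (total_size * samples_per_aggr)
```

If this number is near zero, the intervals are probably too short to capture meaningful access differences; a very high number suggests they are long enough that everything looks hot. Auto-tuning adjusts the intervals to steer toward a configured target in between, but choosing that target and its bounds is still an operator decision.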


11) The most important caveats

A) Approximation error is inherent

DAMON relies on region assumptions and sampled access signals. That is fine, but it means bad thresholds can misclassify memory.

This is why “observe first” matters so much.

B) Accessed-bit interaction is real

The design docs note that DAMON uses PTE accessed-bit-based checks for virtual and physical address monitoring. That can interfere with other subsystems that care about the same signal.

The docs specifically say:

Translation: know what else on the box relies on page-idle-style signals before you assume all observability sources remain independent.

C) The default intervals may be wrong by a lot

This is not a minor footnote. A 100 ms aggregation interval can be far too short for many real systems, which makes the output look flatter and less informative than it should.

D) Overenthusiastic reclaim can backfire

If you reclaim memory that is merely temporarily cold, you may buy lower free-memory anxiety at the cost of:

A reclaim policy that “looks active” is not necessarily helping.

E) Whole-host tuning can hide workload asymmetry

On mixed-use machines, a single physical-memory policy can accidentally optimize for one workload while penalizing another. Per-process observation with vaddr can reveal this before paddr-level policy obscures it.


12) When to use DAMON_RECLAIM vs DAMON_LRU_SORT

Use DAMON_RECLAIM when your problem is mainly:

Use DAMON_LRU_SORT when your problem is mainly:

Use observation-only DAMON first when:

These are complementary, not mutually exclusive, but I would almost always start with observation and only then decide which action layer fits.


13) Practical rollout checklist

Before production use, answer these:

Monitoring design

Policy design

Validation design

Safety design


14) Where I would reach for DAMON first

I would reach for DAMON first when I have a Linux host where memory pain is pattern-shaped rather than simply “we need more RAM.”

Examples:

I would not reach for it first when the real problem is obviously:

In those cases, DAMON may still help diagnose, but it is not the first fix.


Bottom line

DAMON is one of the more interesting Linux memory-management tools because it is neither hand-wavy observability nor blind automation. It gives you a structured way to ask:

Its power comes from three things used together:

The right mindset is not “turn on magic reclaim.” It is:

measure access shape, define cold conservatively, then act with quotas.

Used that way, DAMON is not just a curiosity from kernel docs. It becomes a real playbook tool for reducing reclaim chaos and making Linux memory behavior more intentional.

