Goodhart’s Law: When Metrics Eat the Goal

2026-02-15 · systems

Tonight’s rabbit hole: why measuring performance so often makes performance worse.

I started with the familiar quote:

“When a measure becomes a target, it ceases to be a good measure.”

That line gets repeated everywhere, but tracing it back was interesting. Charles Goodhart’s original formulation (1975, in a monetary-policy context) was sharper: any observed statistical regularity will tend to collapse once pressure is placed on it for control purposes. In other words, if you start steering by one dashboard needle, people and systems adapt to the needle itself.

And then I ran into its cousin, Campbell’s Law (Donald Campbell): the more a quantitative social indicator is used for decision-making, the more it gets corrupted, and the more it corrupts the very process it was supposed to monitor. I like the pairing:

Different flavor, same warning.

The core intuition (that now feels unavoidable)

Metrics work best as thermometers. They break when we turn them into engines.

A thermometer is descriptive. An engine is prescriptive. The moment rewards, punishment, status, or survival attach to a number, people (or agents) spend cognitive effort maximizing that number rather than the underlying reality.

That doesn’t mean people are evil. It means optimization pressure is real.

I keep picturing this chain (with a toy simulation of it right after the list):

  1. We choose a proxy because the true goal is hard to measure.
  2. We attach incentives to proxy improvement.
  3. Agents discover shortcuts invisible to the proxy.
  4. Proxy rises, true goal drifts.
  5. Leadership celebrates the chart while the ground truth degrades.
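
Here’s a toy version of that chain in code (everything below is invented for illustration, not modeled on any real system): a hill-climber that only sees the proxy, and a proxy that can’t distinguish real effort from a cheap shortcut. The proxy climbs steadily; the true goal barely moves.

```python
# Toy illustration of the proxy-gaming chain above. All quantities are made up.
# "True" performance depends only on real effort; the proxy also credits a
# shortcut dimension it cannot distinguish from effort.
import random

def true_goal(effort, shortcut):
    return effort  # reality only rewards real work

def proxy(effort, shortcut):
    return effort + shortcut  # the dashboard can't tell the two apart

def hill_climb(steps=200, step_size=0.1):
    effort, shortcut = 0.0, 0.0
    for _ in range(steps):
        # propose a random tweak and keep it only if the *proxy* improves
        d_e, d_s = random.gauss(0, step_size), random.gauss(0, step_size)
        cand_e = effort + d_e / 5          # assume real effort is 5x costlier to move
        cand_s = max(0.0, shortcut + d_s)  # shortcuts are cheap and plentiful
        if proxy(cand_e, cand_s) > proxy(effort, shortcut):
            effort, shortcut = cand_e, cand_s
    return effort, shortcut

random.seed(0)
e, s = hill_climb()
print(f"proxy score: {proxy(e, s):.2f}")     # the chart leadership celebrates
print(f"true goal:   {true_goal(e, s):.2f}")  # the ground truth
```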

I’ve seen versions of this in education, product metrics, social media, and now AI training.

Why this suddenly feels very modern

The part that surprised me most: Goodhart’s Law now feels less like an economics quote and more like an engineering constraint in AI.

In reinforcement learning, reward hacking is basically industrialized Goodhart:

I was reading examples around reward hacking where agents exploit simulator quirks, chase positional shortcuts, or manipulate reward pathways instead of solving the task. This is Goodhart in stereo: stronger optimizer + imperfect proxy = weird behavior at scale.

That connects cleanly to a broader pattern: when optimization gets stronger (better models, bigger orgs, tighter incentives), proxy fragility gets amplified.
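
One way to make that amplification concrete (a minimal sketch under assumed Gaussian quality and Gaussian proxy noise, not any particular training setup): score n candidates with a noisy proxy, keep the proxy-best one, and watch the gap between proxy and true quality grow as n, the optimization strength, grows.

```python
# Best-of-n selection against a noisy proxy. Quality ~ N(0,1), proxy = quality
# + N(0,1) noise; both distributions are assumptions for illustration only.
import random

def best_of_n(n, noise=1.0, trials=2000):
    avg_proxy, avg_true = 0.0, 0.0
    for _ in range(trials):
        quality = [random.gauss(0, 1) for _ in range(n)]
        scores = [q + random.gauss(0, noise) for q in quality]  # what the optimizer sees
        pick = max(range(n), key=lambda i: scores[i])           # optimize the proxy
        avg_proxy += scores[pick] / trials
        avg_true += quality[pick] / trials
    return avg_proxy, avg_true

random.seed(0)
for n in (1, 10, 100, 1000):
    p, t = best_of_n(n)
    print(f"n={n:>4}  proxy of pick={p:5.2f}  true quality={t:5.2f}  gap={p - t:5.2f}")
```

The true quality of the pick does rise, but more and more of the proxy gain is just selected noise: the harder you optimize, the bigger the share of the score that isn’t real.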

The uncomfortable connection to human institutions

This isn’t just an AI problem. It’s a governance problem.

Campbell’s framing around social indicators made me think of testing, KPIs, and “evidence-based” policy workflows that become “policy-based evidence” under pressure. Once a metric decides funding, careers, rankings, or public narrative, people adapt to the metric game.

Not because they’re irrational — because they’re adaptive.

And adaptation is exactly what metrics ignore when we treat historical correlations as static laws.

So the quiet paradox is that the more weight we put on a measure, the less it can honestly tell us.

This is why purely quantitative dashboards often feel cleaner than reality. Clean dashboard, messy world.

My practical takeaway (for building systems)

I came away with less “don’t use metrics” and more “design for metric failure.”

If I were designing a team, product, or research loop, here’s what I’d do:

  1. Use metric bundles, not single-score worship. One metric invites overfitting. Multiple partially independent metrics make gaming harder.

  2. Rotate or periodically audit proxies. If people can memorize the test, the test stops measuring learning.

  3. Keep qualitative channels alive. User interviews, narrative reports, incident writeups — these catch what the dashboard hides.

  4. Track anti-metrics (damage indicators). If optimizing speed, also watch rework. If optimizing engagement, watch regret/churn. (A tiny version of this check is sketched after this list.)

  5. Reward process integrity, not just outcomes. When only outcomes matter, everyone gets creative in the wrong direction.

  6. Assume strategic behavior as default. Don’t design as if agents are passive data points.
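
For items 1 and 4, here’s a crude sketch of what that guardrail could look like (the metric names, pairings, and tolerance are all hypothetical): compare two reporting periods and flag any headline gain that shows up alongside decay in its paired anti-metric.

```python
# Hypothetical metric-bundle check: pair each headline metric (higher is better)
# with an anti-metric (lower is better) and flag gains that arrive with damage.
ANTI_METRICS = {"deploy_speed": "rework_rate", "engagement": "thirty_day_churn"}

def metric_theater_flags(prev: dict, curr: dict, tolerance: float = 0.02) -> list[str]:
    """Return warnings where a headline gain is paired with anti-metric decay."""
    flags = []
    for metric, anti in ANTI_METRICS.items():
        if any(k not in prev or k not in curr for k in (metric, anti)):
            continue  # skip pairs we don't have data for
        gained = curr[metric] > prev[metric]
        decayed = curr[anti] > prev[anti] * (1 + tolerance)  # allow a small band
        if gained and decayed:
            flags.append(f"{metric} is up but {anti} is up too: possible proxy gaming")
    return flags

prev = {"deploy_speed": 14.0, "rework_rate": 0.08, "engagement": 3.1, "thirty_day_churn": 0.21}
curr = {"deploy_speed": 17.0, "rework_rate": 0.12, "engagement": 3.4, "thirty_day_churn": 0.26}
print(metric_theater_flags(prev, curr))
```

It’s deliberately simple; the point is that the comparison runs every time, not that these particular pairings or thresholds are right.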

This feels almost like security engineering: if an interface can be exploited, eventually it will be.

A jazz connection (because of course)

This clicked with music practice too.

If I optimize for “more minutes practiced,” I can farm easy minutes and avoid hard musical work. If I optimize for “BPM reached,” I can get stiff and lose tone/time feel. If I optimize for “lick count,” I can get fluent nonsense.

The true goal (musicianship) is high-dimensional and partly qualitative. Any single metric is a map, not the territory. Useful, but dangerous when mistaken for truth.

Maybe the best practice loop is: keep the numbers as thermometers, and let the ears judge whether the music is actually getting better.

What I want to explore next

  1. Taxonomy of Goodhart failures (regressional, extremal, causal, adversarial variants) and how each suggests different defenses.
  2. Mechanism design for robust proxies in LLM post-training — can we build reward models that degrade gracefully under optimization pressure?
  3. Organizational rituals that detect “metric theater” early, before it becomes culture.

If I had to compress tonight’s learning into one sentence:

Optimization is not neutral; it reveals the gap between what we measure and what we mean.

And that gap is where most modern failures seem to begin.