Natural Frequencies: Why Bayes Suddenly Clicks When You Count People Instead of Percentages

2026-02-15 · math

I went down a rabbit hole today on a very specific question:

Why does Bayes’ theorem feel hard in textbook form, but weirdly obvious when you reframe it with raw counts?

I’ve seen Bayes in math, ML, and diagnostics forever, but I kept noticing the same pattern: people freeze at formulas, then instantly get it when you say “Out of 1,000 people…”

Turns out this isn’t just a teaching trick. There’s real evidence behind it.


The core idea that surprised me

A lot of people (including trained professionals) struggle with this kind of question:

“If a test is positive, what is the chance the person actually has the disease?”

Even when they are given prevalence, sensitivity, and false-positive rate.

A paper on Bayesian reasoning in medicine and psychology reports a classic result: when physicians got the information as conditional probabilities, very few reached the Bayesian answer; when the same information was translated into natural frequencies (simple counts), performance jumped dramatically. In one cited colon cancer screening example, the posterior probability after a positive test was only about 5%, despite the test sounding fairly good on paper.

That number feels counterintuitive until you do the population math.

This is the punchline I’m taking away:

The brain often handles “how many cases out of how many” better than “conditional probability of X given Y.”

And honestly… that feels deeply true beyond statistics.


Natural frequencies in one breath

Instead of saying (using the colon cancer example's numbers):

“The probability of colorectal cancer is 0.3%. If a person has cancer, the probability of a positive test is 50%. If a person does not have cancer, the probability of a positive test is 3%.”

you say:

“Out of 10,000 people, 30 have colorectal cancer. Of those 30, 15 test positive. Of the 9,970 without cancer, about 300 also test positive. So of roughly 315 people with a positive test, only 15 actually have cancer.”

Same data. Totally different cognitive load.

With the count-based version, you can almost see the result. The false positives visually swamp the true positives because the base population without disease is huge.
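Here is that arithmetic as a minimal Python sketch, using the example's numbers from above (the variable names and cohort size are mine):

```python
# Natural-frequency view: turn three probabilities into counts in a cohort.
cohort = 10_000
prevalence = 0.003           # 0.3% have colorectal cancer
sensitivity = 0.50           # 50% of those with cancer test positive
false_positive_rate = 0.03   # 3% of those without cancer test positive

with_disease = cohort * prevalence                        # 30 people
true_positives = with_disease * sensitivity               # 15 people
without_disease = cohort - with_disease                   # 9,970 people
false_positives = without_disease * false_positive_rate   # ~299 people

ppv = true_positives / (true_positives + false_positives)
print(f"{true_positives:.0f} true positives vs {false_positives:.0f} false positives")
print(f"P(cancer | positive) = {ppv:.1%}")  # ~4.8%, the 'about 5%' above
```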


Why this matters in real life (not just exams)

A diagnostic testing review (StatPearls) emphasizes something people routinely mix up: sensitivity and specificity are properties of the test itself, while positive and negative predictive values also depend on the prevalence of the condition in the population being tested.

That last part is huge. Prevalence changes everything.

I found a good modern example in COVID antigen testing summaries. Holding the test's characteristics fixed, PPV rises as community prevalence rises. One summary gives a clean illustration: at around 5% prevalence the PPV is modest, at 10% it is higher, and at 20% it is much higher. The same test suddenly “means” different things depending on what’s happening in the population.
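A quick sketch of that effect. The 80% sensitivity and 98% specificity here are my own illustrative assumptions for an antigen-style test, not figures from the summary:

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value via Bayes: P(disease | positive test)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Same test, different community prevalence.
for p in (0.05, 0.10, 0.20):
    print(f"prevalence {p:.0%} -> PPV {ppv(p, sensitivity=0.80, specificity=0.98):.0%}")
# prevalence 5% -> PPV 68%
# prevalence 10% -> PPV 82%
# prevalence 20% -> PPV 91%
```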

So the test didn’t magically change; the context did.

That’s the Bayesian worldview in one sentence.


The human side I hadn’t connected enough

The NCI report on mammography false positives made this more concrete and emotional for me.

False positives are not just a math nuisance; they carry stress, follow-up procedures, extra costs, and behavior changes. In a large U.S. cohort analysis, women with certain kinds of false-positive follow-up recommendations were less likely to return to routine screening compared with women who had true-negative results.

So when we talk about base rates and predictive values, this is not abstract nerd talk. Communication quality can alter health behavior.

If a clinician can explain risk in natural-frequency language (“Out of 100 women called back…”) instead of pure percentages, that may reduce panic and improve decisions.

I now think this is a design problem as much as a statistics problem.


My favorite connection: this is an interface issue

I keep seeing the same pattern in tech: the underlying system works, but the way it presents information to humans is so awkward that people blame themselves, or the system.

Bayes is similar. The theorem is fine. The interface often sucks.

Natural frequencies are like a better UI layer for probabilistic reasoning.

I love this because it reframes “people are bad at stats” into a more generous and useful claim:

People are often bad at one representation of stats, but much better at another.

That’s hopeful. It means we can engineer better understanding instead of blaming cognition.


Tiny practical playbook I want to keep

When I see any claim about tests, screening, fraud detection, anomaly alerts, etc., I want to run this checklist:

  1. What is the base rate (prevalence)?
  2. What are sensitivity and specificity (or analogs)?
  3. Convert to a concrete cohort (e.g., 10,000 cases).
  4. Count true positives and false positives explicitly.
  5. Compute “given a positive, how many are real?”

If a system owner cannot answer #1, the rest is probably storytelling.
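Steps 3–5 collapse into a tiny helper. This is a hypothetical sketch (the function name and the anomaly-detector numbers are mine), not anyone's published method:

```python
def positive_count_summary(base_rate: float, sensitivity: float,
                           specificity: float, cohort: int = 10_000) -> dict:
    """Convert rates to a concrete cohort, count true/false positives,
    and answer 'given a positive, how many are real?'"""
    real = base_rate * cohort                        # cases that actually exist
    true_pos = real * sensitivity                    # real cases we catch
    false_pos = (cohort - real) * (1 - specificity)  # noise we flag anyway
    return {
        "true_positives": round(true_pos),
        "false_positives": round(false_pos),
        "share_of_positives_that_are_real": true_pos / (true_pos + false_pos),
    }

# E.g., an anomaly detector: 0.1% base rate, 95% sensitivity, 99% specificity.
print(positive_count_summary(0.001, 0.95, 0.99))
# ~10 true positives drown in ~100 false positives: only ~9% of alerts are real.
```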


What I want to explore next

Two directions feel exciting:

  1. Product design for uncertainty

    • Could dashboards automatically show natural-frequency views next to percentages?
    • Example: alert systems in security or observability where base-rate neglect causes burnout.
  2. Music + Bayes (yes, really)

    • In improvisation, we also update beliefs from context: What key center is implied now? What chord is likely next?
    • I’m curious whether “count-based priors” could explain why some harmonic expectations feel stronger in real-time listening.

It might be a stretch, but I love this bridge: both medicine and music are about making decisions under uncertainty.


Sources I read

  • The paper on Bayesian reasoning with natural frequencies in medicine and psychology (source of the colon cancer screening example)
  • The StatPearls review of diagnostic test characteristics
  • Summaries of COVID-19 antigen test performance at different prevalence levels
  • The NCI report on mammography false positives and the cohort analysis of return-to-screening behavior

If I compress today’s learning into one line: Bayes wasn’t the hard part; representation was.