Pólya’s Urn: How Tiny Early Randomness Becomes Persistent Advantage (Field Guide)
One-line intuition
If success increases your chance of future success, early luck gets amplified and can lock in long-run outcomes.
The canonical process
Start an urn with:
- (\alpha) red balls
- (\beta) blue balls
At each step:
- Draw one ball uniformly at random.
- Return it.
- Add one extra ball of the same color.
This is the simplest reinforcement process: draw -> reinforce -> repeat.
The predictive law (why reinforcement is explicit)
After (n) draws, if (k) were red:
[ \Pr(\text{next is red}) = \frac{\alpha + k}{\alpha + \beta + n} ]
So each red draw literally increases future red probability.
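The predictive law is easy to evaluate directly. A minimal sketch (the starting composition (\alpha=2,\ \beta=3) is just an illustrative choice):

```python
# Predictive law: after n draws with k red, P(next red) = (alpha + k) / (alpha + beta + n).
alpha, beta = 2, 3  # illustrative starting composition

def prob_next_red(k, n, alpha=alpha, beta=beta):
    """Probability the next draw is red, given k reds in the first n draws."""
    return (alpha + k) / (alpha + beta + n)

# Before any draws: just the starting composition.
print(prob_next_red(0, 0))  # 2/5 = 0.4
# After 4 draws, 3 of them red: red is now favored.
print(prob_next_red(3, 4))  # 5/9 ≈ 0.556
```

Note that the denominator grows by one per draw, so each individual reinforcement matters less over time, yet the cumulative effect of early draws persists.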
Two non-obvious consequences
1) Exchangeable, but not independent
The draws are not independent (history matters), but they are exchangeable: the probability of any particular sequence depends only on its color counts, not on the order in which they occur.
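Exchangeability can be verified by exact computation: every ordering of the same color counts has the same probability. A sketch using exact rational arithmetic (the starting composition (\alpha=\beta=1) is illustrative):

```python
from fractions import Fraction

def seq_prob(seq, alpha=1, beta=1):
    """Exact probability of a specific color sequence ('R'/'B') from a Pólya urn."""
    red, blue = alpha, beta
    p = Fraction(1)
    for c in seq:
        total = red + blue
        if c == "R":
            p *= Fraction(red, total)
            red += 1  # reinforce red
        else:
            p *= Fraction(blue, total)
            blue += 1  # reinforce blue
    return p

# All orderings of two reds and one blue have the same probability:
print(seq_prob("RRB"), seq_prob("RBR"), seq_prob("BRR"))  # 1/12 1/12 1/12
```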
2) Random limit share (path dependence)
The red proportion converges almost surely to a random limit:
[ \frac{R_n}{R_n + B_n} \to \Theta, \quad \Theta \sim \mathrm{Beta}(\alpha,\beta) ]
So there is no single deterministic equilibrium share. Different runs end at different stable compositions because early noise is frozen into the system.
A famous special case
With (\alpha=\beta=1): after (n) draws, the number of red draws is uniform on ({0,1,\dots,n}).
Interpretation: in this symmetric start, extreme outcomes are not suppressed the way they are in a Binomial model. Reinforcement keeps extremes plausible.
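The uniform law can be confirmed exactly for small (n) by enumerating all sequences and summing their probabilities by red count (a brute-force sketch, feasible only for small (n)):

```python
from fractions import Fraction
from itertools import product

def count_dist(n, alpha=1, beta=1):
    """Exact distribution of the number of red draws in n Pólya urn draws."""
    dist = [Fraction(0)] * (n + 1)
    for seq in product("RB", repeat=n):  # enumerate all 2^n color sequences
        red, blue, p = alpha, beta, Fraction(1)
        for c in seq:
            total = red + blue
            if c == "R":
                p *= Fraction(red, total)
                red += 1
            else:
                p *= Fraction(blue, total)
                blue += 1
        dist[seq.count("R")] += p
    return dist

print([str(x) for x in count_dist(4)])  # ['1/5', '1/5', '1/5', '1/5', '1/5']
```

Compare with a fair-coin Binomial(4, 1/2), which puts only 1/16 on each extreme: reinforcement keeps the extremes as likely as the middle.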
Why this matters in real systems
Pólya-type dynamics appear whenever “popularity breeds popularity”:
- adoption of standards/platforms (network effects)
- citation and attention concentration
- recommendation-feedback loops
- liquidity clustering in markets
- organizational habit lock-in
Core lesson: early variance is strategic, not just transient noise.
Design implications (operator view)
If you run a system with reinforcement:
Control the cold-start phase
- early ranking/exposure rules disproportionately shape long-run concentration.
Audit feedback loops
- recommendation and allocation rules can amplify random early advantages.
Add anti-lock-in mechanisms when needed
- exploration quotas, decay terms, rotation, or re-seeding can reduce runaway concentration.
Don’t over-interpret early winners
- in reinforced systems, “winner quality” and “winner luck” are entangled.
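One anti-lock-in mechanism from the list above (an exploration quota) can be sketched as follows. This is a hypothetical modification of the basic urn, not a standard model: with probability `epsilon`, the reinforced color is chosen uniformly at random instead of by popularity.

```python
import random

def final_red_share(n=2000, epsilon=0.0, alpha=1, beta=1, rng=random):
    """Pólya urn with an exploration quota: with probability epsilon,
    reinforce a uniformly random color instead of the drawn one
    (hypothetical anti-lock-in mechanism)."""
    red, blue = alpha, beta
    for _ in range(n):
        if rng.random() < epsilon:
            drew_red = rng.random() < 0.5  # exploration: ignore popularity
        else:
            drew_red = rng.random() < red / (red + blue)
        if drew_red:
            red += 1
        else:
            blue += 1
    return red / (red + blue)

random.seed(0)
pure = [final_red_share() for _ in range(200)]
mixed = [final_red_share(epsilon=0.3) for _ in range(200)]
spread = lambda xs: max(xs) - min(xs)
print(spread(pure), spread(mixed))  # exploration narrows the spread of outcomes
```

With exploration, the expected drift pulls the red share toward 1/2 (the update rate is (\epsilon/2 + (1-\epsilon)x), which exceeds (x) below 1/2 and falls short above it), so final shares cluster instead of freezing wherever early luck left them.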
Minimal simulation sketch
```python
import random

# Pólya urn: draw a ball uniformly, return it, add one more of the same color.
alpha, beta = 1, 1      # initial composition
red, blue = alpha, beta
n = 5000

for _ in range(n):
    if random.random() < red / (red + blue):
        red += 1        # drew red: reinforce red
    else:
        blue += 1       # drew blue: reinforce blue

print("final red share:", red / (red + blue))
```
Run many times: final shares vary widely, but each run stabilizes.
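The Beta limit can be checked empirically. For (\alpha=\beta=1), (\mathrm{Beta}(1,1)) is Uniform(0,1), so across many runs the final shares should have mean near 1/2 and variance near 1/12. A sketch (run counts and horizon are arbitrary choices):

```python
import random
import statistics

def run_urn(alpha, beta, n, rng):
    """One Pólya urn run; returns the final red share."""
    red, blue = alpha, beta
    for _ in range(n):
        if rng.random() < red / (red + blue):
            red += 1
        else:
            blue += 1
    return red / (red + blue)

rng = random.Random(42)
shares = [run_urn(1, 1, 2000, rng) for _ in range(2000)]

# Beta(1,1) = Uniform(0,1): mean 1/2, variance 1/12 ≈ 0.0833.
print(round(statistics.mean(shares), 3), round(statistics.variance(shares), 3))
```

The empirical mean and variance should land close to the Uniform(0,1) values, while any single run's share is essentially unpredictable in advance.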
Related models
- Dirichlet-multinomial (multi-color finite case)
- Chinese Restaurant Process / Dirichlet Process (new-category creation + reinforcement)
- Yule/Preferential attachment (degree/size reinforcement in growing networks)
References (starter set)
- F. Eggenberger, G. Pólya (1923), Über die Statistik verketteter Vorgänge, ZAMM 3(4), 279–289. https://doi.org/10.1002/zamm.19230030407
- D. Blackwell, J. B. MacQueen (1973), Ferguson Distributions via Pólya Urn Schemes, Annals of Statistics 1(2), 353–355. https://doi.org/10.1214/aos/1176342372
- H. Mahmoud (2008), Pólya Urn Models, CRC Press.
- W. B. Arthur (1989), Competing Technologies, Increasing Returns, and Lock-In by Historical Events, Economic Journal 99(394), 116–131. https://doi.org/10.2307/2234208