What is the Central Limit Theorem? Statement, Formula, and Sample Size

The Central Limit Theorem (CLT) is the mathematical bedrock that explains why the normal distribution (Gaussian curve) appears ubiquitously across natural and social sciences. At its core, the theorem states that the suitably standardized sum (or average) of a sufficiently large number of independent and identically distributed (i.i.d.) random variables with finite variance will approximate a normal distribution, regardless of the shape of the underlying distribution of the original variables.

From the heights of individuals in a population to the measurement errors in astronomy, the CLT governs the predictable aggregation of random microscopic fluctuations into macroscopic stability.

I. The Formal Statement and Formula

Let $X_1, X_2, \dots, X_n$ be a sequence of independent and identically distributed (i.i.d.) random variables drawn from a population with a defined expected value $\mathbb{E}[X_i] = \mu$ and a finite variance $\text{Var}(X_i) = \sigma^2$.

The sample mean of these $n$ variables is defined as:

$$\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{1}{n} \sum_{i=1}^n X_i$$

According to the Weak Law of Large Numbers (WLLN), $\bar{X}_n$ converges in probability to $\mu$ as $n \to \infty$. However, the Central Limit Theorem goes a crucial step further by describing the distribution of the fluctuations around this mean. To analyze these fluctuations without them collapsing to zero or exploding to infinity, we must scale them precisely by $\sqrt{n}$. The standardized sample mean, $Z_n$, is formulated as:

$$Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} = \frac{\sum_{i=1}^n X_i - n\mu}{\sigma \sqrt{n}}$$

The Lindeberg-Lévy Central Limit Theorem states that as the sample size $n$ approaches infinity, the sequence of random variables $Z_n$ converges in distribution to the standard normal distribution $\mathcal{N}(0, 1)$:

$$Z_n \xrightarrow{d} \mathcal{N}(0, 1)$$

Equivalently, for every real $z$, the cumulative distribution function (CDF) of $Z_n$ evaluated at $z$ converges to the CDF of the standard normal distribution, $\Phi(z)$:

$$\lim_{n \to \infty} P(Z_n \le z) = \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2} \, dt$$
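The statement can be checked numerically. The following is a minimal sketch, not a proof: it assumes the $X_i$ are Exponential(1) draws (a skewed choice made here for illustration, with $\mu = \sigma = 1$) and estimates $P(Z_n \le 0)$ by simulation, to compare against $\Phi(0) = 0.5$. The function names, trial count, and seed are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def standardized_mean(sample, mu, sigma):
    """Return Z_n = (sample mean - mu) / (sigma / sqrt(n))."""
    n = len(sample)
    x_bar = sum(sample) / n
    return (x_bar - mu) / (sigma / math.sqrt(n))


def empirical_cdf_of_zn(n, z, trials=5_000, seed=0):
    """Monte Carlo estimate of P(Z_n <= z) for X_i ~ Exponential(1).

    Exponential(1) is an assumption for this sketch; it has mu = sigma = 1.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        if standardized_mean(sample, mu=1.0, sigma=1.0) <= z:
            hits += 1
    return hits / trials


phi = NormalDist().cdf  # standard normal CDF
for n in (1, 5, 30, 200):
    estimate = empirical_cdf_of_zn(n, z=0.0)
    print(f"n={n:>3}  P(Z_n <= 0) ~ {estimate:.3f}   Phi(0) = {phi(0.0):.3f}")
```

For $n = 1$ the exact value is $P(X_1 \le 1) = 1 - e^{-1} \approx 0.632$, visibly away from $0.5$; the estimates should drift toward $0.5$ as $n$ grows, up to Monte Carlo noise.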

II. Visualizing the Convergence: The Path to Normality

To understand this intuitively, imagine starting with a completely flat (uniform) distribution where every outcome is equally likely. As you increase the sample size ($n$), the act of averaging pulls the extremes toward the center because it becomes exponentially rarer to draw extreme values consecutively (e.g., rolling five 6s in a row on a die).

Repeatedly averaging uniform variables produces the characteristic bell curve, and the variance of the average tightens as the sample size increases: for Uniform(0, 1) variables it equals $1/(12n)$.
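As a rough stand-in for a plot, the sketch below averages $n$ Uniform(0, 1) draws many times and prints a crude text histogram of the resulting means, together with their empirical variance and the theoretical value $1/(12n)$. The helper names, bin count, and trial count are assumptions made for this illustration.

```python
import random
import statistics


def sample_means(n, trials=20_000, seed=1):
    """Return `trials` averages, each of n independent Uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]


def text_histogram(values, bins=10, width=40):
    """Return one bar of '#' characters per bin for values in [0, 1]."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    peak = max(counts)
    return ["#" * round(width * c / peak) for c in counts]


for n in (1, 2, 10):
    means = sample_means(n)
    print(f"n={n}: variance of the mean = {statistics.pvariance(means):.4f} "
          f"(theory: {1 / (12 * n):.4f})")
    for bar in text_histogram(means):
        print("  " + bar)
```

With $n = 1$ the bars are roughly level, with $n = 2$ they form a triangle, and by $n = 10$ the outline is already close to a bell shape concentrated around $0.5$.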

III. Proof Intuition via Characteristic Functions

A fully rigorous proof requires measure theory and Lévy's continuity theorem, but the core of the classical argument can be sketched with characteristic functions (the Fourier transforms of probability distributions).

The characteristic function of a random variable $X$ is defined as $\varphi_X(t) = \mathbb{E}[e^{itX}]$. For our standardized variable $Y_i = \frac{X_i - \mu}{\sigma}$ (which has mean 0 and variance 1), we can expand its characteristic function using a Taylor series around $t=0$:

$$\varphi_Y(t) = 1 + it\mathbb{E}[Y] - \frac{t^2}{2}\mathbb{E}[Y^2] + o(t^2)$$

Since $\mathbb{E}[Y] = 0$ and $\mathbb{E}[Y^2] = 1$, this simplifies elegantly to:

$$\varphi_Y(t) = 1 - \frac{t^2}{2} + o(t^2)$$

To obtain the characteristic function of the standardized sum $Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n Y_i$, we use the core property that the characteristic function of a sum of independent variables is the product of their individual characteristic functions:

$$\varphi_{Z_n}(t) = \prod_{i=1}^n \varphi_Y\left(\frac{t}{\sqrt{n}}\right) = \left( \varphi_Y\left(\frac{t}{\sqrt{n}}\right) \right)^n$$

Substituting our Taylor expansion:

$$\varphi_{Z_n}(t) = \left( 1 - \frac{1}{2}\left(\frac{t}{\sqrt{n}}\right)^2 + o\left(\frac{t^2}{n}\right) \right)^n = \left( 1 - \frac{t^2}{2n} + o\left(\frac{t^2}{n}\right) \right)^n$$

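Taking the limit as $n \to \infty$ for fixed $t$, we encounter the classic definition of the exponential function $\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = e^x$, here with $x = -t^2/2$ (the $o(t^2/n)$ remainder is negligible compared with $1/n$ and does not change the limit):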

$$\lim_{n \to \infty} \varphi_{Z_n}(t) = e^{-t^2 / 2}$$

This limiting characteristic function is exactly that of the standard normal distribution $\mathcal{N}(0,1)$, so by Lévy's continuity theorem $Z_n$ converges in distribution to $\mathcal{N}(0,1)$.
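The limit can be watched numerically for one concrete case. As an illustrative assumption, take $Y = \pm 1$ with probability $1/2$ each (mean 0, variance 1), whose characteristic function is $\varphi_Y(t) = \cos(t)$. The sketch below compares $\left(\cos(t/\sqrt{n})\right)^n$ with $e^{-t^2/2}$ at an arbitrarily chosen $t$.

```python
import math


def phi_rademacher(t):
    """Characteristic function of Y = +/-1 with probability 1/2 each: cos(t)."""
    return math.cos(t)


def phi_zn(t, n):
    """Characteristic function of Z_n = (Y_1 + ... + Y_n) / sqrt(n)."""
    return phi_rademacher(t / math.sqrt(n)) ** n


t = 1.5  # arbitrary evaluation point chosen for this sketch
target = math.exp(-t**2 / 2)  # characteristic function of N(0, 1) at t
for n in (1, 10, 100, 10_000):
    value = phi_zn(t, n)
    print(f"n={n:>6}  phi_Zn(t) = {value:.6f}   gap to exp(-t^2/2) = {abs(value - target):.2e}")
```

The gap shrinks roughly like $1/n$ here, because the next term in the expansion of $\ln\cos(x)$ is of order $x^4$.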

IV. Beyond i.i.d.: The Lindeberg and Lyapunov Conditions

The standard CLT assumes variables are identically distributed, but the real world is messy. Can the CLT still hold if the variables $X_i$ come from different distributions? Yes, provided they satisfy certain conditions that prevent any single variable from dominating the sum.

The Lyapunov Condition

Suppose the variables $X_i$ are independent (but not necessarily identical) with means $\mu_i$ and variances $\sigma_i^2$, and let $s_n^2 = \sum_{i=1}^n \sigma_i^2$. The Lyapunov condition requires that for some $\delta > 0$:

$$\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^n \mathbb{E}\left[|X_i - \mu_i|^{2+\delta}\right] = 0$$

This ensures that the absolute central moments of order $2+\delta$ (the third moments, when $\delta = 1$) do not grow too fast relative to the total variance $s_n^2$.
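To make the condition concrete, here is a small sketch that evaluates the Lyapunov ratio exactly for a hypothetical family: independent $X_i \sim \text{Uniform}(-a_i, a_i)$, so $\mu_i = 0$, $\sigma_i^2 = a_i^2/3$ and $\mathbb{E}|X_i|^{2+\delta} = a_i^{2+\delta}/(3+\delta)$. The repeating half-widths 1, 2, 3 are an assumption chosen only to make the variables non-identical.

```python
def lyapunov_ratio(half_widths, delta=1.0):
    """Lyapunov ratio for independent X_i ~ Uniform(-a_i, a_i), where mu_i = 0."""
    # Sum of variances: Var(X_i) = a_i^2 / 3.
    s_n_squared = sum(a**2 / 3 for a in half_widths)
    # Sum of absolute moments: E|X_i|^(2+delta) = a_i^(2+delta) / (3+delta).
    moment_sum = sum(a ** (2 + delta) / (3 + delta) for a in half_widths)
    # Divide by s_n^(2+delta) = (s_n^2)^((2+delta)/2).
    return moment_sum / s_n_squared ** ((2 + delta) / 2)


pattern = (1.0, 2.0, 3.0)  # hypothetical half-widths, repeated cyclically
for n in (3, 30, 300, 3000):
    half_widths = [pattern[i % 3] for i in range(n)]
    print(f"n={n:>4}  Lyapunov ratio (delta=1) = {lyapunov_ratio(half_widths):.4f}")
```

For this bounded family the ratio decays like $1/\sqrt{n}$ when $\delta = 1$, so the condition holds and the standardized sum is asymptotically normal.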

The Lindeberg Condition

A weaker, more general condition is the Lindeberg condition, which requires that for every $\epsilon > 0$:

$$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^n \mathbb{E}\left[ (X_i - \mu_i)^2 \cdot \mathbb{I}_{\{|X_i - \mu_i| > \epsilon s_n\}} \right] = 0$$

Here $\mathbb{I}$ is the indicator function. In plain English: the contribution to the total variance from the extreme tails of the individual distributions must vanish as $n$ grows. If this holds, the standardized sum $\frac{1}{s_n} \sum_{i=1}^n (X_i - \mu_i)$ still converges in distribution to $\mathcal{N}(0, 1)$.
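The Lindeberg ratio can also be computed in closed form for a toy family. The sketch below again assumes independent $X_i \sim \text{Uniform}(-a_i, a_i)$, for which $\mathbb{E}[X_i^2 \, \mathbb{I}_{\{|X_i| > c\}}] = (a_i^3 - c^3)/(3a_i)$ when $c < a_i$ and $0$ otherwise. Two hypothetical sequences of half-widths are compared: a bounded repeating pattern, and a geometric sequence $a_i = 2^i$ in which the last variable dominates the sum.

```python
import math


def truncated_second_moment(a, c):
    """E[X^2 * 1{|X| > c}] for X ~ Uniform(-a, a)."""
    if c >= a:
        return 0.0  # the event |X| > c has probability zero
    return (a**3 - c**3) / (3 * a)


def lindeberg_ratio(half_widths, eps):
    """Lindeberg ratio for independent X_i ~ Uniform(-a_i, a_i), where mu_i = 0."""
    s_n = math.sqrt(sum(a**2 / 3 for a in half_widths))
    tail = sum(truncated_second_moment(a, eps * s_n) for a in half_widths)
    return tail / s_n**2


eps = 0.5  # arbitrary threshold chosen for this sketch
for n in (3, 30, 300):
    bounded = [(1.0, 2.0, 3.0)[i % 3] for i in range(n)]
    print(f"bounded widths,   n={n:>3}: ratio = {lindeberg_ratio(bounded, eps):.4f}")
for n in (5, 10, 20):
    geometric = [2.0**i for i in range(1, n + 1)]
    print(f"geometric widths, n={n:>3}: ratio = {lindeberg_ratio(geometric, eps):.4f}")
```

In the bounded case the ratio reaches zero once $\epsilon s_n$ exceeds the largest half-width. In the geometric case it stays bounded away from zero, which matches the intuition that a single dominating variable breaks the Gaussian limit.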

V. The Berry-Esseen Theorem and Convergence Rates

A common practical question in applied statistics is: How large must the sample size be for the Central Limit Theorem to apply?

While textbooks often cite $n \ge 30$ as a rule of thumb, this heuristic is fragile: the sample size required depends strongly on the skewness and tail weight of the underlying population distribution. The Berry-Esseen Theorem provides a rigorous upper bound on the approximation error between the true CDF $F_n(x)$ of the standardized mean $Z_n$ and the standard normal CDF $\Phi(x)$:

$$\sup_{x \in \mathbb{R}} \left| F_n(x) - \Phi(x) \right| \le \frac{C \cdot \rho}{\sigma^3 \sqrt{n}}$$

Here $\rho = \mathbb{E}[|X - \mu|^3]$ is the third absolute central moment (related to skewness), assumed finite, and $C$ is an absolute constant (the bound is known to hold with $C = 0.4748$).

Practical Implications of Berry-Esseen

  • Symmetric Distributions: If the original population is uniform or roughly symmetric (low $\rho / \sigma^3$), the error drops rapidly, and $n = 10$ might be perfectly sufficient.
  • Highly Skewed Distributions: If the underlying population follows an exponential, log-normal, or Pareto distribution (e.g., wealth distribution, server queue times), $\rho / \sigma^3$ is large, and for sufficiently heavy Pareto tails $\rho$ is infinite, so the bound gives no guarantee at all. Convergence is much slower, and $n$ may need to exceed 200 or 500 for a valid normal approximation.
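The bound is easy to evaluate for specific populations. The sketch below uses two cases whose moments are available in closed form: Uniform(0, 1), with $\sigma^2 = 1/12$ and $\rho = 1/32$, and Exponential(1), with $\sigma = 1$ and $\rho = 12/e - 2$. The function names and the 0.05 error target are assumptions for illustration. Keep in mind that Berry-Esseen is a worst-case guarantee over all $x$, so the sample sizes it demands are typically far more conservative than what a normal approximation needs in practice.

```python
import math

C_BERRY_ESSEEN = 0.4748  # constant quoted in the text


def berry_esseen_bound(rho, sigma, n, c=C_BERRY_ESSEEN):
    """Upper bound on sup_x |F_n(x) - Phi(x)| for i.i.d. summands."""
    return c * rho / (sigma**3 * math.sqrt(n))


def min_n_for_error(rho, sigma, max_error, c=C_BERRY_ESSEEN):
    """Smallest n for which the Berry-Esseen bound is at most max_error."""
    return math.ceil((c * rho / (sigma**3 * max_error)) ** 2)


# (rho, sigma) pairs derived analytically for this sketch.
populations = {
    "Uniform(0, 1)": (1 / 32, math.sqrt(1 / 12)),
    "Exponential(1)": (12 / math.e - 2, 1.0),
}

for name, (rho, sigma) in populations.items():
    print(f"{name}: rho/sigma^3 = {rho / sigma**3:.3f}")
    for n in (10, 30, 200):
        print(f"  n={n:>3}  bound = {berry_esseen_bound(rho, sigma, n):.3f}")
    print(f"  n needed for bound <= 0.05: {min_n_for_error(rho, sigma, 0.05)}")
```

The ratio $\rho / \sigma^3$ is what separates the two cases: because the required $n$ scales with its square, a population with roughly twice the ratio needs roughly four times the sample size for the same guaranteed error.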

VI. Institutional Applications: Quantitative Finance

In Quantitative Finance, the Central Limit Theorem is the engine behind the Geometric Brownian Motion (GBM) used in the Black-Scholes-Merton model. If the daily returns $r_i$ of an asset are assumed to be independent and identically distributed with finite variance, the sum of the logarithmic returns over a long period is approximately normally distributed:

$$\ln\left(\frac{S_T}{S_0}\right) = \sum_{i=1}^n \ln(1 + r_i) \sim \mathcal{N}\left( \left(\mu - \frac{\sigma^2}{2}\right)T, \sigma^2 T \right)$$

This log-normality of asset prices is a direct macroscopic consequence of the Central Limit Theorem acting on microscopic, tick-by-tick trading noise.
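A small simulation illustrates the aggregation effect. It is a sketch under stated assumptions, not a market model: daily simple returns are taken to be i.i.d. with a deliberately skewed, hypothetical law $r_i = 0.01\,(E_i - 1)$, where $E_i \sim \text{Exponential}(1)$, and 252 trading days make up one year. The code compares the skewness of a single daily log return with that of the summed annual log return.

```python
import math
import random
import statistics


def daily_log_return(rng):
    """One hypothetical daily log return ln(1 + r), with r = 0.01 * (E - 1)."""
    r = 0.01 * (rng.expovariate(1.0) - 1.0)  # skewed, never below -1%
    return math.log(1.0 + r)


def sample_skewness(values):
    """Standardized third central moment of a sample."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


rng = random.Random(42)
daily = [daily_log_return(rng) for _ in range(20_000)]
annual = [sum(daily_log_return(rng) for _ in range(252)) for _ in range(3_000)]

mean, std = statistics.fmean(annual), statistics.pstdev(annual)
within_one_std = sum(abs(x - mean) <= std for x in annual) / len(annual)

print(f"skewness of daily log returns:  {sample_skewness(daily):.2f}")
print(f"skewness of annual log returns: {sample_skewness(annual):.2f}")
print(f"share of annual values within one std of the mean: {within_one_std:.3f}")
```

A normal distribution has zero skewness and about 68% of its mass within one standard deviation of the mean, so the annual figures can be read against those two benchmarks.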

VII. Real-World Scenario: Insurance Risk Pooling & Ruin Theory

Perhaps the most direct, life-altering application of the Central Limit Theorem is the foundation of the modern insurance industry.

Consider an auto insurance company that insures $n$ independent drivers. For any individual driver, the payout $X_i$ in a given year is highly unpredictable: most drivers will cost the company nothing ($X_i = 0$), but a few will have catastrophic accidents resulting in massive payouts (e.g., $X_i = 100{,}000$). The distribution of individual claims is extremely skewed.

If the company only insured 10 drivers, the variance would be so high that a single accident could bankrupt the firm. Quantifying this risk of insolvency is the subject of ruin theory.

However, by scaling up to $n = 1{,}000{,}000$ independent drivers, the Central Limit Theorem takes over. Let $\mu$ be the expected claim per driver and $\sigma^2$ be the variance of a single claim. According to the CLT, the total aggregate claim $S_n = \sum_{i=1}^n X_i$ will closely approximate a normal distribution:

$$S_n \sim \mathcal{N}(n\mu, n\sigma^2)$$

Because the standard deviation of the total claim grows at a rate of $\sqrt{n}$, while the total premium revenue (if they charge $\mu + \text{margin}$ per driver) grows at a rate of $n$, the relative risk $\frac{\text{Standard Deviation}}{\text{Revenue}}$ drops proportionally to $\frac{1}{\sqrt{n}}$.
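The scaling argument can be put into numbers. The sketch below assumes a hypothetical claim model that is not taken from any real portfolio: each driver files a single claim of 100,000 with probability 1% and nothing otherwise, and the insurer charges $\mu$ plus a margin of 200 per driver. It reports the relative risk and a normal-approximation estimate of the probability that total claims exceed total premiums.

```python
import math
from statistics import NormalDist


def relative_risk(n, sigma, premium):
    """Standard deviation of total claims divided by total premium revenue."""
    return (sigma * math.sqrt(n)) / (n * premium)


def shortfall_probability(n, mu, sigma, premium):
    """Normal approximation to P(S_n > n * premium), with S_n ~ N(n*mu, n*sigma^2)."""
    z = (n * premium - n * mu) / (sigma * math.sqrt(n))
    return 1.0 - NormalDist().cdf(z)


# Hypothetical parameters for this sketch.
p_claim, claim_size, margin = 0.01, 100_000.0, 200.0
mu = p_claim * claim_size                                # expected claim per driver
sigma = claim_size * math.sqrt(p_claim * (1 - p_claim))  # std of a single claim
premium = mu + margin

for n in (10, 1_000, 1_000_000):
    print(f"n={n:>9,}  relative risk = {relative_risk(n, sigma, premium):.4f}  "
          f"approx. P(claims > premiums) = {shortfall_probability(n, mu, sigma, premium):.4f}")
```

The normal approximation is crude at $n = 10$ for such a skewed claim distribution, so that row is indicative only. The point of the comparison is the $1/\sqrt{n}$ decay of the relative risk between rows.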

This statistical regularity is what allows insurance companies to guarantee payouts for catastrophic individual events without holding trillions of dollars in liquid reserves. As long as the insured risks are close to independent, the CLT transforms the chaotic, skewed risk of individuals into a tightly predictable Gaussian bell curve for the institution.
