What is the Central Limit Theorem? Statement, Formula, and Sample Size | The Journal

The Central Limit Theorem (CLT) is the mathematical bedrock that explains why the normal distribution (Gaussian curve) appears ubiquitously across natural and social sciences. At its core, the theorem states that the suitably standardized sum (or average) of a sufficiently large number of independent and identically distributed (i.i.d.) random variables with finite variance is approximately normally distributed, regardless of the shape of the underlying distribution of the original variables.

From the heights of individuals in a population to the measurement errors in astronomy, the CLT governs the predictable aggregation of random microscopic fluctuations into macroscopic stability.

I. The Formal Statement and Formula

Let $X_1, X_2, \dots, X_n$ be a sequence of independent and identically distributed (i.i.d.) random variables drawn from a population with a defined expected value $\mathbb{E}[X_i] = \mu$ and a finite, positive variance $\text{Var}(X_i) = \sigma^2$.

The sample mean of these $n$ variables is defined as:

$\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{1}{n} \sum_{i=1}^n X_i$

According to the Weak Law of Large Numbers (WLLN), $\bar{X}_n$ converges in probability to $\mu$ as $n \to \infty$. However, the Central Limit Theorem goes a crucial step further by describing the distribution of the fluctuations around this mean. To analyze these fluctuations without them collapsing to zero or exploding to infinity, we must scale them precisely by $\sqrt{n}$. The standardized sample mean, $Z_n$, is formulated as:

$Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} = \frac{\sum_{i=1}^n X_i - n\mu}{\sigma \sqrt{n}}$

The Lindeberg-Lévy Central Limit Theorem states that as the sample size $n$ approaches infinity, the sequence of random variables $Z_n$ converges in distribution to the standard normal distribution $\mathcal{N}(0, 1)$:

$Z_n \xrightarrow{d} \mathcal{N}(0, 1)$

This means that, for every real $z$, the cumulative distribution function (CDF) of $Z_n$ converges to the CDF of the standard normal distribution $\Phi(z)$ as $n$ grows:

$\lim_{n \to \infty} P(Z_n \le z) = \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2} \, dt$
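The limit above can be checked numerically. The following is a minimal sketch, assuming an Exponential(1) population (so $\mu = \sigma = 1$) chosen purely for illustration; the helper names `standardized_means` and `empirical_cdf`, the sample sizes, and the number of trials are hypothetical choices, not part of the theorem. It uses only the Python standard library.

```python
import math
import random
from statistics import NormalDist

def standardized_means(n, trials, rng):
    """Draw `trials` values of Z_n = (mean - mu) / (sigma / sqrt(n)) for Exp(1) data."""
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    zs = []
    for _ in range(trials):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs

def empirical_cdf(zs, z):
    """Fraction of simulated Z_n values that are <= z."""
    return sum(1 for v in zs if v <= z) / len(zs)

rng = random.Random(0)
phi = NormalDist().cdf  # standard normal CDF
for n in (1, 10, 100):
    zs = standardized_means(n, 5000, rng)
    # Gap between the simulated CDF of Z_n and Phi at a few reference points.
    gaps = [abs(empirical_cdf(zs, z) - phi(z)) for z in (-1.0, 0.0, 1.0)]
    print(n, [round(g, 3) for g in gaps])
```

The printed gaps should shrink as $n$ grows, up to Monte Carlo noise of roughly $1/\sqrt{\text{trials}}$.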

II. Visualizing the Convergence: The Path to Normality

To understand this intuitively, imagine starting with a completely flat (uniform) distribution where every outcome is equally likely. As you increase the sample size ($n$), the act of averaging pulls the extremes toward the center because it becomes exponentially rarer to draw extreme values consecutively (e.g., rolling five 6s in a row on a die).

Repeatedly averaging uniform variables produces the characteristic bell curve, and the variance of the average tightens in proportion to $1/n$ as the sample size increases. [Interactive figure: averaging uniform variables for increasing sample size $n$.]
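The tightening of the variance can also be checked without a figure. This is a small sketch, assuming Uniform(0, 1) draws, for which the variance of a single draw is $1/12$ and the variance of the average of $n$ draws is therefore $1/(12n)$; the function name `uniform_means` and the trial counts are illustrative assumptions.

```python
import random
import statistics

def uniform_means(n, trials, rng):
    """Simulate `trials` averages of n independent Uniform(0, 1) draws."""
    return [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]

rng = random.Random(42)
for n in (1, 2, 5, 30):
    means = uniform_means(n, 10000, rng)
    observed = statistics.pvariance(means)
    theory = 1.0 / (12.0 * n)  # Var(Uniform(0, 1)) / n
    print(f"n={n:2d}  observed variance={observed:.5f}  theory={theory:.5f}")
```

The observed and theoretical variances should agree closely, and both shrink as $n$ increases.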

III. Proof Intuition via Characteristic Functions

A fully rigorous proof requires measure theory and Lévy's Continuity Theorem, but the classical argument can be sketched using characteristic functions (the Fourier transforms of probability distributions).

The characteristic function of a random variable $X$ is defined as $\varphi_X(t) = \mathbb{E}[e^{itX}]$. For our standardized variable $Y_i = \frac{X_i - \mu}{\sigma}$ (which has mean 0 and variance 1), we can expand its characteristic function using a Taylor series around $t=0$:

$\varphi_Y(t) = 1 + it\mathbb{E}[Y] - \frac{t^2}{2}\mathbb{E}[Y^2] + o(t^2)$

Since $\mathbb{E}[Y] = 0$ and $\mathbb{E}[Y^2] = 1$, this simplifies elegantly to:

$\varphi_Y(t) = 1 - \frac{t^2}{2} + o(t^2)$

The characteristic function of the standardized sum $Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n Y_i$ leverages the core property that the characteristic function of a sum of independent variables is the product of their individual characteristic functions:

$\varphi_{Z_n}(t) = \prod_{i=1}^n \varphi_Y\left(\frac{t}{\sqrt{n}}\right) = \left( \varphi_Y\left(\frac{t}{\sqrt{n}}\right) \right)^n$

Substituting our Taylor expansion:

$\varphi_{Z_n}(t) = \left( 1 - \frac{1}{2}\left(\frac{t}{\sqrt{n}}\right)^2 + o\left(\frac{t^2}{n}\right) \right)^n = \left( 1 - \frac{t^2}{2n} + o\left(\frac{t^2}{n}\right) \right)^n$

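Taking the limit as $n \to \infty$, we encounter the classic definition of the exponential function $\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = e^x$, here with $x = -t^2/2$ (for fixed $t$, the $o(t^2/n)$ remainder vanishes faster than $1/n$ and does not affect the limit):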

$\lim_{n \to \infty} \varphi_{Z_n}(t) = e^{-t^2 / 2}$

This limiting characteristic function is exactly that of the standard normal distribution $\mathcal{N}(0,1)$, which, by Lévy's Continuity Theorem, confirms convergence in distribution.
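The convergence of the characteristic functions can be seen numerically in a case where everything is available in closed form. The sketch below assumes each $Y_i$ takes the values $+1$ and $-1$ with probability $1/2$ (mean 0, variance 1), for which $\varphi_Y(t) = \cos t$ exactly; this choice of distribution and the function name `phi_zn` are illustrative assumptions.

```python
import math

def phi_zn(t, n):
    """Characteristic function of Z_n when each Y_i is +1 or -1 with probability 1/2.

    For this choice phi_Y(t) = cos(t), so phi_{Z_n}(t) = cos(t / sqrt(n)) ** n.
    """
    return math.cos(t / math.sqrt(n)) ** n

t = 1.5
target = math.exp(-t * t / 2)  # characteristic function of N(0, 1) at t
for n in (1, 10, 100, 10000):
    value = phi_zn(t, n)
    print(n, round(value, 6), round(abs(value - target), 6))
```

The second printed column approaches $e^{-t^2/2}$ and the third (the absolute error) shrinks toward zero as $n$ grows.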

IV. Beyond i.i.d.: The Lindeberg and Lyapunov Conditions

The standard CLT assumes variables are identically distributed, but the real world is messy. Can the CLT still hold if the variables $X_i$ come from different distributions? Yes, provided they satisfy certain conditions that prevent any single variable from dominating the sum.

The Lyapunov Condition

If the variables $X_i$ are independent (but not necessarily identical) with means $\mu_i$ and variances $\sigma_i^2$, let $s_n^2 = \sum_{i=1}^n \sigma_i^2$. The Lyapunov condition requires that for some $\delta > 0$:

$\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^n \mathbb{E}[|X_i - \mu_i|^{2+\delta}] = 0$

With $\delta = 1$ this involves the third absolute central moments; in general it ensures that the $(2+\delta)$-th absolute central moments do not grow too fast relative to the total variance.
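To make the condition concrete, the following sketch evaluates the Lyapunov ratio with $\delta = 1$ for independent $X_i \sim \text{Uniform}(-a_i, a_i)$, using the closed forms $\text{Var}(X_i) = a_i^2/3$ and $\mathbb{E}[|X_i|^3] = a_i^3/4$. The uniform family, the two sequences of half-widths $a_i$, and the function name `lyapunov_ratio` are assumptions made only for illustration.

```python
def lyapunov_ratio(half_widths):
    """Lyapunov ratio with delta = 1 for independent X_i ~ Uniform(-a_i, a_i).

    Uses Var(X_i) = a_i**2 / 3 and E|X_i|**3 = a_i**3 / 4.
    """
    s_n_squared = sum(a * a / 3.0 for a in half_widths)
    third_moments = sum(a ** 3 / 4.0 for a in half_widths)
    return third_moments / s_n_squared ** 1.5

for n in (10, 50, 200):
    balanced = lyapunov_ratio([1.0] * n)                      # comparable scales
    dominated = lyapunov_ratio([2.0 ** i for i in range(n)])  # last term dominates
    print(n, round(balanced, 4), round(dominated, 4))
```

With comparable scales the ratio decays like $1/\sqrt{n}$, so the condition holds. With geometrically growing scales the ratio stays bounded away from zero for $\delta = 1$, reflecting a single variable dominating the sum.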

The Lindeberg Condition

A weaker, more general condition is the Lindeberg condition, which states that for any $\epsilon > 0$:

$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^n \mathbb{E}\left[ (X_i - \mu_i)^2 \cdot \mathbb{I}_{\{|X_i - \mu_i| > \epsilon s_n\}} \right] = 0$

Here $\mathbb{I}$ is the indicator function. In plain English: the contribution to the total variance from the extreme tails of the individual distributions must vanish as $n$ grows. If this holds, the standardized sum $\frac{1}{s_n} \sum_{i=1}^n (X_i - \mu_i)$ still converges in distribution to a standard Gaussian.
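A short worked inequality shows why the Lindeberg condition is the weaker of the two. On the event $\{|X_i - \mu_i| > \epsilon s_n\}$ the factor $|X_i - \mu_i|^\delta / (\epsilon s_n)^\delta$ exceeds 1, so each truncated second moment is bounded by a $(2+\delta)$-th moment:

$\frac{1}{s_n^2} \sum_{i=1}^n \mathbb{E}\left[ (X_i - \mu_i)^2 \cdot \mathbb{I}_{\{|X_i - \mu_i| > \epsilon s_n\}} \right] \le \frac{1}{\epsilon^\delta} \cdot \frac{1}{s_n^{2+\delta}} \sum_{i=1}^n \mathbb{E}[|X_i - \mu_i|^{2+\delta}]$

If the Lyapunov condition holds for some $\delta > 0$, the right-hand side tends to zero for every fixed $\epsilon > 0$, and the Lindeberg condition follows.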

V. The Berry-Esseen Theorem and Convergence Rates

A common practical question in applied statistics is: How large must the sample size be for the Central Limit Theorem to apply?

While textbooks often cite $n \ge 30$ as a rule of thumb, this heuristic is fragile: how large $n$ must be depends strongly on the skewness and tail weight of the underlying population distribution. When the third absolute moment is finite, the Berry-Esseen Theorem provides a rigorous upper bound on the approximation error between the true CDF $F_n(x)$ of the standardized mean $Z_n$ and the standard normal CDF $\Phi(x)$:

$\sup_{x \in \mathbb{R}} \left| F_n(x) - \Phi(x) \right| \le \frac{C \cdot \rho}{\sigma^3 \sqrt{n}}$

Here $\rho = \mathbb{E}[|X - \mu|^3]$ is the third absolute central moment (related to skewness), and $C$ is a universal constant whose best known upper bound is $C \le 0.4748$.
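As a worked illustration of the bound, the sketch below evaluates it for two textbook populations whose third absolute central moments have closed forms: Uniform(0, 1), with $\rho = 1/32$ and $\sigma = \sqrt{1/12}$, and Exponential(1), with $\rho = 12/e - 2$ and $\sigma = 1$. The function names, the tolerance of 0.05, and the choice of populations are assumptions for illustration only.

```python
import math

C = 0.4748  # constant quoted in the text

def berry_esseen_bound(rho, sigma, n):
    """Upper bound C * rho / (sigma**3 * sqrt(n)) on sup |F_n(x) - Phi(x)|."""
    return C * rho / (sigma ** 3 * math.sqrt(n))

def n_for_bound(rho, sigma, tol):
    """Smallest n for which the bound is at most `tol`."""
    return math.ceil((C * rho / (sigma ** 3 * tol)) ** 2)

# (rho, sigma) pairs from closed-form moments of each population.
populations = {
    "Uniform(0, 1)": (1 / 32, math.sqrt(1 / 12)),
    "Exponential(1)": (12 / math.e - 2, 1.0),
}
for name, (rho, sigma) in populations.items():
    bound_30 = berry_esseen_bound(rho, sigma, 30)
    print(name, round(bound_30, 4), n_for_bound(rho, sigma, 0.05))
```

Because Berry-Esseen is a worst-case guarantee over all distributions with the given moments, the sample sizes it suggests are conservative; the actual approximation error for a specific population is often much smaller.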

Practical Implications of Berry-Esseen

→Symmetric Distributions: If the original population is uniform or roughly symmetric (low $\rho / \sigma^3$), the error drops rapidly, and $n = 10$ might be perfectly sufficient.
→Highly Skewed Distributions: If the underlying population follows an exponential, log-normal, or Pareto distribution (e.g., wealth distribution, server queue times), the ratio $\rho / \sigma^3$ is larger and can be very large for heavy tails. Convergence is correspondingly slower, and $n$ may need to exceed 200 or 500 for a valid normal approximation. For Pareto tails heavy enough that the variance is infinite, the CLT in this form does not apply at all.
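The contrast between symmetric and skewed populations can be estimated by simulation. This is a rough Monte Carlo sketch, assuming Uniform(0, 1) as the symmetric case and Exponential(1) as the skewed case; the function `sup_cdf_gap`, the sample sizes, and the number of trials are hypothetical choices.

```python
import math
import random
from statistics import NormalDist

def sup_cdf_gap(draw, mu, sigma, n, trials, rng):
    """Monte Carlo estimate of sup_x |F_n(x) - Phi(x)| for the standardized mean."""
    phi = NormalDist().cdf
    zs = sorted(
        (sum(draw(rng) for _ in range(n)) / n - mu) / (sigma / math.sqrt(n))
        for _ in range(trials)
    )
    gap = 0.0
    for i, z in enumerate(zs):
        # The empirical CDF jumps from i/trials to (i+1)/trials at z.
        gap = max(gap, abs((i + 1) / trials - phi(z)), abs(i / trials - phi(z)))
    return gap

def draw_uniform(rng):
    return rng.random()

def draw_exponential(rng):
    return rng.expovariate(1.0)

rng = random.Random(1)
for n in (5, 30):
    u = sup_cdf_gap(draw_uniform, 0.5, math.sqrt(1 / 12), n, 20000, rng)
    e = sup_cdf_gap(draw_exponential, 1.0, 1.0, n, 20000, rng)
    print(f"n={n:2d}  uniform gap={u:.3f}  exponential gap={e:.3f}")
```

The estimate cannot resolve gaps much smaller than about $1/\sqrt{\text{trials}}$, so for the uniform case the printed value mostly reflects simulation noise rather than true approximation error.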

VI. Institutional Applications: Quantitative Finance

In Quantitative Finance, the Central Limit Theorem underpins the Geometric Brownian Motion (GBM) used in the Black-Scholes-Merton model. If the per-period returns $r_i$ of an asset are assumed to be independent and identically distributed with finite variance, the sum of the logarithmic returns over a long horizon $T$ is approximately normal, where $\mu$ and $\sigma$ here denote the asset's drift and volatility:

$\ln\left(\frac{S_T}{S_0}\right) = \sum_{i=1}^n \ln(1 + r_i) \sim \mathcal{N}\left( \left(\mu - \frac{\sigma^2}{2}\right)T, \sigma^2 T \right)$

This log-normality of asset prices is a direct macroscopic consequence of the Central Limit Theorem acting on microscopic, tick-by-tick trading noise.
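The aggregation of log-returns can be illustrated with a toy simulation. The sketch below assumes daily simple returns $r_i$ drawn i.i.d. from Uniform(-0.02, 0.021), a deliberately non-normal and entirely hypothetical choice, and a horizon of 252 trading days. Under these assumptions the per-day mean $m$ and variance $v$ of $\ln(1 + r_i)$ play the roles of $(\mu - \sigma^2/2)\,\Delta t$ and $\sigma^2 \Delta t$ in the formula above.

```python
import math
import random
import statistics

def simulate_log_return(n, rng):
    """Sum of n daily log-returns ln(1 + r_i) with r_i i.i.d. Uniform(-0.02, 0.021)."""
    return sum(math.log1p(rng.uniform(-0.02, 0.021)) for _ in range(n))

rng = random.Random(7)
n_days = 252  # hypothetical horizon: one trading year
totals = [simulate_log_return(n_days, rng) for _ in range(5000)]

# Per-day log-return moments estimated from a large independent sample.
daily = [math.log1p(rng.uniform(-0.02, 0.021)) for _ in range(200000)]
m, v = statistics.fmean(daily), statistics.pvariance(daily)

print("mean of ln(S_T/S_0):", round(statistics.fmean(totals), 4), "vs n*m:", round(n_days * m, 4))
print("variance of ln(S_T/S_0):", round(statistics.pvariance(totals), 5), "vs n*v:", round(n_days * v, 5))

# Share of standardized totals inside +/- 1.96, which is 95% for a normal law.
z = [(t - n_days * m) / math.sqrt(n_days * v) for t in totals]
print("share within 1.96 sd:", sum(1 for x in z if abs(x) <= 1.96) / len(z))
```

Real return series typically violate the i.i.d. assumption, so this only demonstrates the mechanism under the stated assumptions.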

VII. Real-World Scenario: Insurance Risk Pooling & Ruin Theory

Perhaps the most direct, life-altering application of the Central Limit Theorem is the foundation of the modern insurance industry.

Consider an auto insurance company that insures $n$ independent drivers. For any individual driver, the payout $X_i$ in a given year is highly unpredictable: most drivers will cost the company nothing ($X_i = 0$), but a few will have catastrophic accidents resulting in massive payouts (e.g., $X_i = \$100{,}000$). The distribution of individual claims is extremely skewed.

If the company only insured 10 drivers, the variance relative to premium income would be so high that a single accident could bankrupt the firm. Quantifying this probability of insolvency is the subject of ruin theory.

However, by scaling up to $n = 1{,}000{,}000$ independent drivers, the Central Limit Theorem takes over. Let $\mu$ be the expected claim per driver and $\sigma^2$ be the variance of a single claim. According to the CLT, the total aggregate claim $S_n = \sum_{i=1}^n X_i$ will closely approximate a normal distribution:

$S_n \sim \mathcal{N}(n\mu, n\sigma^2)$

Because the standard deviation of the total claim grows at a rate of $\sqrt{n}$, while the total premium revenue (if they charge $\mu + \text{margin}$ per driver) grows at a rate of $n$, the relative risk $\frac{\text{Standard Deviation}}{\text{Revenue}}$ drops proportionally to $\frac{1}{\sqrt{n}}$.

This statistical regularity, which depends on the claims being independent, is what allows insurance companies to guarantee payouts for catastrophic individual events without holding trillions of dollars in liquid reserves. The CLT transforms the chaotic, skewed risk of individuals into a tightly predictable Gaussian bell curve for the institution.
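The $1/\sqrt{n}$ scaling of relative risk can be made concrete with a small calculation. The numbers below are hypothetical: each driver is assumed to claim 100,000 with probability 1% and nothing otherwise, and the insurer is assumed to charge the expected claim plus a margin of 50 per driver. The shortfall probability uses the normal approximation $S_n \sim \mathcal{N}(n\mu, n\sigma^2)$ and is therefore not trustworthy for small $n$.

```python
import math
from statistics import NormalDist

# Hypothetical portfolio: each driver claims 100,000 with probability 1%, else 0.
P_CLAIM, PAYOUT, MARGIN = 0.01, 100_000.0, 50.0
MU = P_CLAIM * PAYOUT                                # expected claim per driver
SIGMA = PAYOUT * math.sqrt(P_CLAIM * (1 - P_CLAIM))  # std dev of a single claim

def relative_risk(n):
    """Standard deviation of total claims divided by total premium revenue."""
    return SIGMA * math.sqrt(n) / (n * (MU + MARGIN))

def shortfall_probability(n):
    """Normal approximation to P(S_n > total premiums); unreliable for small n."""
    z = MARGIN * math.sqrt(n) / SIGMA
    return 1.0 - NormalDist().cdf(z)

for n in (10, 10_000, 1_000_000):
    print(n, round(relative_risk(n), 4), f"{shortfall_probability(n):.2e}")
```

Multiplying the number of drivers by 100 divides the relative risk by 10, which is the $1/\sqrt{n}$ behavior described above.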