The Asymptotic Normality of Maximum Likelihood Estimators



The Formal Theorem

Let $X_1, \dots, X_n$ be i.i.d. random variables with density $f(x; \theta_0)$. Under standard regularity conditions, the Maximum Likelihood Estimator $\hat{\theta}_n$ is consistent and satisfies:

$$\sqrt{n}\,(\hat{\theta}_n - \theta_0) \xrightarrow{d} \mathcal{N}\!\left(0, \mathcal{I}(\theta_0)^{-1}\right)$$

where $\mathcal{I}(\theta_0)$ is the Fisher Information of a single observation, defined as $E_{\theta_0}\!\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta_0)\right)^{2}\right]$.
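
To make the statement concrete, here is a minimal Monte Carlo sketch under an assumed Exponential model with rate $\theta$ (a choice not made above), where the MLE is $1/\bar{X}$ and $\mathcal{I}(\theta) = 1/\theta^2$; the scaled error should show an empirical variance near $\mathcal{I}(\theta_0)^{-1} = \theta_0^2$:

```python
# Monte Carlo sketch of the theorem. Assumed setup (not from the text above):
# X ~ Exponential(rate theta0), MLE = 1/sample-mean, I(theta) = 1/theta^2.
import numpy as np

rng = np.random.default_rng(0)
theta0, n, reps = 2.0, 2000, 5000

# Draw `reps` independent samples of size n and compute the MLE for each.
samples = rng.exponential(scale=1.0 / theta0, size=(reps, n))
theta_hat = 1.0 / samples.mean(axis=1)

# Scaled error: approximately N(0, I(theta0)^{-1}) = N(0, theta0^2).
z = np.sqrt(n) * (theta_hat - theta0)
print("empirical mean    :", z.mean())   # ~ 0
print("empirical variance:", z.var())    # ~ theta0**2 = 4.0
print("theoretical I^{-1}:", theta0**2)
```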

Analytical Intuition.

Imagine the log-likelihood function as a shifting, noisy mountain range where the peak represents our best guess, $\hat{\theta}_n$. As we collect more data, the Law of Large Numbers ensures this mountain sharpens and centers itself over the 'true' summit $\theta_0$. To prove normality, we zoom into the peak. We perform a Taylor expansion of the Score function (the derivative of the log-likelihood) around the truth. This reveals a beautiful tension between two forces: the numerator becomes a scaled sum of independent fluctuations which, by the Central Limit Theorem, converges to a majestic Gaussian curve; the denominator, the negative of the averaged second derivative (the 'curvature' of the mountain), converges to the Fisher Information. This curvature acts as a gravitational constant, scaling our uncertainty. In the limit, the 'wobble' of our estimator is not random chaos, but a perfectly symmetric Normal distribution. This result is often called the 'Central Limit Theorem for Estimators,' and it provides the theoretical bedrock for constructing confidence intervals and conducting hypothesis tests in nearly all of modern frequentist statistics.
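
Written out, the expansion the paragraph describes is a two-line sketch (suppressing the Taylor remainder, which vanishes under the regularity conditions):

```latex
% Score: \ell_n'(\theta) = \sum_{i=1}^{n} \frac{\partial}{\partial\theta}\log f(X_i;\theta).
% The MLE solves \ell_n'(\hat{\theta}_n) = 0, so expanding around \theta_0:
\[
  0 = \ell_n'(\hat{\theta}_n)
    \approx \ell_n'(\theta_0) + \ell_n''(\theta_0)\,(\hat{\theta}_n - \theta_0)
  \;\Longrightarrow\;
  \sqrt{n}\,(\hat{\theta}_n - \theta_0)
    \approx \frac{n^{-1/2}\,\ell_n'(\theta_0)}{-\,n^{-1}\,\ell_n''(\theta_0)}.
\]
% CLT: the numerator converges in distribution to \mathcal{N}(0, \mathcal{I}(\theta_0)).
% WLLN: the denominator converges in probability to \mathcal{I}(\theta_0).
% Slutsky's theorem combines the two into \mathcal{N}(0, \mathcal{I}(\theta_0)^{-1}).
```
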
CAUTION

Institutional Warning.

Students often fail to distinguish between the 'observed' information and the 'expected' Fisher information. The proof relies on the fact that the negative of the averaged second derivative of the log-likelihood converges to the expected information $\mathcal{I}(\theta_0)$ via the Weak Law of Large Numbers, which then scales the Gaussian noise of the Score.
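
The distinction is easy to see numerically. Below is a small sketch using a Cauchy location model (an illustrative choice, not one discussed above), where the expected information per observation is exactly $1/2$ while the observed information fluctuates from sample to sample:

```python
# Observed vs. expected information in a Cauchy location model (illustrative).
import numpy as np

rng = np.random.default_rng(1)
theta0, n = 0.0, 500
x = rng.standard_cauchy(n) + theta0

def neg_second_derivative(theta, x):
    # -(d^2/dtheta^2) log f(x; theta) for log f = -log(pi) - log(1 + (x - theta)^2)
    u = x - theta
    return 2.0 * (1.0 - u**2) / (1.0 + u**2) ** 2

# Observed information, evaluated here at theta0 for simplicity;
# in practice one plugs in theta_hat. It wobbles around the expected value 1/2.
observed = neg_second_derivative(theta0, x).mean()
print("observed information (per obs):", observed)
print("expected information (per obs):", 0.5)
```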

Academic Inquiries.

01

Why do we need the $\sqrt{n}$ scaling factor?

Without $\sqrt{n}$, the variance of $\hat{\theta}_n$ shrinks to zero as $n$ grows. The $\sqrt{n}$ term 'blows up' the distribution just enough to keep the variance constant, allowing us to see the stable Gaussian shape.
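
A quick way to watch this happen is to track both variances as $n$ grows; this sketch reuses the assumed Exponential-rate model from the first simulation:

```python
# var(theta_hat) decays like theta0^2 / n, while sqrt(n)*(theta_hat - theta0) stabilizes.
import numpy as np

rng = np.random.default_rng(2)
theta0, reps = 2.0, 2000
for n in (100, 400, 1600):
    theta_hat = 1.0 / rng.exponential(1.0 / theta0, size=(reps, n)).mean(axis=1)
    z = np.sqrt(n) * (theta_hat - theta0)
    print(f"n={n:5d}  var(theta_hat)={theta_hat.var():.5f}  var(z)={z.var():.3f}")
```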

02

What happens if the regularity conditions are violated?

If the support of the distribution depends on $\theta$ (like the Uniform$(0, \theta)$ distribution), the MLE may converge faster than $\sqrt{n}$ and the limiting distribution may not be Normal at all; for the Uniform, the error shrinks at rate $n$ and the limit is an Exponential law.
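
A sketch of that irregular Uniform$(0, \theta)$ case (the example mentioned above, with assumed values $\theta_0 = 1$ and $n = 5000$): the MLE is the sample maximum, and the error must be blown up by $n$, not $\sqrt{n}$, to stabilize:

```python
# Irregular-case sketch: Uniform(0, theta0), MLE = sample maximum.
# Here n*(theta0 - theta_hat) -> Exponential with mean theta0, not Gaussian.
import numpy as np

rng = np.random.default_rng(3)
theta0, n, reps = 1.0, 5000, 4000
theta_hat = rng.uniform(0.0, theta0, size=(reps, n)).max(axis=1)
err = n * (theta0 - theta_hat)

print("mean (should be ~ theta0):", err.mean())
skew = ((err - err.mean()) ** 3).mean() / err.std() ** 3
print("skewness (~2 for an Exponential, 0 for a Gaussian):", skew)
```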

03

Is the asymptotic variance always the smallest possible?

Yes, under these conditions, the MLE is 'Asymptotically Efficient,' meaning its variance reaches the Cramér-Rao Lower Bound as $n$ approaches infinity.
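
To illustrate, this sketch compares the assumed Exponential-rate MLE with a hypothetical median-based competitor $\tilde{\theta} = \ln 2 / \mathrm{median}(X)$: the MLE's scaled variance sits at the Cramér-Rao bound $\theta_0^2$, while the competitor's is roughly $\theta_0^2/(\ln 2)^2 \approx 2.08\,\theta_0^2$:

```python
# Efficiency sketch for the assumed Exponential(rate theta0) model: the MLE attains
# the Cramer-Rao bound theta0^2; the median-based estimator has a larger variance.
import numpy as np

rng = np.random.default_rng(4)
theta0, n, reps = 2.0, 2000, 4000
x = rng.exponential(1.0 / theta0, size=(reps, n))

mle = 1.0 / x.mean(axis=1)
med = np.log(2) / np.median(x, axis=1)

print("n * var(MLE)         :", n * mle.var())  # ~ theta0**2 = 4.0 (the CRLB)
print("n * var(median-based):", n * med.var())  # ~ theta0**2 / ln(2)^2 ~ 8.3
```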

Standardized References.

  • Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Duxbury.

