Derivation of the Variance-Covariance Matrix of the OLS Estimator: Var(β̂) = σ²(X'X)⁻¹

A rigorous derivation of the variance-covariance matrix of the OLS estimator, exploring how the configuration of the data determines statistical precision.

The Formal Theorem

Let the linear model be $Y = X\beta + \epsilon$, where $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$ is a matrix of full column rank $p$ (treated as fixed; otherwise every statement below holds conditionally on $X$), $\beta \in \mathbb{R}^p$, and $\epsilon \sim (0, \sigma^2 I_n)$, that is, $E[\epsilon] = 0$ and $\text{Var}(\epsilon) = \sigma^2 I_n$. The OLS estimator $\hat{\beta} = (X'X)^{-1}X'Y$ has a variance-covariance matrix given by:
$$\text{Var}(\hat{\beta}) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'] = \sigma^2 (X'X)^{-1}$$
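
The theorem follows in a few lines from the model and the two moment assumptions on $\epsilon$. The sketch below substitutes the model into the estimator, uses $E[\epsilon] = 0$ to show unbiasedness (which is what licenses writing the variance as $E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)']$), and uses the symmetry of $(X'X)^{-1}$ when transposing $(X'X)^{-1}X'\epsilon$.

```latex
\begin{aligned}
\hat{\beta} &= (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \epsilon) = \beta + (X'X)^{-1}X'\epsilon \\
E[\hat{\beta}] &= \beta + (X'X)^{-1}X'\,E[\epsilon] = \beta \\
\text{Var}(\hat{\beta}) &= E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'] \\
&= E[(X'X)^{-1}X'\epsilon\epsilon'X(X'X)^{-1}] \\
&= (X'X)^{-1}X'\,E[\epsilon\epsilon']\,X(X'X)^{-1} \\
&= (X'X)^{-1}X'(\sigma^2 I_n)X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}
\end{aligned}
```

The step $E[\epsilon\epsilon'] = \sigma^2 I_n$ is exactly where constant variance and uncorrelated errors enter; no distributional assumption beyond these two moments is used.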

Analytical Intuition.

Imagine the OLS estimator $\hat{\beta}$ as a camera lens attempting to focus on the true parameter $\beta$ through the 'fog' of random noise $\epsilon$. The matrix $X'X$ acts as our 'information matrix': it encodes how much data we have (its entries grow with $n$) and how well that data is spread across the features. If the features in $X$ are highly correlated, $X'X$ becomes nearly singular: its smallest eigenvalue approaches zero, so the corresponding eigenvalue of $(X'X)^{-1}$, and with it the variance of $\hat{\beta}$ along that direction, blows up. This is like trying to sharpen a blurred image with a lens that has too little surface area. The term $\sigma^2$ represents the intrinsic 'fuzziness' of the underlying process. The variance of our estimate is therefore the product of two factors: the noise level $\sigma^2$, which we cannot control, and the data configuration $(X'X)^{-1}$, which depends on the design. A well-conditioned $X$ (many observations, features with ample spread and little mutual correlation) keeps $(X'X)^{-1}$ small, so $\hat{\beta}$ stays tightly clustered around $\beta$, providing the stability required for rigorous statistical inference.
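
To make the collinearity point concrete, here is a minimal sketch assuming a hypothetical two-feature design with no intercept and a chosen population correlation `rho` between the columns. The helpers `design` and `coef_variances` are illustrative names, not from any library. With $\sigma^2 = 1$ and $n$ held fixed, the diagonal of $\sigma^2 (X'X)^{-1}$ grows sharply as `rho` approaches 1, alongside the condition number of $X'X$.

```python
import numpy as np

def coef_variances(X: np.ndarray, sigma2: float = 1.0) -> np.ndarray:
    """Diagonal of sigma^2 (X'X)^{-1}: the sampling variance of each OLS coefficient."""
    XtX = X.T @ X
    return sigma2 * np.diag(np.linalg.inv(XtX))

def design(n: int, rho: float, seed: int = 0) -> np.ndarray:
    """Hypothetical design: two unit-variance columns with population correlation rho."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    x1 = z1
    x2 = rho * z1 + np.sqrt(1.0 - rho**2) * z2  # Var(x2) = 1, Cov(x1, x2) = rho
    return np.column_stack([x1, x2])

n = 200
for rho in (0.0, 0.9, 0.99):
    X = design(n, rho)
    print(f"rho={rho:4.2f}  cond(X'X)={np.linalg.cond(X.T @ X):9.1f}  "
          f"Var(beta_hat)={coef_variances(X)}")
```

Because the same seed is reused, only `rho` changes between rows, so any growth in the variances is attributable to the correlation alone.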

Institutional Warning.

Students frequently conflate $\hat{\sigma}^2$, the estimate of the error variance computed from the residuals, with the variance of the parameter estimates $\text{Var}(\hat{\beta})$. Remember: $\hat{\sigma}^2$ measures noise in the data, while $\text{Var}(\hat{\beta})$ measures uncertainty in our estimated coefficients.
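
The distinction shows up directly in code. In this minimal sketch on simulated data (the design, true $\beta$, and noise level are hypothetical choices), $\hat{\sigma}^2 = \hat{e}'\hat{e}/(n-p)$ is computed from the residuals $\hat{e} = Y - X\hat{\beta}$ and stays near $\sigma^2$ however large $n$ gets, whereas the plug-in estimate $\hat{\sigma}^2 (X'X)^{-1}$ of $\text{Var}(\hat{\beta})$ shrinks as data accumulate.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, sigma = 500, 3, 2.0

# Hypothetical design: an intercept column plus two standard-normal features.
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, -0.5, 2.0])
y = X @ beta + sigma * rng.standard_normal(n)

# OLS fit; lstsq solves the least-squares problem without forming (X'X)^{-1}.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# sigma2_hat: unbiased estimate of the error variance (noise in the data).
sigma2_hat = resid @ resid / (n - p)

# Estimated Var(beta_hat) = sigma2_hat * (X'X)^{-1} (uncertainty in the coefficients).
var_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)
std_errors = np.sqrt(np.diag(var_beta_hat))

print("sigma2_hat      :", round(float(sigma2_hat), 3))  # close to sigma^2 = 4
print("std errors of b :", np.round(std_errors, 4))      # much smaller than sigma
```

Doubling `n` should leave `sigma2_hat` roughly unchanged while shrinking the standard errors by about a factor of $\sqrt{2}$.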

Academic Inquiries.

What happens if X is not full rank?

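If $X$ is not full rank, $X'X$ is singular and cannot be inverted. This signifies perfect multicollinearity: at least one column of $X$ is an exact linear combination of the others, so $\beta$ is not uniquely identifiable (different values of $\beta$ produce the same $X\beta$).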
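
A minimal sketch of the rank-deficient case, assuming a hypothetical design whose third column is exactly the sum of the first two. The rank check exposes the deficiency, and two different coefficient vectors, differing by a null-space direction of $X$, give identical fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)

# Hypothetical design with perfect multicollinearity: x3 = x1 + x2 exactly.
X = np.column_stack([x1, x2, x1 + x2])
p = X.shape[1]

rank = np.linalg.matrix_rank(X)
print("rank(X) =", rank, "but p =", p)  # rank 2 < 3: not full column rank

# (-1, -1, 1) lies in the null space of X, so adding it to beta changes nothing:
# the data cannot distinguish beta_a from beta_b (beta is not identifiable).
beta_a = np.array([1.0, 2.0, 0.0])
beta_b = beta_a + np.array([-1.0, -1.0, 1.0])  # = (0, 1, 1)
print(np.allclose(X @ beta_a, X @ beta_b))     # True
```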

Does this derivation assume normality of errors?

No. The variance-covariance derivation requires only that the errors have mean zero, constant variance, and are mutually uncorrelated (the second-moment Gauss-Markov assumptions); normality is needed only for exact finite-sample inference such as t and F tests.
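
To see that normality plays no role, this Monte Carlo sketch uses a hypothetical setup: a fixed design and uniform (hence non-normal) errors scaled to mean 0 and variance $\sigma^2$. It re-estimates $\hat{\beta}$ over many simulated samples and compares the empirical covariance of the estimates with $\sigma^2 (X'X)^{-1}$; the two should agree up to simulation noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma2 = 40, 20000, 1.0

# Fixed (nonstochastic) hypothetical design: intercept plus one uniform feature.
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
beta = np.array([2.0, 0.5])
theory = sigma2 * np.linalg.inv(X.T @ X)

# Non-normal errors: uniform on [-a, a] has variance a^2 / 3, so a = sqrt(3 * sigma2)
# gives mean 0, variance sigma2, and independent draws across observations.
a = np.sqrt(3.0 * sigma2)
eps = rng.uniform(-a, a, size=(reps, n))
Y = X @ beta + eps  # each row is one simulated sample of size n

# OLS for every replication at once: solve (X'X) b = X'y for each column of Y.T.
B = np.linalg.solve(X.T @ X, X.T @ Y.T).T  # shape (reps, 2)
empirical = np.cov(B, rowvar=False)

print("theory:\n", theory)
print("empirical:\n", empirical)
```

Swapping the uniform draws for any other zero-mean, constant-variance, uncorrelated errors should leave the agreement intact; heteroskedastic or correlated errors would break it, because $E[\epsilon\epsilon']$ would no longer equal $\sigma^2 I_n$.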

Why is $X'X/\sigma^2$ called the information matrix?

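In likelihood theory, when the errors are normally distributed and $\sigma^2$ is known, $X'X/\sigma^2$ is the Fisher information matrix for $\beta$. Its inverse, $\sigma^2 (X'X)^{-1}$, is the Cramér-Rao lower bound: the minimum possible variance for an unbiased estimator of $\beta$. Because $\text{Var}(\hat{\beta})$ equals this bound, OLS is the minimum-variance unbiased estimator in the normal model.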
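
Assuming $\epsilon \sim N(0, \sigma^2 I_n)$ with $\sigma^2$ known, the information matrix follows from the log-likelihood in a few steps. Setting the score to zero recovers the normal equations $X'X\beta = X'Y$, and hence the OLS estimator, while the negative expected Hessian gives $X'X/\sigma^2$.

```latex
\begin{aligned}
\ell(\beta) &= -\tfrac{n}{2}\log(2\pi\sigma^2) - \tfrac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta) \\
\frac{\partial \ell}{\partial \beta} &= \frac{1}{\sigma^2} X'(Y - X\beta) \\
\frac{\partial^2 \ell}{\partial \beta\, \partial \beta'} &= -\frac{1}{\sigma^2} X'X \\
\mathcal{I}(\beta) &= -E\!\left[\frac{\partial^2 \ell}{\partial \beta\, \partial \beta'}\right] = \frac{X'X}{\sigma^2},
\qquad \mathcal{I}(\beta)^{-1} = \sigma^2 (X'X)^{-1} = \text{Var}(\hat{\beta})
\end{aligned}
```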

Standardized References.

  • Greene, W. H., Econometric Analysis.

Institutional Citation

Reference this proof in your academic research or publications.

NICEFA Visual Mathematics. (2026). Derivation of the Variance-Covariance Matrix of the OLS Estimator: Var(β̂) = σ²(X'X)⁻¹: Visual Proof & Intuition. Retrieved from https://www.nicefa.org/library/general-linear-models-/derivation-of-the-variance-covariance-matrix-of-the-ols-estimator--var----------x-x---