Derivation and Properties of OLS Residuals: e = (I-H)Y, including E(e) = 0 and Var(e) = σ²(I-H)

Explore the derivation and properties of OLS residuals: \( e = (I-H)Y \), \( E(e) = 0 \), and \( Var(e) = \sigma^2(I-H) \). Understand their geometric meaning and statistical implications for model diagnostics.

The Formal Theorem

Let \( Y \) be an \( n \times 1 \) vector of observed responses, \( X \) an \( n \times p \) design matrix with full column rank, and \( \hat{\beta} = (X'X)^{-1}X'Y \) the Ordinary Least Squares (OLS) estimator of the true coefficient vector \( \beta \). The OLS residual vector \( e \) is defined as \( e = Y - X\hat{\beta} \). Under the Gauss-Markov assumptions, specifically \( E(\varepsilon) = 0 \) and \( Var(\varepsilon) = \sigma^2 I \) where \( \varepsilon = Y - X\beta \), the OLS residuals \( e \) can be expressed in terms of the Hat matrix \( H = X(X'X)^{-1}X' \), and their expected value and variance are given by:
\[ \begin{aligned} e &= (I-H)Y \\ E(e) &= 0 \\ Var(e) &= \sigma^2(I-H) \end{aligned} \]
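
The first identity can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up toy design (an intercept plus one predictor, with hypothetical coefficients `beta` and noise scale 2.0); none of these values come from the theorem itself.

```python
import numpy as np

# Hypothetical toy data: n = 6 observations, p = 2 columns (intercept + one predictor).
rng = np.random.default_rng(0)
n, p = 6, 2
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
beta = np.array([1.0, 0.5])                    # assumed "true" coefficients
Y = X @ beta + rng.normal(scale=2.0, size=n)   # assumed error scale sigma = 2

# Hat matrix H = X (X'X)^{-1} X', built with a linear solve instead of an explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)

# Residuals computed two ways: from the definition and from the hat matrix.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e_direct = Y - X @ beta_hat
e_hat_matrix = (np.eye(n) - H) @ Y

print(np.allclose(e_direct, e_hat_matrix))
```

Both routes give the same residual vector up to floating-point error, because \( X\hat{\beta} = HY \).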

Analytical Intuition.

Imagine a master architect designing a grand building. \( Y \) is the ambitious client's vision, a perfect blueprint. \( X\beta \) represents the architect's ideal, structurally sound design, accounting for all known variables. But reality, like \( \varepsilon \), always introduces small imperfections: a slight breeze, a material's subtle warp. The architect's actual construction, \( X\hat{\beta} \), is their best attempt, an OLS approximation.
Now, the OLS residuals \( e \) are not the 'true' imperfections \( \varepsilon \). Instead, they are the 'observable' discrepancies between the client's vision \( Y \) and the architect's executed design \( X\hat{\beta} \). The Hat matrix \( H \) acts as a projector, mapping the client's full vision \( Y \) onto the "design space", the column space of \( X \). So \( HY = X\hat{\beta} \) is the architect's projected output. The residuals \( e = (I-H)Y \) are then the elements of the client's vision that the architect's design *failed to capture*: the unexplained variation, strictly orthogonal to the constructed design. Their expected value is zero because, on average, the OLS architect is unbiased. Their variance \( \sigma^2(I-H) \) is smaller than the original error variance \( \sigma^2 I \): each residual has variance \( \sigma^2(1-h_{ii}) \le \sigma^2 \), because OLS residuals, being constrained by the fitted model, are less variable than the true errors.

Institutional Warning.

Students often confuse OLS residuals \( e \) with true errors \( \varepsilon \). Although \( e \) serves as an estimate of \( \varepsilon \), the residuals are correlated and generally heteroscedastic, unlike the errors, which are assumed uncorrelated and homoscedastic. The identity \( e = (I-H)\varepsilon \) makes this distinction explicit.

Academic Inquiries.

01

Why is \( E(e) = 0 \) even though individual \( e_i \) are not necessarily zero?

\( E(e) = 0 \) is a statement about expectations, not about realized values. Each component satisfies \( E(e_i) = 0 \), meaning that across repeated samples the fitted model does not systematically over- or underestimate the response. In any given sample the individual residuals \( e_i \) are generally nonzero, although they sum to zero whenever the model includes an intercept. The condition \( E(e) = 0 \) is a direct result of the Gauss-Markov assumption \( E(\varepsilon) = 0 \) and the properties of the OLS estimator.
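
The step from \( E(\varepsilon) = 0 \) to \( E(e) = 0 \) can be written out in two lines. This sketch assumes, as is usual in this setting, that \( X \) is treated as fixed (non-random) and that the model \( Y = X\beta + \varepsilon \) is correctly specified:

\[ \begin{aligned} E(e) &= E[(I-H)Y] = (I-H)E(Y) = (I-H)X\beta, \\ (I-H)X &= X - X(X'X)^{-1}X'X = X - X = 0 \quad\Rightarrow\quad E(e) = 0. \end{aligned} \]

The key fact is \( HX = X \): projecting the columns of \( X \) onto their own span leaves them unchanged.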

02

If \( Var(\varepsilon) = \sigma^2 I \) (homoscedasticity), why isn't \( Var(e) = \sigma^2 I \)?

This is a crucial distinction. \( Var(e) = \sigma^2(I-H) \) because the residuals \( e \) are *not* the true errors \( \varepsilon \). Instead, \( e = (I-H)\varepsilon \). Applying the variance operator, \( Var(e) = (I-H)Var(\varepsilon)(I-H)' = (I-H)(\sigma^2 I)(I-H) = \sigma^2(I-H) \). Since \( (I-H) \) is not the identity matrix, \( Var(e) \) is not \( \sigma^2 I \). This implies that \( Var(e_i) = \sigma^2(1-h_{ii}) \) (heteroscedasticity of the residuals) and \( Cov(e_i, e_j) = -\sigma^2 h_{ij} \) for \( i \neq j \) (correlation among the residuals), even if the true errors are homoscedastic and uncorrelated.
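
A small simulation makes the covariance structure visible. This is an illustrative sketch only, assuming NumPy, normally distributed errors, and a hypothetical toy design with \( \sigma = 2 \); the theorem itself does not require normality.

```python
import numpy as np

rng = np.random.default_rng(42)
n, sigma = 6, 2.0                                   # assumed sample size and error scale
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
beta = np.array([1.0, 0.5])                         # assumed "true" coefficients

H = X @ np.linalg.solve(X.T @ X, X.T)               # hat matrix
M = np.eye(n) - H                                   # residual-maker matrix I - H

# Draw many independent samples of Y (one per row) and compute residuals for each.
reps = 200_000
eps = rng.normal(scale=sigma, size=(reps, n))
Y = X @ beta + eps
E = Y @ M                                           # M is symmetric, so row i equals (I - H) y_i

emp_cov = np.cov(E, rowvar=False)                   # empirical n x n covariance of residuals
theory = sigma**2 * M                               # sigma^2 (I - H)

print("max |empirical - theoretical|:", np.abs(emp_cov - theory).max())
print("theoretical residual variances:", np.diag(theory))
```

The diagonal of `theory` is not constant: observations with high leverage \( h_{ii} \) (here, the extreme predictor values) have smaller residual variance, and the off-diagonal entries are nonzero.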

03

What is the significance of the Hat matrix \( H \) being idempotent and symmetric in the context of residuals?

The idempotency \( H^2 = H \) and symmetry \( H' = H \) are fundamental. They imply that \( (I-H) \) is also idempotent and symmetric. These properties are critical for simplifying \( Var(e) \): \( Var(e) = \sigma^2(I-H)(I-H)' = \sigma^2(I-H)(I-H) = \sigma^2(I-H) \). Geometrically, they signify that \( H \) and \( (I-H) \) are orthogonal projection matrices onto complementary subspaces. This algebraic elegance underpins the statistical properties of residuals, particularly their covariance structure and degrees of freedom.
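
These algebraic properties are easy to verify on a concrete matrix. The sketch below assumes NumPy and a hypothetical \( 6 \times 2 \) design; the trace checks illustrate the degrees-of-freedom remark, since the trace of a projection matrix equals the dimension of the subspace it projects onto.

```python
import numpy as np

n, p = 6, 2                                          # assumed dimensions of a toy design
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
H = X @ np.linalg.solve(X.T @ X, X.T)                # projection onto the column space of X
M = np.eye(n) - H                                    # projection onto its orthogonal complement

checks = {
    "H symmetric":      np.allclose(H, H.T),
    "H idempotent":     np.allclose(H @ H, H),
    "M symmetric":      np.allclose(M, M.T),
    "M idempotent":     np.allclose(M @ M, M),
    "H M = 0":          np.allclose(H @ M, 0.0),     # complementary subspaces
    "trace(H) = p":     np.isclose(np.trace(H), p),
    "trace(M) = n - p": np.isclose(np.trace(M), n - p),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

The identity \( H(I-H) = 0 \) is what makes fitted values \( HY \) and residuals \( (I-H)Y \) orthogonal for any \( Y \).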

04

Can \( e \) ever be equal to \( \varepsilon \)?

In the standard OLS setup, \( H \) is a projection matrix, and substituting \( Y = X\beta + \varepsilon \) gives \( e = (I-H)Y = (I-H)(X\beta + \varepsilon) = (I-H)X\beta + (I-H)\varepsilon = (X\beta - HX\beta) + (I-H)\varepsilon = 0 + (I-H)\varepsilon \), because \( HX = X \). So \( e = (I-H)\varepsilon \), and \( e = \varepsilon \) holds exactly when \( H\varepsilon = 0 \). For this to hold for every possible \( \varepsilon \), \( H \) would have to be the zero matrix, which is impossible when \( X \) has at least one nonzero column; it would mean \( X \) is empty or effectively zero, nullifying the regression. The opposite extreme does not help either: if \( X \) is the \( n \times n \) identity matrix (one predictor per observation), then \( H = I \) and every residual is exactly zero regardless of \( \varepsilon \). For a particular sample, \( e = \varepsilon \) would require \( \varepsilon \) to be exactly orthogonal to every column of \( X \), which essentially never happens with continuously distributed errors. Thus \( e \) and \( \varepsilon \) are almost never identical; \( e \) is always the projection of \( \varepsilon \) onto the orthogonal complement of the column space of \( X \).
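
Because a simulation lets us see the normally unobservable \( \varepsilon \), the relationship can be checked directly. The following is a sketch under assumed values (NumPy, a hypothetical toy design, standard normal errors), not a general proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
beta = np.array([1.0, 0.5])             # assumed "true" coefficients
eps = rng.normal(size=n)                # true errors, known only because we simulate
Y = X @ beta + eps

H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H
e = M @ Y

print("e == (I - H) eps:", np.allclose(e, M @ eps))    # residuals are projected errors
print("e == eps:        ", np.allclose(e, eps))        # but not the errors themselves
print("||e|| <= ||eps||:", np.linalg.norm(e) <= np.linalg.norm(eps))

# Saturated extreme: X = I gives H = I, so every residual is exactly zero.
X_sat = np.eye(n)
H_sat = X_sat @ np.linalg.solve(X_sat.T @ X_sat, X_sat.T)
e_sat = (np.eye(n) - H_sat) @ Y
print("saturated residuals all zero:", np.allclose(e_sat, 0.0))
```

The norm comparison reflects the projection: removing the component \( H\varepsilon \) can only shorten the vector.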

Standardized References.

  • Seber, G. A. F., & Lee, A. J. Linear Regression Analysis.

Institutional Citation

Reference this proof in your academic research or publications.

NICEFA Visual Mathematics. (2026). Derivation and Properties of OLS Residuals: e = (I-H)Y, including E(e) = 0 and Var(e) = σ²(I-H): Visual Proof & Intuition. Retrieved from https://www.nicefa.org/library/general-linear-models-/derivation-and-properties-of-ols-residuals--e----i-h-y--including-e-e----0-and-var-e-------i-h-
