Q: Why is $ E(e) = 0 $ even though individual $ e_i $ are not necessarily zero?

$ E(e) = 0 $ is a statement about the expected value of the residual *vector*, taken over repeated samples with $ X $ held fixed. Since $ e = (I-H)\varepsilon $, the Gauss-Markov assumption $ E(\varepsilon)=0 $ gives $ E(e) = (I-H)E(\varepsilon) = 0 $. Each residual therefore has mean zero, $ E(e_i) = 0 $, and the fitted values do not systematically over- or underestimate the response. In any single sample, however, the realized residuals $ e_i $ are generally nonzero; what the normal equations guarantee is only that they sum to zero when the model includes an intercept.
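The distinction between a single sample and the average over repeated samples can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up simple regression; the sample size, coefficients, error scale, and the helper name `residuals` are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20                                   # hypothetical sample size
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])     # design matrix with an intercept column
beta = np.array([1.0, 2.0])              # hypothetical true coefficients

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix H = X (X'X)^{-1} X'
M = np.eye(n) - H                        # residual maker I - H


def residuals(sigma=1.0):
    """Draw one sample Y = X beta + eps and return e = (I - H) Y."""
    eps = rng.normal(0.0, sigma, size=n)
    return M @ (X @ beta + eps)


# One sample: individual residuals are nonzero, but they sum to zero
# because the model contains an intercept.
e_one = residuals()
sum_one = e_one.sum()

# Many samples: the average residual vector approaches the zero vector.
mean_resid = np.mean([residuals() for _ in range(5000)], axis=0)
max_abs_mean = np.abs(mean_resid).max()

print("largest |e_i| in one sample:", np.abs(e_one).max())
print("sum of residuals in one sample:", sum_one)
print("largest |mean residual| over 5000 samples:", max_abs_mean)
```

The first printed value is clearly nonzero, the second is zero up to floating-point error, and the third shrinks toward zero as the number of simulated samples grows.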

Q: If $ Var(\varepsilon) = \sigma^2 I $ (homoscedasticity), why isn't $ Var(e) = \sigma^2 I $?

This is a crucial distinction. $ Var(e) = \sigma^2(I-H) $ because the residuals $ e $ are *not* the true errors $ \varepsilon $. Instead, $ e = (I-H)\varepsilon $. Applying the variance operator and using the symmetry and idempotency of $ (I-H) $, $ Var(e) = (I-H)Var(\varepsilon)(I-H)' = (I-H)(\sigma^2 I)(I-H) = \sigma^2(I-H) $. Since $ (I-H) $ is not the identity matrix, $ Var(e) $ is not $ \sigma^2 I $. Element by element, $ Var(e_i) = \sigma^2(1-h_{ii}) $, so the residuals have unequal variances, and $ Cov(e_i, e_j) = -\sigma^2 h_{ij} $ for $ i \neq j $, so the residuals are generally correlated, even if the true errors are homoscedastic and uncorrelated.
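The covariance formula can be compared with a Monte Carlo estimate. This is a rough sketch under assumed values (a six-point design, error standard deviation 1.5, 50,000 simulated error vectors); none of these numbers come from the original text. Because $ (I-H)X = 0 $, the term $ X\beta $ drops out of the residuals, so it is enough to simulate $ \varepsilon $ alone.

```python
import numpy as np

rng = np.random.default_rng(1)

n, sigma = 6, 1.5                        # hypothetical sample size and error s.d.
x = np.linspace(0.0, 5.0, n)
X = np.column_stack([np.ones(n), x])     # intercept plus one predictor

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
M = np.eye(n) - H                        # I - H

theoretical = sigma**2 * M               # Var(e) = sigma^2 (I - H)

# Simulate homoscedastic, independent errors; each row of `eps` is one draw.
n_draws = 50_000
eps = rng.normal(0.0, sigma, size=(n_draws, n))
E = eps @ M                              # row-wise (I - H) eps, valid since M is symmetric

empirical = np.cov(E, rowvar=False)      # sample covariance of the residual vector
max_gap = np.abs(empirical - theoretical).max()

resid_var = np.diag(theoretical)         # sigma^2 (1 - h_ii), differs across i
print("residual variances:", np.round(resid_var, 3))
print("error variance sigma^2:", sigma**2)
print("max |empirical - theoretical|:", max_gap)
```

In this sketch every residual variance is below $ \sigma^2 $, and the variances are smallest at the ends of the design, where the leverages $ h_{ii} $ are largest.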

Q: What is the significance of the Hat matrix $ H $ being idempotent and symmetric in the context of residuals?

The idempotency $ H^2 = H $ and symmetry $ H' = H $ are fundamental. They imply that $ (I-H) $ is also idempotent and symmetric. These properties are exactly what simplifies $ Var(e) $: $ Var(e) = \sigma^2(I-H)(I-H)' = \sigma^2(I-H)(I-H) = \sigma^2(I-H) $. Geometrically, they mean that $ H $ and $ (I-H) $ are orthogonal projection matrices onto complementary subspaces: $ H $ projects onto the column space of $ X $, and $ (I-H) $ projects onto its orthogonal complement. This structure underpins the statistical properties of residuals, in particular their covariance structure and their degrees of freedom, since $ tr(I-H) = n - p $ when $ X $ is $ n \times p $ with full column rank.
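These algebraic properties are easy to confirm numerically. The sketch below assumes NumPy and a randomly generated full-rank design with $ n = 10 $ rows and $ p = 3 $ columns; the dimensions and the dictionary name `checks` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 10, 3                             # hypothetical dimensions
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
M = np.eye(n) - H                        # I - H

checks = {
    "H symmetric": np.allclose(H, H.T),
    "H idempotent": np.allclose(H @ H, H),
    "I-H symmetric": np.allclose(M, M.T),
    "I-H idempotent": np.allclose(M @ M, M),
    "H(I-H) = 0": np.allclose(H @ M, 0.0),   # complementary projections
    "HX = X": np.allclose(H @ X, X),         # H fixes the column space of X
}

trace_H = np.trace(H)                    # equals p, the model degrees of freedom
trace_M = np.trace(M)                    # equals n - p, the residual degrees of freedom

for name, ok in checks.items():
    print(f"{name}: {ok}")
print("tr(H) =", round(trace_H, 6), " tr(I-H) =", round(trace_M, 6))
```

The two traces recover the familiar split of $ n $ degrees of freedom into $ p $ for the fit and $ n - p $ for the residuals.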

Q: Can $ e $ ever be equal to $ \varepsilon $?

In the standard OLS setup, essentially never. $ H $ is a projection matrix with $ HX = X $, so $ e = (I-H)Y = (I-H)(X\beta + \varepsilon) = (I-H)X\beta + (I-H)\varepsilon = (X\beta - HX\beta) + (I-H)\varepsilon = 0 + (I-H)\varepsilon $. So, $ e = (I-H)\varepsilon $. It follows that $ e = \varepsilon $ exactly when $ H\varepsilon = 0 $, that is, when the error vector happens to be orthogonal to every column of $ X $. For the equality to hold for every possible $ \varepsilon $, we would need $ H = 0 $, which is impossible once $ X $ has at least one nonzero column. At the opposite extreme, if $ X $ is the $ n \times n $ identity matrix (one parameter per observation), then $ H = I $ and $ e = 0 $: the fit is perfect and the residuals carry no information about $ \varepsilon $. Thus, $ e $ and $ \varepsilon $ are almost never identical; $ e $ is the projection of $ \varepsilon $ onto the orthogonal complement of the column space of $ X $.
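The relationship between $ e $ and $ \varepsilon $ can be illustrated with simulated data, where the true errors are known. The following is a minimal sketch assuming NumPy, an arbitrary two-column design, and made-up coefficients. It also constructs the special case in which the errors are orthogonal to the columns of $ X $, so that $ e = \varepsilon $.

```python
import numpy as np

rng = np.random.default_rng(3)

n = 12                                   # hypothetical sample size
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, -1.0])             # hypothetical true coefficients

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
M = np.eye(n) - H                        # I - H

# Generic case: residuals equal (I - H) eps, not eps itself.
eps = rng.normal(size=n)
y = X @ beta + eps
e = M @ y
same_as_projection = np.allclose(e, M @ eps)
equals_errors = np.allclose(e, eps)

# Special case: errors already orthogonal to the columns of X (H eps = 0).
eps_perp = M @ eps                       # strip the component lying in col(X)
e_perp = M @ (X @ beta + eps_perp)
special_case = np.allclose(e_perp, eps_perp)

print("e == (I-H) eps:", same_as_projection)
print("e == eps (generic errors):", equals_errors)
print("e == eps when H eps = 0:", special_case)
```

With errors drawn from a continuous distribution, the special case occurs with probability zero, which is why the residuals and the true errors differ in practice.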

Derivation and Properties of OLS Residuals: e = (I-H)Y, including E(e) = 0 and Var(e) = σ²(I-H)
