Q: What happens if the design matrix $ \mathbf{X} $ does not have full column rank, making $ (\mathbf{X}^T\mathbf{X})^{-1} $ undefined?

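If $ \mathbf{X} $ does not have full column rank, $ \mathbf{X}^T\mathbf{X} $ is singular and has no ordinary inverse. A generalized inverse (e.g., the Moore-Penrose pseudoinverse) can be used instead to define the Hat Matrix, $ \mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-}\mathbf{X}^T $. This $ \mathbf{H} $ is still the orthogonal projection onto the column space of $ \mathbf{X} $: it is symmetric, idempotent, and the same for every choice of generalized inverse, so the fitted values $ \hat{\mathbf{y}} = \mathbf{H}\mathbf{y} $ remain unique. What is lost is uniqueness of the coefficient vector $ \hat{\boldsymbol{\beta}} $, since infinitely many solutions of the normal equations give the same fit. In addition, $ \operatorname{tr}(\mathbf{H}) $ now equals $ \operatorname{rank}(\mathbf{X}) $ rather than the number of columns, so interpretations that rely on the parameter count need to be adjusted.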
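As a rough numerical check of the rank-deficient case, the sketch below builds a small design matrix whose third column is the sum of the first two and forms the Hat Matrix with NumPy's `np.linalg.pinv`. The matrix `X`, the response `y`, and all variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical design matrix: column 3 = column 1 + column 2,
# so X has 3 columns but only rank 2.
X = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 3.0],
    [1.0, 3.0, 4.0],
])
y = np.array([1.0, 2.0, 2.0, 4.0])  # hypothetical response

# Hat matrix via the Moore-Penrose pseudoinverse: H = X X^+.
H = X @ np.linalg.pinv(X)

# The same projection built from the two linearly independent columns.
X_full = X[:, :2]
H_full = X_full @ np.linalg.solve(X_full.T @ X_full, X_full.T)

rank = np.linalg.matrix_rank(X)
is_symmetric = np.allclose(H, H.T)
is_idempotent = np.allclose(H @ H, H)
same_projection = np.allclose(H, H_full)
trace_equals_rank = np.isclose(np.trace(H), rank)

# Coefficients are not unique: adding a null-space vector of X
# changes beta but leaves the fitted values unchanged.
beta_min = np.linalg.pinv(X) @ y
beta_alt = beta_min + np.array([1.0, 1.0, -1.0])  # X @ [1, 1, -1] = 0
same_fit = np.allclose(X @ beta_min, X @ beta_alt)

print(rank, is_symmetric, is_idempotent, same_projection,
      trace_equals_rank, same_fit)
```

Under these assumptions the projection is the same whichever route is taken, while `beta_min` and `beta_alt` differ yet produce identical fitted values.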

Q: How does the Hat Matrix relate to the residuals of the model?

The vector of residuals is $ \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} $. Since $ \hat{\mathbf{y}} = \mathbf{H}\mathbf{y} $, we can write $ \mathbf{e} = \mathbf{y} - \mathbf{H}\mathbf{y} = (\mathbf{I} - \mathbf{H})\mathbf{y} $. The matrix $ \mathbf{M} = \mathbf{I} - \mathbf{H} $ is also an idempotent and symmetric projection matrix, often called the "Residual Maker Matrix." It projects $ \mathbf{y} $ onto the orthogonal complement of the column space of $ \mathbf{X} $, which is the space where the residuals reside.
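The properties claimed for $ \mathbf{M} $ each follow in one line from the symmetry and idempotence of $ \mathbf{H} $, together with $ \mathbf{H}\mathbf{X} = \mathbf{X} $. A short derivation using only the symbols already defined:

$$
\mathbf{M}^T = \mathbf{I} - \mathbf{H}^T = \mathbf{I} - \mathbf{H} = \mathbf{M}, \qquad
\mathbf{M}^2 = \mathbf{I} - 2\mathbf{H} + \mathbf{H}^2 = \mathbf{I} - \mathbf{H} = \mathbf{M}, \qquad
\mathbf{M}\mathbf{X} = \mathbf{X} - \mathbf{H}\mathbf{X} = \mathbf{0}.
$$

Transposing the last identity gives $ \mathbf{X}^T\mathbf{e} = \mathbf{X}^T\mathbf{M}\mathbf{y} = \mathbf{0} $, which is the statement that the residuals are orthogonal to every column of $ \mathbf{X} $.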

Q: What are the typical bounds for the diagonal elements $ h_{ii} $ of the Hat Matrix, and what do they signify?

For standard OLS models, the diagonal elements $ h_{ii} $ always lie between 0 and 1, inclusive: $ 0 \le h_{ii} \le 1 $. This follows because $ \mathbf{H} $ is symmetric and idempotent, so $ h_{ii} = \sum_j h_{ij}^2 \ge h_{ii}^2 $. A value of $ h_{ii} = 0 $ means the $ i $-th observation has no influence on its own fitted value. With full column rank this happens only if the $ i $-th row of $ \mathbf{X} $ is $ \mathbf{0} $, in which case $ y_i $ does not affect the estimated coefficients at all. It cannot occur in a model with an intercept, where $ h_{ii} \ge 1/n $. A value of $ h_{ii} = 1 $ means the $ i $-th observation completely determines its own fitted value: the residual $ e_i $ is exactly zero and the regression surface is forced through $ (\mathbf{x}_i^T, y_i) $. This typically arises when the observation is the only one with a particular covariate pattern, for example the single observation in a category that has its own dummy variable. An extreme outlier in the $ \mathbf{X} $ space has $ h_{ii} $ close to 1 for the same reason: the model effectively spends a parameter on fitting that one point.
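Both boundary cases can be reproduced on a toy example. The sketch below assumes a hypothetical no-intercept design with one all-zero row and one observation that has an indicator column to itself; the matrix and variable names are illustrative only.

```python
import numpy as np

# Hypothetical design without an intercept.
# Row 3 is all zeros; row 5 is the only row where the indicator column is 1.
X = np.array([
    [1.0, 0.0],
    [2.0, 0.0],
    [0.0, 0.0],
    [3.0, 0.0],
    [4.0, 1.0],
])

# H = X (X^T X)^{-1} X^T, computed with a linear solve instead of an
# explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)
leverages = np.diag(H)

zero_row_leverage = leverages[2]   # expected to sit at the lower bound
singleton_leverage = leverages[4]  # expected to sit at the upper bound
total_leverage = leverages.sum()   # trace of H

print(np.round(leverages, 4), round(total_leverage, 4))
```

The leverages sum to the number of columns of $ \mathbf{X} $, since $ \operatorname{tr}(\mathbf{H}) $ equals the rank of a full-column-rank design.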

Q: Can $ h_{ii} $ values be used to identify influential points?

High $ h_{ii} $ values indicate observations with high "leverage," meaning they have unusual $ \mathbf{X} $-values compared to other observations and thus exert a disproportionate pull on the fitted regression line. However, high leverage alone does not mean an observation is influential. An influential point is one that significantly changes the estimated regression coefficients when removed. A high-leverage point becomes influential if its $ y_i $ value is also unusual relative to the trend established by other data points. Measures like Cook's Distance combine leverage and residual size to better identify influential points.
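To illustrate the difference between leverage and influence, the sketch below fits a simple regression to two hypothetical data sets that share the same $ x $ values, including one point far out in $ x $. It uses the common form of Cook's Distance, $ D_i = \frac{e_i^2}{p\,s^2}\cdot\frac{h_{ii}}{(1-h_{ii})^2} $, where $ p $ is the number of coefficients and $ s^2 $ the residual variance estimate. The data and the helper name `leverage_and_cooks` are assumptions made for this example.

```python
import numpy as np

# Hypothetical data: five points near a line plus one point far out in x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 15.0])
y_on_trend = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 15.0])   # last point follows the trend
y_off_trend = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.0])   # last point breaks the trend


def leverage_and_cooks(x, y):
    """Return (leverages, Cook's distances) for a simple regression with intercept."""
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    h = np.diag(H)
    e = y - H @ y                      # residuals e = (I - H) y
    s2 = e @ e / (n - p)               # residual variance estimate
    cooks = (e ** 2 / (p * s2)) * h / (1.0 - h) ** 2
    return h, cooks


h_on, cooks_on = leverage_and_cooks(x, y_on_trend)
h_off, cooks_off = leverage_and_cooks(x, y_off_trend)

print("leverage of last point:", round(h_on[-1], 3))
print("Cook's D, on trend:   ", round(cooks_on[-1], 3))
print("Cook's D, off trend:  ", round(cooks_off[-1], 3))
```

The leverage of the last point is identical in both fits because it depends only on $ \mathbf{X} $, whereas its Cook's Distance is far larger when its $ y $ value departs from the trend of the other points.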

Properties and Derivation of the Hat Matrix (H): Symmetry, Idempotence, and its Role in Leverage


