Decomposition of Total Sum of Squares: SST = SSR + SSE and its Implications for R²
Unpack the core decomposition of total sum of squares (SST=SSR+SSE) in linear regression, its geometric intuition, and the implications for R² for BSc Math/Stats students.
The Formal Theorem
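For reference, a standard statement of the decomposition is sketched here. The notation is an assumption, since the page does not fix its symbols: \( y_i \) are the observed responses, \( \hat{y}_i \) the OLS fitted values from a model that includes an intercept, and \( \bar{y} \) the sample mean of the \( y_i \).

\[
\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{SST}
= \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{SSR}
+ \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{SSE},
\qquad
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}.
\]

The two expressions for \( R^2 \) agree exactly when the decomposition holds.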
Analytical Intuition.
Institutional Warning.
Students often misunderstand why the cross-product term disappears, failing to connect it to the properties of OLS and to geometric orthogonality. They also sometimes misinterpret \( R^2 \) as a definitive measure of model validity or predictive power, rather than solely the proportion of variance explained by the model.
Academic Inquiries.
Why does the cross-product term vanish in the decomposition?
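This vanishing is a direct consequence of the Ordinary Least Squares (OLS) estimation procedure. When an intercept term is included in the model, OLS ensures that the sum of the residuals \( e_i = y_i - \hat{y}_i \) is zero, i.e., \( \sum_{i} e_i = 0 \). Additionally, the OLS residuals are orthogonal to the predicted values \( \hat{y}_i \), meaning \( \sum_{i} \hat{y}_i e_i = 0 \). Combining these, the cross-product term simplifies to \( 2\sum_{i} (\hat{y}_i - \bar{y})\, e_i = 2\sum_{i} \hat{y}_i e_i - 2\bar{y}\sum_{i} e_i = 0 \).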
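The two OLS properties above can be checked numerically. The following is a minimal sketch on synthetic data, assuming NumPy is available; the seed, sample size, coefficients, and variable names are illustrative choices, not part of the original text.

```python
import numpy as np

# Synthetic data for illustration only (seed and coefficients are arbitrary).
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

# Design matrix with an explicit intercept column, then the OLS fit.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat   # fitted values
e = y - y_hat          # residuals
y_bar = y.mean()

sum_resid = e.sum()                              # sum of the residuals
resid_dot_fitted = e @ y_hat                     # residuals against fitted values
cross_term = 2.0 * np.sum((y_hat - y_bar) * e)   # cross-product term

sst = np.sum((y - y_bar) ** 2)
ssr = np.sum((y_hat - y_bar) ** 2)
sse = np.sum(e ** 2)

print(f"sum of residuals      : {sum_resid:.2e}")
print(f"residuals . fitted    : {resid_dot_fitted:.2e}")
print(f"cross-product term    : {cross_term:.2e}")
print(f"SST - (SSR + SSE)     : {sst - (ssr + sse):.2e}")
```

All four printed quantities should be zero up to floating-point rounding, which is the numerical counterpart of the algebraic argument.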
Is \( R^2 \) always between 0 and 1?
For standard OLS regression models that include an intercept, \( R^2 \) is always between 0 and 1. This is because \( SST \), \( SSR \), and \( SSE \) are sums of squares and are thus non-negative. Moreover, \( SSR \) cannot exceed \( SST \) because it represents the explained portion of the total variation. However, if a model omits the intercept term, or if non-OLS estimation methods are used, the property that the cross-product term vanishes might not hold, and \( R^2 \) could theoretically be negative (or greater than 1, depending on the definition used), though this indicates a very poor model fit.
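The effect of dropping the intercept can be illustrated with a small experiment. This is a sketch under stated assumptions: \( R^2 \) is computed as \( 1 - SSE/SST \) with \( SST \) taken about the sample mean, and the helper `r_squared` and the synthetic data are hypothetical, introduced only for this example.

```python
import numpy as np

def r_squared(X, y):
    """R^2 computed as 1 - SSE/SST, with SST taken about the mean of y."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_hat
    sse = np.sum(e ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

# Synthetic data: y sits far from zero and does not depend on x.
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
x = x - x.mean()                 # centre x so a line through the origin fits badly
y = 10.0 + rng.normal(size=n)

r2_with_intercept = r_squared(np.column_stack([np.ones(n), x]), y)
r2_no_intercept = r_squared(x.reshape(-1, 1), y)

print(f"R^2 with intercept   : {r2_with_intercept:.4f}")
print(f"R^2 without intercept: {r2_no_intercept:.4f}")
```

With the intercept the value stays in the unit interval. Without it, the fitted line is forced through the origin, \( SSE \) exceeds \( SST \), and this definition of \( R^2 \) goes negative.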
What is the relationship between \( R^2 \) and the Pearson correlation coefficient \( r \)?
For a simple linear regression model (with only one independent variable), the coefficient of determination \( R^2 \) is equal to the square of the Pearson product-moment correlation coefficient \( r_{xy} \) between the independent variable \( x \) and the dependent variable \( y \). That is, \( R^2 = r_{xy}^2 \). In multiple linear regression, \( R^2 \) is defined as the square of the multiple correlation coefficient, which is the Pearson correlation between the observed values \( y_i \) and the predicted values \( \hat{y}_i \), i.e., \( R^2 = r_{y\hat{y}}^2 \).
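Both identities can be confirmed numerically. The sketch below assumes NumPy, an OLS fit with an intercept, and synthetic data; the helper `fit_r2` and all coefficients are hypothetical and serve only the illustration.

```python
import numpy as np

def fit_r2(X, y):
    """Fit OLS by least squares; return R^2 = 1 - SSE/SST and the fitted values."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, y_hat

rng = np.random.default_rng(2)
n = 80

# Simple regression: compare R^2 with the squared correlation of x and y.
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)
r2_simple, _ = fit_r2(np.column_stack([np.ones(n), x]), y)
r_xy = np.corrcoef(x, y)[0, 1]

# Multiple regression: compare R^2 with the squared correlation of y and y_hat.
Z = rng.normal(size=(n, 3))
y_m = 0.5 + Z @ np.array([1.0, -2.0, 0.3]) + rng.normal(size=n)
r2_multi, y_hat_m = fit_r2(np.column_stack([np.ones(n), Z]), y_m)
r_y_yhat = np.corrcoef(y_m, y_hat_m)[0, 1]

print(f"simple  : R^2 = {r2_simple:.6f}, r_xy^2     = {r_xy ** 2:.6f}")
print(f"multiple: R^2 = {r2_multi:.6f}, r_y,yhat^2 = {r_y_yhat ** 2:.6f}")
```

In each line the two printed numbers should agree to rounding error, provided the model includes an intercept.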
Does a high \( R^2 \) imply that a model is a good predictor or that the independent variables cause the dependent variable?
No, not necessarily. A high \( R^2 \) primarily indicates that a large proportion of the variance in the dependent variable is explained by the independent variables within the sample data, suggesting a good *fit*. However, it does not imply causation; correlation is not causation. A high \( R^2 \) also doesn't guarantee predictive accuracy on new data (the model could be overfit) or that the model's underlying assumptions are met. Other diagnostic checks, such as residual analysis, out-of-sample validation, and theoretical justification, are essential for assessing a model's overall quality and reliability.
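The gap between in-sample fit and predictive accuracy can be made concrete with a deliberately overfit model. This is a sketch only, on synthetic data: the training design uses as many columns as observations, so OLS interpolates the training responses, and the out-of-sample \( R^2 \) is taken as \( 1 - SSE/SST \) on the test set (one common convention, assumed here). The helpers `r_squared` and `make_data` are hypothetical.

```python
import numpy as np

def r_squared(y, y_pred):
    """1 - SSE/SST, with SST taken about the mean of the supplied y."""
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(3)
n_train, n_test, n_noise = 20, 200, 18

def make_data(n_obs):
    """One relevant predictor plus n_noise irrelevant ones (synthetic)."""
    x1 = rng.normal(size=n_obs)
    noise_cols = rng.normal(size=(n_obs, n_noise))
    X = np.column_stack([np.ones(n_obs), x1, noise_cols])
    y = 1.0 + 2.0 * x1 + rng.normal(size=n_obs)
    return X, y

X_train, y_train = make_data(n_train)   # 20 rows, 20 columns: saturated model
X_test, y_test = make_data(n_test)

beta_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

r2_train = r_squared(y_train, X_train @ beta_hat)
r2_test = r_squared(y_test, X_test @ beta_hat)

print(f"in-sample R^2     : {r2_train:.4f}")
print(f"out-of-sample R^2 : {r2_test:.4f}")
```

The in-sample value is essentially 1 because the model has as many parameters as training observations, while the out-of-sample value is lower, which is why a high \( R^2 \) on the fitting data is not evidence of predictive accuracy.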
Standardized References.
- Definitive Institutional Source: Montgomery, Douglas C., Peck, Elizabeth A., and Vining, G. Geoffrey. Introduction to Linear Regression Analysis.
Institutional Citation
Reference this proof in your academic research or publications.
NICEFA Visual Mathematics. (2026). Decomposition of Total Sum of Squares: SST = SSR + SSE and its Implications for R²: Visual Proof & Intuition. Retrieved from https://www.nicefa.org/library/general-linear-models-/decomposition-of-total-sum-of-squares--sst---ssr---sse-and-its-implications-for-r-