Decomposition of Total Sum of Squares: SST = SSR + SSE and its Implications for R²

Unpack the core decomposition of total sum of squares (SST=SSR+SSE) in linear regression, its geometric intuition, and the implications for R² for BSc Math/Stats students.

The Formal Theorem

Given a simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$ and its ordinary least squares (OLS) fit $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$, the total variation in the dependent variable $Y$ can be decomposed into the variation explained by the model and the unexplained variation. Specifically, for $n$ observations, let $Y_i$ be the $i$-th observed value, $\hat{Y}_i$ be the $i$-th predicted value, and $\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$ be the mean of the observed values. The decomposition is stated as:

$$\text{SST} = \text{SSR} + \text{SSE}$$

where

$$\begin{aligned} \text{SST} &= \sum_{i=1}^{n} (Y_i - \bar{Y})^2 && \text{(Total Sum of Squares)} \\ \text{SSR} &= \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 && \text{(Sum of Squares Regression)} \\ \text{SSE} &= \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 && \text{(Sum of Squares Error)} \end{aligned}$$

This decomposition leads directly to the coefficient of determination, $R^2$, defined as $R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}}$.
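The identity can be checked numerically. The following is a minimal sketch in plain Python; the data values and all variable names are hypothetical, chosen only for illustration, and the slope and intercept use the standard closed-form OLS estimates for a single predictor with an intercept.

```python
# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for simple linear regression with an intercept.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar
y_hat = [beta0_hat + beta1_hat * xi for xi in x]

# The three sums of squares from the theorem.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

# Both expressions for R^2 should agree up to rounding error.
r2_from_ssr = ssr / sst
r2_from_sse = 1.0 - sse / sst

print(f"SST = {sst:.6f}, SSR + SSE = {ssr + sse:.6f}")
print(f"R^2 = {r2_from_ssr:.6f} (SSR/SST) vs {r2_from_sse:.6f} (1 - SSE/SST)")
```

Because the comparison is in floating point, the two sides agree only up to rounding error, so any check should use a small tolerance rather than exact equality.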

Analytical Intuition.

Imagine a cosmic battlefield where data points are stars scattered across the night sky. The 'mean' $\bar{Y}$ is the gravitational center, the true north of our observations. $\text{SST}$ represents the total cosmic energy, the sum of all squared deviations of each star $Y_i$ from this central pull $\bar{Y}$. Now, a powerful predictive model, our 'celestial engine', attempts to explain these stellar positions, placing its own theoretical positions $\hat{Y}_i$. The $\text{SSR}$ is the energy our engine successfully harnesses – the systematic drift of its predicted positions $\hat{Y}_i$ away from the central $\bar{Y}$. Finally, $\text{SSE}$ is the leftover, unexplained chaos: the inherent 'wobble' of each star $Y_i$ around its engine's predicted position $\hat{Y}_i$. This fundamental law, $\text{SST} = \text{SSR} + \text{SSE}$, reveals that total cosmic variation is simply the sum of the engine's explained patterns and the universe's unexplained randomness. $R^2$ is our engine's efficiency rating: the proportion of total cosmic energy that our engine successfully accounts for, a testament to its explanatory power.

Institutional Warning.

Students often misunderstand the origin of the cross-product term's disappearance, failing to connect it to OLS properties and the geometric orthogonality. They also sometimes misinterpret $R^2$ as a definitive measure of model validity or predictive power, rather than solely the proportion of variance explained by the model.
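For reference, the cross-product term arises when each deviation from the mean is split into a residual part and a fitted part and the square is expanded. A short worked expansion, using only the symbols defined in the theorem:

```latex
\begin{aligned}
\text{SST} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
  &= \sum_{i=1}^{n} \bigl[(Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})\bigr]^2 \\
  &= \underbrace{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}_{\text{SSE}}
   + \underbrace{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}_{\text{SSR}}
   + 2 \sum_{i=1}^{n} (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})
\end{aligned}
```

The decomposition $\text{SST} = \text{SSR} + \text{SSE}$ therefore holds exactly when the last sum is zero.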

Academic Inquiries.

01

Why does the cross-product term $2\sum (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})$ vanish in the decomposition?

This vanishing is a direct consequence of the Ordinary Least Squares (OLS) estimation procedure. When an intercept term is included in the model, the OLS normal equations ensure that the sum of the residuals $(Y_i - \hat{Y}_i)$ is zero, i.e., $\sum (Y_i - \hat{Y}_i) = 0$, and that the residuals are orthogonal to the regressor, $\sum (Y_i - \hat{Y}_i) X_i = 0$. Because each $\hat{Y}_i$ is a linear combination of the constant and $X_i$, the residuals are also orthogonal to the predicted values, meaning $\sum (Y_i - \hat{Y}_i) \hat{Y}_i = 0$. Combining these, the cross-product term simplifies to $\sum (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y}) = \sum (Y_i - \hat{Y}_i)\hat{Y}_i - \bar{Y}\sum (Y_i - \hat{Y}_i) = 0 - \bar{Y}(0) = 0$.
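These two residual properties, and the resulting zero cross-product term, can be verified on a small example. This is a sketch only: the helper `fit_simple_ols` and the data are hypothetical names and values introduced for illustration, not part of any library.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, y_hat)]

# Property 1: residuals sum to zero (requires the intercept).
sum_residuals = sum(residuals)
# Property 2: residuals are orthogonal to the fitted values.
resid_dot_fitted = sum(e * f for e, f in zip(residuals, y_hat))
# Consequence: the cross-product term in the expansion of SST vanishes.
cross_term = 2 * sum(e * (f - y_bar) for e, f in zip(residuals, y_hat))

print(f"sum of residuals        = {sum_residuals:.2e}")
print(f"residuals . fitted      = {resid_dot_fitted:.2e}")
print(f"cross-product term      = {cross_term:.2e}")
```

All three printed quantities should be zero up to floating-point rounding.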

02

Is $R^2$ always between 0 and 1?

For standard OLS regression models that include an intercept, $R^2$ is always between 0 and 1. This is because $\text{SSR}$, $\text{SSE}$, and $\text{SST}$ are sums of squares and are thus non-negative, and since $\text{SST} = \text{SSR} + \text{SSE}$ with $\text{SSE} \geq 0$, $\text{SSR}$ cannot exceed $\text{SST}$. However, if a model omits the intercept term, or if non-OLS estimation methods are used, the cross-product term need not vanish, so the two expressions for $R^2$ no longer agree: $1 - \text{SSE}/\text{SST}$ can be negative, and $\text{SSR}/\text{SST}$ can exceed 1. A negative value means the model fits the sample worse than the constant prediction $\bar{Y}$.
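A small regression through the origin illustrates how the bounds can fail. The sketch below uses hypothetical data with a large true intercept and a negative slope; the no-intercept slope estimate $\sum X_i Y_i / \sum X_i^2$ is the standard least-squares solution for the model $Y_i = \beta_1 X_i + \epsilon_i$.

```python
# Hypothetical data: decreasing in x with a large intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 9.0, 8.0, 7.0, 6.0]

# Least-squares slope for a model WITHOUT an intercept (line through the origin).
slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
y_hat = [slope * xi for xi in x]

y_bar = sum(y) / len(y)
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

# Without an intercept the cross-product term is not zero,
# so SST != SSR + SSE and the two R^2 formulas disagree.
cross_term = 2 * sum((yi - fi) * (fi - y_bar) for yi, fi in zip(y, y_hat))
r2_from_sse = 1.0 - sse / sst
r2_from_ssr = ssr / sst

print(f"SST = {sst}, SSR = {ssr}, SSE = {sse}, cross term = {cross_term}")
print(f"1 - SSE/SST = {r2_from_sse}")  # -10.0
print(f"SSR/SST     = {r2_from_ssr}")  # 6.0
```

Here $\text{SST} = 10$, $\text{SSR} = 60$, $\text{SSE} = 110$, and the cross-product term is $-160$, so the full expansion still balances ($110 + 60 - 160 = 10$) even though the two-term decomposition does not.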

03

What is the relationship between $R^2$ and the Pearson correlation coefficient $r$?

For a simple linear regression model (with only one independent variable), the coefficient of determination $R^2$ is equal to the square of the Pearson product-moment correlation coefficient $r$ between the independent variable $X$ and the dependent variable $Y$. That is, $R^2 = (\text{corr}(X, Y))^2$. In multiple linear regression, $R^2$ is defined as the square of the multiple correlation coefficient, which is the Pearson correlation between the observed values $Y_i$ and the predicted values $\hat{Y}_i$, i.e., $R^2 = (\text{corr}(Y, \hat{Y}))^2$.
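Both identities can be checked for the single-predictor case. The sketch below assumes an OLS fit with an intercept; the `pearson` helper and the data are hypothetical, written out by hand so the computation is visible.

```python
import math


def pearson(u, v):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    cov = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    var_u = sum((a - u_bar) ** 2 for a in u)
    var_v = sum((b - v_bar) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical noisy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.5, 3.7, 2.9, 6.1, 5.2, 8.0]
n = len(x)

# Simple OLS fit with an intercept.
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
r2 = 1.0 - sse / sst

r_xy_squared = pearson(x, y) ** 2
r_yyhat_squared = pearson(y, y_hat) ** 2

print(f"R^2              = {r2:.6f}")
print(f"corr(X, Y)^2     = {r_xy_squared:.6f}")
print(f"corr(Y, Y_hat)^2 = {r_yyhat_squared:.6f}")
```

The three printed values should coincide up to rounding error.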

04

Does a high $R^2$ imply that a model is a good predictor or that the independent variables cause the dependent variable?

No, not necessarily. A high $R^2$ primarily indicates that a large proportion of the variance in the dependent variable $Y$ is explained by the independent variables $X$ within the sample data, suggesting a good *fit*. However, it does not imply causation; correlation is not causation. A high $R^2$ also doesn't guarantee predictive accuracy on new data (the model could be overfit) or that the model's underlying assumptions are met. Other diagnostic checks, such as residual analysis, out-of-sample validation, and theoretical justification, are essential for assessing a model's overall quality and reliability.
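One way to see the gap between in-sample fit and prediction is to fit a straight line to data that is actually quadratic and then score it on new points. The sketch below is an illustration of a violated linearity assumption rather than of overfitting specifically; the data are hypothetical, and the out-of-sample score uses $1 - \text{SSE}/\text{SST}$ with the mean of the new observations as the baseline, which is one common convention and an assumption here.

```python
def r_squared(y_obs, y_pred):
    """1 - SSE/SST, with SST taken about the mean of y_obs."""
    y_bar = sum(y_obs) / len(y_obs)
    sst = sum((yi - y_bar) ** 2 for yi in y_obs)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y_obs, y_pred))
    return 1.0 - sse / sst


# Hypothetical data: the true relationship is y = x^2, not linear.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [xi ** 2 for xi in x_train]
x_new = [6.0, 7.0, 8.0, 9.0, 10.0]
y_new = [xi ** 2 for xi in x_new]

# Fit a straight line (with intercept) to the training sample only.
n = len(x_train)
x_bar = sum(x_train) / n
y_bar = sum(y_train) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x_train, y_train))
         / sum((xi - x_bar) ** 2 for xi in x_train))
intercept = y_bar - slope * x_bar

pred_train = [intercept + slope * xi for xi in x_train]
pred_new = [intercept + slope * xi for xi in x_new]

r2_in = r_squared(y_train, pred_train)
r2_out = r_squared(y_new, pred_new)

print(f"in-sample R^2     = {r2_in:.3f}")
print(f"out-of-sample R^2 = {r2_out:.3f}")
```

On this data the in-sample value is roughly 0.96 while the out-of-sample value is negative, and a plot of the training residuals (2, -1, -2, -1, 2) would already show the curved pattern that signals the misspecification.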

Standardized References.

  • Definitive Institutional Source: Montgomery, Douglas C., Peck, Elizabeth A., and Vining, G. Geoffrey. Introduction to Linear Regression Analysis.

Institutional Citation

Reference this proof in your academic research or publications.

NICEFA Visual Mathematics. (2026). Decomposition of Total Sum of Squares: SST = SSR + SSE and its Implications for R²: Visual Proof & Intuition. Retrieved from https://www.nicefa.org/library/general-linear-models-/decomposition-of-total-sum-of-squares--sst---ssr---sse-and-its-implications-for-r-
