Proof of the Distribution of Sums of Squares under Normality Assumptions (Chi-squared and F distributions)

Master the rigorous proof of Chi-squared and F distributions for sums of squares under normality assumptions, crucial for General Linear Models and statistical inference.

The Formal Theorem

Let $Y = (Y_1, \dots, Y_n)'$ be a vector of $n$ observations, assumed to follow a multivariate normal distribution $Y \sim N_n(X\beta, \sigma^2 I_n)$, where $X$ is an $n \times p$ design matrix of full column rank $p$ (with $n > p$), $\beta$ is a $p \times 1$ vector of unknown parameters, and $\sigma^2$ is the unknown error variance. Let $\hat{\beta} = (X'X)^{-1}X'Y$ be the Ordinary Least Squares (OLS) estimator of $\beta$, and $P_X = X(X'X)^{-1}X'$ be the projection matrix onto the column space of $X$. The fitted values are $\hat{Y} = P_X Y$, and the residuals are $e = (I - P_X)Y$.

Furthermore, consider two nested linear models: a 'full' model with $p_F$ parameters resulting in $RSS_F$ (Residual Sum of Squares) and a 'reduced' model with $p_R$ parameters (where the column space of $X_R$ is a subspace of the column space of $X_F$) resulting in $RSS_R$. The degrees of freedom for the full model are $df_F = n - p_F$ and for the reduced model are $df_R = n - p_R$.

Then, the following distributional results hold:

1. **Residual Sum of Squares (RSS)**: The scaled residual sum of squares follows a Chi-squared distribution:

$$\frac{RSS}{\sigma^2} = \frac{\|Y - \hat{Y}\|^2}{\sigma^2} = \frac{\|e\|^2}{\sigma^2} = \frac{\|(I - P_X)Y\|^2}{\sigma^2} \sim \chi^2(n-p)$$

2. **Independence of Sums of Squares**: Under the general conditions of Cochran's Theorem, for a linear model under normality, the residual sum of squares $RSS$ is statistically independent of the estimated regression coefficients $\hat{\beta}$. Consequently, any quadratic form depending only on $\hat{\beta}$, such as the sum of squares due to regression, $SSR = \|\hat{Y} - \bar{Y}\mathbf{1}\|^2$ (for models with an intercept), is independent of $RSS$. More generally, for a decomposition of the total sum of squares into quadratic forms $Y'A_j Y$, their independence is guaranteed if the projection matrices $A_j$ are orthogonal, i.e., $A_i A_j = 0$ for $i \neq j$.

3. **F-distribution for Model Comparison**: To test the null hypothesis $H_0$: the reduced model is sufficient (i.e., the additional $p_F - p_R$ parameters in the full model are zero, or equivalently, the mean vector lies in the column space of $X_R$), the F-statistic is given by:

$$\begin{aligned} F &= \frac{(RSS_R - RSS_F) / (p_F - p_R)}{RSS_F / (n - p_F)} \\ &= \frac{\text{Mean Square due to Added Parameters}}{\text{Mean Square Error for Full Model}} \end{aligned}$$

Under $H_0$, this statistic follows an F-distribution with $(p_F - p_R)$ numerator degrees of freedom and $(n - p_F)$ denominator degrees of freedom:

$$F \sim F(p_F - p_R, n - p_F)$$
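
A quick Monte Carlo check can make the Chi-squared result for the residual sum of squares and the F result for nested models concrete. The sketch below is illustrative only: the design matrix, coefficient values, sample sizes, and the names `residual_maker` and `simulate_nested_models` are assumptions chosen for the example, not part of the theorem. It compares simulated moments with the known moments of a Chi-squared distribution with $k$ degrees of freedom (mean $k$, variance $2k$) and of an $F(d_1, d_2)$ distribution (mean $d_2/(d_2 - 2)$ for $d_2 > 2$).

```python
import numpy as np


def residual_maker(X):
    """Return I - P_X, where P_X = X (X'X)^{-1} X'."""
    n = X.shape[0]
    return np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)


def simulate_nested_models(n=30, p_full=4, p_reduced=2, sigma=1.5,
                           reps=20000, seed=0):
    """Simulate RSS_F / sigma^2 and the nested-model F statistic under H0."""
    rng = np.random.default_rng(seed)
    # Hypothetical design: intercept plus standard normal covariates.
    X_full = np.column_stack(
        [np.ones(n), rng.standard_normal((n, p_full - 1))]
    )
    X_red = X_full[:, :p_reduced]  # nested: col(X_red) is inside col(X_full)
    beta_red = np.arange(1, p_reduced + 1, dtype=float)  # arbitrary values
    mean = X_red @ beta_red  # H0 holds: the mean lies in col(X_red)

    M_full = residual_maker(X_full)
    M_red = residual_maker(X_red)

    # Each row of Y is one simulated data set of length n.
    Y = mean + sigma * rng.standard_normal((reps, n))
    rss_full = np.sum((Y @ M_full) * Y, axis=1)  # Y'(I - P_F)Y per row
    rss_red = np.sum((Y @ M_red) * Y, axis=1)    # Y'(I - P_R)Y per row

    scaled_rss = rss_full / sigma**2
    f_stat = ((rss_red - rss_full) / (p_full - p_reduced)) / (
        rss_full / (n - p_full)
    )
    return scaled_rss, f_stat


if __name__ == "__main__":
    scaled_rss, f_stat = simulate_nested_models()
    df_num, df_den = 2, 26  # p_F - p_R and n - p_F for the defaults
    print("mean of RSS/sigma^2:", scaled_rss.mean(), "theory:", df_den)
    print("var of RSS/sigma^2:", scaled_rss.var(), "theory:", 2 * df_den)
    print("mean of F:", f_stat.mean(), "theory:", df_den / (df_den - 2))
```

Matching moments is only a sanity check, not a proof of the distributional claim; a fuller check would compare the simulated quantiles against the reference distributions.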

Analytical Intuition.

Imagine our data as stars scattered across a cosmic canvas. When we assume normality, these stars form a magnificent, symmetrical nebula. Our task isn't just to observe the nebula; it's to understand its underlying structure. The 'Total Sum of Squares' is like measuring the overall brilliance and spread of this star field. But within this brilliance, some patterns are caused by gravitational forces (our model's predictors, $X\beta$), and some are just the inherent cosmic background noise (our errors, $\epsilon$).

When we project our star data onto the model's 'gravitational field' (the column space of $X$), we get the 'Regression Sum of Squares'. The remaining 'light' that doesn't align with the model is the 'Residual Sum of Squares'. Squaring these deviations and scaling them by $\sigma^2$ transforms them into a 'Chi-squared dance', a specific pattern of random variability.

The F-distribution emerges when we ask a deeper question: 'Is adding more gravitational forces (more predictors) significantly improving our view of the nebula?' It's a ratio, like comparing the 'extra brilliance explained by new forces' to the 'baseline cosmic noise'. If this ratio is large, it means the new forces are genuinely shaping the nebula, not just adding more fuzz.

Institutional Warning.

Students often treat 'degrees of freedom' as a mere arithmetic subtraction rather than as a geometric quantity: the dimension of the subspace in which the residual vector lies once the parameters have been estimated. They also struggle to see why the numerator and denominator sums of squares of the F-statistic are independent. That independence follows from the orthogonality of the corresponding projection operators, which under normality turns uncorrelated projections into independent ones.
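
The independence claim can be made explicit with a short covariance calculation. The following is a sketch rather than a full proof; the symbols $P_F$ and $P_R$ (the projections onto the column spaces of the full and reduced design matrices $X_F$ and $X_R$) are notation introduced here for convenience.

```latex
\begin{aligned}
RSS_R - RSS_F &= \|(P_F - P_R)Y\|^2, \qquad RSS_F = \|(I - P_F)Y\|^2, \\
\operatorname{Cov}\big((P_F - P_R)Y,\ (I - P_F)Y\big)
  &= (P_F - P_R)\,\sigma^2 I_n\,(I - P_F)' \\
  &= \sigma^2 \big(P_F - P_F^2 - P_R + P_R P_F\big) \\
  &= \sigma^2 \big(P_F - P_F - P_R + P_R\big) = 0 .
\end{aligned}
```

The step $P_R P_F = P_R$ uses the nesting of the column spaces. Because both vectors are linear functions of $Y$, they are jointly normal, so zero covariance implies independence, and the sums of squares built from them are independent as well.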

Academic Inquiries.

01

Why is the normality assumption so critical for these distributions?

The Chi-squared distribution is defined as the distribution of a sum of squared independent standard normal random variables. If the underlying errors $\epsilon_i$ are not normal, then $\epsilon_i^2/\sigma^2$ will not be Chi-squared, and consequently sums of these terms will not follow exact Chi-squared or F distributions. Asymptotic results exist for large samples, but exact finite-sample inference relies on normality.
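
To see where normality enters, here is a sketch of the standard argument for the Chi-squared distribution of the residual sum of squares. The matrix $Q$ is notation introduced here: an $n \times (n-p)$ matrix whose columns form an orthonormal basis of the orthogonal complement of the column space of $X$, so that $Q'Q = I_{n-p}$, $QQ' = I - P_X$, and $Q'X = 0$.

```latex
\begin{aligned}
Z &:= \frac{1}{\sigma} Q'Y
   \sim N_{n-p}\!\left(\tfrac{1}{\sigma} Q'X\beta,\ Q'Q\right)
   = N_{n-p}(0,\ I_{n-p}), \\
\frac{RSS}{\sigma^2} &= \frac{Y'(I - P_X)Y}{\sigma^2}
   = \frac{Y'QQ'Y}{\sigma^2}
   = Z'Z
   = \sum_{j=1}^{n-p} Z_j^2 \sim \chi^2(n-p).
\end{aligned}
```

The first line is the only place normality is used: a linear transformation of a normal vector is normal. Without it, $Z$ still has mean zero and identity covariance, but its components need not be independent standard normal variables.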

02

How does Cochran's Theorem relate to the independence of sums of squares?

Cochran's Theorem provides the formal basis for the independence of the various sums of squares. It states that if the total sum of squares of $n$ independent standard normal variables is decomposed into several quadratic forms, and the ranks of the matrices defining these quadratic forms sum to $n$, then the quadratic forms are independent and each follows a Chi-squared distribution with degrees of freedom equal to the rank of its matrix. In General Linear Models, the projection matrices for $RSS$ and $SSR$ satisfy the conditions of Cochran's Theorem, guaranteeing their independence.
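
The rank and orthogonality conditions can be checked numerically for a small design. The sketch below assumes a model with an intercept in the first column and uses a randomly generated design; the function names `projection`, `cochran_pieces`, and `check_cochran_conditions` are hypothetical helpers written for this illustration. It splits the identity into three pieces, $P_1$ (projection onto the constant vector), $P_X - P_1$ (the matrix of $SSR$), and $I - P_X$ (the matrix of $RSS$), then verifies that they are idempotent, mutually orthogonal, and have ranks summing to $n$.

```python
import numpy as np


def projection(X):
    """Orthogonal projection onto the column space of X."""
    return X @ np.linalg.solve(X.T @ X, X.T)


def cochran_pieces(X):
    """Split I_n into P_1, P_X - P_1 and I - P_X.

    Assumes the first column of X is the intercept (a column of ones).
    """
    n = X.shape[0]
    P_X = projection(X)
    P_1 = np.full((n, n), 1.0 / n)  # projection onto the constant vector
    return [P_1, P_X - P_1, np.eye(n) - P_X]


def check_cochran_conditions(pieces, tol=1e-10):
    """Return (ranks, all_idempotent, mutually_orthogonal, sums_to_identity)."""
    n = pieces[0].shape[0]
    # For a projection matrix the rank equals the trace.
    ranks = [int(round(np.trace(A))) for A in pieces]
    idempotent = all(np.allclose(A @ A, A, atol=tol) for A in pieces)
    orthogonal = all(
        np.allclose(pieces[i] @ pieces[j], 0.0, atol=tol)
        for i in range(len(pieces))
        for j in range(len(pieces))
        if i != j
    )
    sums_to_identity = np.allclose(sum(pieces), np.eye(n), atol=tol)
    return ranks, idempotent, orthogonal, sums_to_identity


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 12, 3  # illustrative sizes
    X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
    print(check_cochran_conditions(cochran_pieces(X)))
```

A numerical check like this confirms the algebraic conditions for one design only; the distributional conclusion still rests on the normality of $Y$.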

03

What happens if the model is misspecified (e.g., non-linear relationships, omitted variables)?

Model misspecification can invalidate the assumption that $Y \sim N_n(X\beta, \sigma^2 I_n)$. If the linear structure is incorrect, $X\beta$ will not accurately represent the mean, and the residuals will contain systematic structure in addition to pure noise. This can lead to biased estimators, inconsistent variance estimates, and most importantly for this topic, the scaled sums of squares will no longer follow the central Chi-squared and F distributions stated in the theorem, even if the errors are otherwise normal.
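
One way to see the effect is to simulate an omitted-variable case. The sketch below is an illustration under assumed values: the fitted design, the omitted column `z`, its coefficient `gamma`, and the function name `rss_under_omitted_variable` are all hypothetical. It relies on the standard identity $E[Y'AY] = \operatorname{tr}(A\Sigma) + \mu'A\mu$, which here gives $E[RSS/\sigma^2] = (n - p) + \lambda$ with $\lambda = \mu'(I - P_X)\mu/\sigma^2$, so the scaled residual sum of squares is shifted upward relative to $\chi^2(n-p)$.

```python
import numpy as np


def rss_under_omitted_variable(n=40, sigma=1.0, gamma=0.8,
                               reps=20000, seed=2):
    """Fit Y on X while the true mean also depends on an omitted column z.

    Returns the simulated mean of RSS / sigma^2, its theoretical value
    (n - p) + lambda with lambda = mu'(I - P_X)mu / sigma^2, and n - p.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # fitted model
    z = rng.standard_normal(n)                                 # omitted term
    mu = X @ np.array([1.0, 2.0]) + gamma * z                  # true mean

    p = X.shape[1]
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # I - P_X
    lam = mu @ M @ mu / sigma**2                       # noncentrality

    # Each row of Y is one simulated data set of length n.
    Y = mu + sigma * rng.standard_normal((reps, n))
    scaled_rss = np.sum((Y @ M) * Y, axis=1) / sigma**2
    return scaled_rss.mean(), (n - p) + lam, n - p


if __name__ == "__main__":
    simulated, theoretical, df = rss_under_omitted_variable()
    print("simulated mean:", simulated)
    print("(n - p) + lambda:", theoretical)
    print("n - p:", df)
```

Setting `gamma=0` recovers the correctly specified case, in which the noncentrality term vanishes.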

04

Why do the degrees of freedom change (e.g., from $n$ to $n-p$)?

Degrees of freedom represent the number of independent pieces of information available to estimate a parameter or contribute to a sum of squares. When we estimate $p$ parameters (e.g., the $\beta$ coefficients) from $n$ observations, we effectively 'lose' $p$ degrees of freedom. For instance, the residual sum of squares is computed from $n$ residuals, but these residuals satisfy $p$ linear constraints ($X'e = 0$) imposed by the estimated parameters, leaving $n-p$ independent pieces of information for the error variance estimate.
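
The count $n - p$ can also be read off from the trace of the residual projection. A short worked derivation, using only the definitions $P_X = X(X'X)^{-1}X'$ and $Y \sim N_n(X\beta, \sigma^2 I_n)$, together with the fact that the rank of a projection matrix equals its trace:

```latex
\begin{aligned}
\operatorname{tr}(P_X) &= \operatorname{tr}\big(X(X'X)^{-1}X'\big)
   = \operatorname{tr}\big((X'X)^{-1}X'X\big)
   = \operatorname{tr}(I_p) = p, \\
\operatorname{rank}(I - P_X) &= \operatorname{tr}(I - P_X) = n - p, \\
E[RSS] &= E\big[Y'(I - P_X)Y\big]
   = \sigma^2 \operatorname{tr}(I - P_X) + (X\beta)'(I - P_X)(X\beta)
   = \sigma^2 (n - p).
\end{aligned}
```

The last term vanishes because $(I - P_X)X = 0$. This is why $RSS/(n-p)$, rather than $RSS/n$, is an unbiased estimator of $\sigma^2$.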

05

Can these distributions be applied when errors have unequal variances (heteroscedasticity)?

No, not directly. The theorem relies on the covariance matrix of $Y$ being $\sigma^2 I_n$, meaning homoscedasticity (equal variances) and independence. Under heteroscedasticity, the quadratic forms $Y'(I-P_X)Y$ and $Y'P_X Y$ will not in general follow Chi-squared distributions when scaled by a single $\sigma^2$, and the independence property derived from orthogonal projections can also fail. Specialized methods (e.g., Weighted Least Squares, robust standard errors) are needed.
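
A small numerical illustration of why a single scale factor no longer works: when the mean lies in the column space of $X$ and the covariance of $Y$ is a diagonal matrix $\Sigma$, the residual sum of squares is distributed as a weighted sum $\sum_i w_i \chi^2_1$ of independent Chi-squared variables, where the weights $w_i$ are the eigenvalues of $\Sigma^{1/2}(I - P_X)\Sigma^{1/2}$. The sketch below computes these weights; the design, the variance patterns, and the function name `rss_weights` are assumptions made for the example.

```python
import numpy as np


def rss_weights(X, variances):
    """Eigenvalues of Sigma^{1/2} (I - P_X) Sigma^{1/2}, sorted descending.

    `variances` holds the diagonal of Sigma (independent errors assumed).
    """
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # I - P_X
    root = np.sqrt(np.asarray(variances, dtype=float))
    S = root[:, None] * M * root[None, :]  # Sigma^{1/2} (I - P_X) Sigma^{1/2}
    return np.sort(np.linalg.eigvalsh(S))[::-1]


if __name__ == "__main__":
    n = 10
    X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])  # p = 2
    # Homoscedastic case: every variance equals sigma^2 = 4.
    print(rss_weights(X, np.full(n, 4.0)))
    # Heteroscedastic case: variances grow along the sample.
    print(rss_weights(X, np.linspace(0.5, 5.0, n)))
```

In the homoscedastic case the $n - p$ nonzero weights all equal $\sigma^2$, so dividing by $\sigma^2$ yields $\chi^2(n-p)$. In the heteroscedastic case the nonzero weights differ, and no single divisor turns the sum into a Chi-squared variable.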

Standardized References.

  • Seber, G. A. F., & Lee, A. J. (2003). Linear Regression Analysis. John Wiley & Sons.
  • Rencher, A. C., & Schaalje, G. B. (2008). Linear Models in Statistics. John Wiley & Sons.

Institutional Citation

Reference this proof in your academic research or publications.

NICEFA Visual Mathematics. (2026). Proof of the Distribution of Sums of Squares under Normality Assumptions (Chi-squared and F distributions): Visual Proof & Intuition. Retrieved from https://www.nicefa.org/library/general-linear-models-/proof-of-the-distribution-of-sums-of-squares-under-normality-assumptions--chi-squared-and-f-distributions-
