Proof of the Distribution of Sums of Squares under Normality Assumptions (Chi-squared and F distributions)

Master the rigorous proof of Chi-squared and F distributions for sums of squares under normality assumptions, crucial for General Linear Models and statistical inference.

The Formal Theorem

Let $Y = (Y_1, \dots, Y_n)'$ be a vector of $n$ observations, assumed to follow a multivariate normal distribution $Y \sim N_n(X\beta, \sigma^2 I_n)$, where $X$ is an $n \times p$ design matrix of full column rank $p$ (with $n > p$), $\beta$ is a $p \times 1$ vector of unknown parameters, and $\sigma^2$ is the unknown error variance. Let $\hat{\beta} = (X'X)^{-1}X'Y$ be the Ordinary Least Squares (OLS) estimator for $\beta$, and $P_X = X(X'X)^{-1}X'$ be the projection matrix onto the column space of $X$. The fitted values are $\hat{Y} = P_X Y$, and the residuals are $e = (I - P_X)Y$.

Furthermore, consider two nested linear models: a 'full' model with $p_F$ parameters resulting in $RSS_F$ (Residual Sum of Squares) and a 'reduced' model with $p_R$ parameters (where the column space of $X_R$ is a subspace of the column space of $X_F$) resulting in $RSS_R$. The degrees of freedom for the full model are $df_F = n - p_F$ and for the reduced model are $df_R = n - p_R$.

Then, the following distributional results hold:

1. **Residual Sum of Squares (RSS)**: The scaled residual sum of squares follows a Chi-squared distribution:

$$ \frac{RSS}{\sigma^2} = \frac{\|Y - \hat{Y}\|^2}{\sigma^2} = \frac{\|e\|^2}{\sigma^2} = \frac{\|(I - P_X)Y\|^2}{\sigma^2} \sim \chi^2(n-p) $$

2. **Independence of Sums of Squares**: Under the general conditions of Cochran's Theorem, for a linear model under normality, the residual sum of squares $RSS$ is statistically independent of the estimated regression coefficients $\hat{\beta}$. Consequently, any quadratic form depending only on $\hat{\beta}$, such as the sum of squares due to regression, $SSR = \|\hat{Y} - \bar{Y}\mathbf{1}\|^2$ (for models with an intercept), is independent of $RSS$. More generally, for a decomposition of the total sum of squares into quadratic forms $Y'A_j Y$, their independence is guaranteed if the projection matrices $A_j$ are orthogonal, i.e., $A_i A_j = 0$ for $i \neq j$.

3. **F-distribution for Model Comparison**: To test the null hypothesis $H_0$: the reduced model is sufficient (i.e., the additional $p_F - p_R$ parameters in the full model are zero, or equivalently, the mean vector lies in the column space of $X_R$), the F-statistic is given by:

$$ \begin{aligned} F &= \frac{(RSS_R - RSS_F) / (p_F - p_R)}{RSS_F / (n - p_F)} \\ &= \frac{\text{Mean Square due to Added Parameters}}{\text{Mean Square Error for Full Model}} \end{aligned} $$

Under $H_0$, this statistic follows an F-distribution with $(p_F - p_R)$ numerator degrees of freedom and $(n - p_F)$ denominator degrees of freedom:

$$ F \sim F(p_F - p_R, n - p_F) $$

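The F-statistic in the theorem can be computed directly from the two projection matrices. The following is a minimal sketch on simulated data, not a reference implementation: the helper names `projection` and `nested_f_statistic`, the sample size, and the coefficients are all assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def projection(X):
    """Orthogonal projection onto the column space of X (full column rank assumed)."""
    return X @ np.linalg.solve(X.T @ X, X.T)

def nested_f_statistic(y, X_reduced, X_full):
    """Return (F, RSS_R, RSS_F) for a reduced model nested in a full model."""
    n = len(y)
    p_r, p_f = X_reduced.shape[1], X_full.shape[1]
    I = np.eye(n)
    rss_r = y @ (I - projection(X_reduced)) @ y   # Y'(I - P_R)Y
    rss_f = y @ (I - projection(X_full)) @ y      # Y'(I - P_F)Y
    f = ((rss_r - rss_f) / (p_f - p_r)) / (rss_f / (n - p_f))
    return f, rss_r, rss_f

# Simulated data in which H0 holds: the coefficient on x2 is zero.
rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X_full = np.column_stack([np.ones(n), x1, x2])   # p_F = 3
X_reduced = X_full[:, :2]                        # p_R = 2, nested in the full model
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

f_stat, rss_r, rss_f = nested_f_statistic(y, X_reduced, X_full)
print(f_stat, rss_r, rss_f)
```

Under $H_0$ the printed statistic is one draw from $F(p_F - p_R, n - p_F) = F(1, 27)$; a p-value would require an F distribution function from a statistics library, which is left out here.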

Analytical Intuition.

Imagine our data as stars scattered across a cosmic canvas. When we assume normality, these stars form a magnificent, symmetrical nebula. Our task isn't just to observe the nebula; it's to understand its underlying structure. The 'Total Sum of Squares' is like measuring the overall brilliance and spread of this star field. But within this brilliance, some patterns are caused by gravitational forces (our model's predictors, $X\beta$), and some are just the inherent cosmic background noise (our errors, $\epsilon$).

When we project our star data onto the model's 'gravitational field' (the column space of $X$), we get the 'Regression Sum of Squares'. The remaining 'light' that doesn't align with the model is the 'Residual Sum of Squares'. Squaring these deviations and scaling them by $\sigma^2$ transforms them into a 'Chi-squared dance', a specific pattern of random variability.

The F-distribution emerges when we ask a deeper question: 'Is adding more gravitational forces (more predictors) significantly improving our view of the nebula?' It's a ratio, like comparing the 'extra brilliance explained by new forces' to the 'baseline cosmic noise'. If this ratio is large, it means the new forces are genuinely shaping the nebula, not just adding more fuzz.

Institutional Warning.

Students frequently treat 'degrees of freedom' as a mere arithmetic subtraction, when geometrically it is the dimension of the subspace onto which the data are projected (for the residuals, the $(n-p)$-dimensional error space that remains after parameter estimation). They also struggle to grasp why the numerator and denominator sums of squares of the F-statistic are independent: the corresponding projection operators are orthogonal, and under normality orthogonal projections of $Y$ are independent.
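Both points in the warning can be checked numerically. The sketch below, on an arbitrary simulated design (the dimensions, seed, and variable names are assumptions for illustration), verifies that the numerator matrix $P_F - P_R$ and the denominator matrix $I - P_F$ multiply to zero, and that their traces equal the degrees of freedom $p_F - p_R$ and $n - p_F$.

```python
import numpy as np

def projection(X):
    """Orthogonal projection onto the column space of X (full column rank assumed)."""
    return X @ np.linalg.solve(X.T @ X, X.T)

rng = np.random.default_rng(1)
n, p_r, p_f = 12, 2, 4
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, p_f - 1))])
X_reduced = X_full[:, :p_r]          # nested: first p_r columns of the full design

P_F, P_R = projection(X_full), projection(X_reduced)
A_num = P_F - P_R                    # defines RSS_R - RSS_F
A_den = np.eye(n) - P_F              # defines RSS_F

# Orthogonality of the two projections: A_num A_den = 0.
orthogonal = bool(np.allclose(A_num @ A_den, 0.0))

# For a projection matrix, trace = rank = dimension of the target subspace.
dims = (int(round(np.trace(A_num))), int(round(np.trace(A_den))))
print(orthogonal, dims)
```

Here the degrees of freedom appear as subspace dimensions (2 and 8 for this design) rather than as the result of a subtraction rule.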

Academic Inquiries.

Why is the normality assumption so critical for these distributions?

A Chi-squared random variable with $k$ degrees of freedom is defined as the sum of squares of $k$ independent standard normal random variables. If the underlying errors $\epsilon_i$ are not normal, then $\epsilon_i^2/\sigma^2$ is not $\chi^2(1)$, and consequently the sums of squares do not follow exact Chi-squared or F distributions. Asymptotic results exist for large samples, but exact finite-sample inference relies on normality.
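To see exactly where normality enters, here is a sketch of the standard argument for the RSS result, using the symbols of the theorem. The matrix $Q$ (an $n \times (n-p)$ matrix with orthonormal columns spanning the orthogonal complement of the column space of $X$) and the vector $Z$ are notation introduced here for illustration only.

$$
\begin{aligned}
(I - P_X)X\beta = 0 \;&\Rightarrow\; RSS = Y'(I - P_X)Y = \epsilon'(I - P_X)\epsilon, \qquad \epsilon = Y - X\beta \sim N_n(0, \sigma^2 I_n), \\
I - P_X &= QQ', \qquad Q'Q = I_{n-p}, \\
Z = \frac{Q'\epsilon}{\sigma} &\sim N_{n-p}(0, I_{n-p}), \\
\frac{RSS}{\sigma^2} &= Z'Z = \sum_{i=1}^{n-p} Z_i^2 \sim \chi^2(n-p).
\end{aligned}
$$

The third line is the only step that uses normality: a linear map of a normal vector is normal, so the components of $Z$ are independent standard normals. Without normality, $Z$ still has mean zero and identity covariance, but its components need not be normal or independent.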

How does Cochran's Theorem relate to the independence of sums of squares?

Cochran's Theorem gives the formal basis for the independence of the various sums of squares. It states that if the sum of squares of $n$ independent standard normal variables is decomposed into several quadratic forms, and the ranks of the matrices defining these quadratic forms add up to $n$, then the quadratic forms are mutually independent, each Chi-squared distributed with degrees of freedom equal to its rank. In general linear models, the projection matrices defining $RSS$ and $SSR$ are pieces of such a decomposition, which guarantees that the two sums of squares are independent.
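For a model with an intercept, the decomposition behind this statement is $I = P_1 + (P_X - P_1) + (I - P_X)$, where $P_1$ denotes the projection onto the constant vector (a symbol introduced here, not used in the theorem). The following sketch checks the rank condition numerically on an arbitrary simulated design; the dimensions and seed are assumptions for illustration.

```python
import numpy as np

def projection(X):
    """Orthogonal projection onto the column space of X (full column rank assumed)."""
    return X @ np.linalg.solve(X.T @ X, X.T)

rng = np.random.default_rng(2)
n, p = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

P_1 = projection(np.ones((n, 1)))    # projection onto the intercept direction
P_X = projection(X)

# Three pieces: mean, regression (SSR), residual (RSS).
parts = [P_1, P_X - P_1, np.eye(n) - P_X]
ranks = [int(np.linalg.matrix_rank(A, tol=1e-8)) for A in parts]
sums_to_identity = bool(np.allclose(sum(parts), np.eye(n)))

# The quadratic forms therefore add up to the total sum of squares y'y.
y = rng.normal(size=n)
quad_forms = [float(y @ A @ y) for A in parts]
print(ranks, sum(ranks) == n, sums_to_identity)
```

The ranks come out as $1$, $p - 1$, and $n - p$, which sum to $n$, so the rank condition of the theorem holds for this decomposition.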

What happens if the model is misspecified (e.g., non-linear relationships, omitted variables)?

Model misspecification can invalidate the assumption that $Y \sim N_n(X\beta, \sigma^2 I_n)$. If the linear structure is incorrect, $X\beta$ will not accurately represent the mean, and the residuals will not represent pure noise. This can lead to biased estimators, inconsistent variance estimates, and most importantly for this topic, the distributions of the sums of squares will no longer be (central) Chi-squared or F, even if the errors are otherwise normal: when the true mean lies outside the column space of $X$, $RSS/\sigma^2$ follows a noncentral Chi-squared distribution instead.
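The size of the distortion can be quantified: for a true mean vector $\mu$ and homoscedastic errors, $E[RSS] = \sigma^2(n-p) + \mu'(I - P_X)\mu$. The sketch below evaluates the extra term for a hypothetical case in which the true mean is quadratic but only a straight line is fitted; the coefficients, grid, and variable names are assumptions for illustration.

```python
import numpy as np

def projection(X):
    """Orthogonal projection onto the column space of X (full column rank assumed)."""
    return X @ np.linalg.solve(X.T @ X, X.T)

n, sigma2 = 20, 1.0
x = np.linspace(-1.0, 1.0, n)
mu_true = 1.0 + 2.0 * x + 3.0 * x**2        # hypothetical true mean with curvature
X_fit = np.column_stack([np.ones(n), x])    # fitted model omits the x**2 term
p = X_fit.shape[1]

M = np.eye(n) - projection(X_fit)           # residual-forming matrix I - P_X

# Part of the true mean that the fitted model cannot absorb, scaled by sigma^2.
noncentrality = float(mu_true @ M @ mu_true) / sigma2

# E[RSS / sigma^2] is n - p only when the noncentrality is zero.
expected_rss_over_sigma2 = (n - p) + noncentrality

# Sanity check: a mean that is linear in x lies in col(X_fit) and adds nothing.
mu_linear = 1.0 + 2.0 * x
noncentrality_linear = float(mu_linear @ M @ mu_linear) / sigma2
print(noncentrality, expected_rss_over_sigma2, noncentrality_linear)
```

A positive noncentrality inflates the residual sum of squares, so $RSS/(n-p)$ overestimates $\sigma^2$ on average in this scenario.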

Why do the degrees of freedom change (e.g., from $n$ to $n-p$)?

Degrees of freedom represent the number of independent pieces of information available to estimate a parameter or contribute to a sum of squares. When we estimate $p$ parameters (e.g., $\beta$ coefficients) from $n$ observations, we effectively 'lose' $p$ degrees of freedom. For instance, the residual sum of squares uses $n$ observations to calculate residuals, but these residuals are constrained by the $p$ estimated parameters, leaving $n-p$ independent pieces of information for the error variance estimate.
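The 'loss' of $p$ degrees of freedom can be made precise with a short trace calculation, sketched here with the theorem's symbols and the standard identity $E[\epsilon' A \epsilon] = \sigma^2 \operatorname{tr}(A)$ for errors with mean zero and covariance $\sigma^2 I_n$:

$$
\begin{aligned}
\operatorname{tr}(P_X) &= \operatorname{tr}\big(X(X'X)^{-1}X'\big) = \operatorname{tr}\big((X'X)^{-1}X'X\big) = \operatorname{tr}(I_p) = p, \\
\operatorname{tr}(I - P_X) &= n - p, \\
E[RSS] &= E\big[\epsilon'(I - P_X)\epsilon\big] = \sigma^2 \operatorname{tr}(I - P_X) = \sigma^2 (n - p).
\end{aligned}
$$

This is why $RSS/(n-p)$, rather than $RSS/n$, is the unbiased estimator of $\sigma^2$.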

Can these distributions be applied when errors have unequal variances (heteroscedasticity)?

No, not directly. The theorem relies on the assumption of $\sigma^2 I_n$ for the covariance matrix of $Y$, meaning homoscedasticity (equal variances) and independence. If there is heteroscedasticity, the quadratic forms $Y'(I-P_X)Y$ and $Y'P_X Y$ will not necessarily follow Chi-squared distributions when scaled by a single $\sigma^2$, and the independence property derived from orthogonal projections might also be affected. Specialized methods (e.g., Weighted Least Squares, robust standard errors) are needed.
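The loss of independence can be seen in the cross-covariance between fitted values and residuals, $\operatorname{Cov}(P_X Y, (I - P_X)Y) = P_X \Sigma (I - P_X)$, which vanishes when $\Sigma = \sigma^2 I_n$ but generally not otherwise. The following sketch compares the two cases for a hypothetical variance pattern that grows with the predictor; the design, the variance function, and the variable names are assumptions for illustration.

```python
import numpy as np

def projection(X):
    """Orthogonal projection onto the column space of X (full column rank assumed)."""
    return X @ np.linalg.solve(X.T @ X, X.T)

n = 15
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
P = projection(X)
M = np.eye(n) - P                        # residual-forming matrix I - P_X

Sigma_homo = np.eye(n)                   # sigma^2 I_n with sigma^2 = 1
Sigma_hetero = np.diag(1.0 + 4.0 * x)    # hypothetical: variance increases with x

# Cross-covariance between fitted values P Y and residuals M Y.
cross_homo = P @ Sigma_homo @ M
cross_hetero = P @ Sigma_hetero @ M

max_cross_homo = float(np.abs(cross_homo).max())
max_cross_hetero = float(np.abs(cross_hetero).max())

# Expected RSS is trace(M Sigma), which equals n - p only in the homoscedastic case.
expected_rss_homo = float(np.trace(M @ Sigma_homo))
expected_rss_hetero = float(np.trace(M @ Sigma_hetero))
print(max_cross_homo, max_cross_hetero, expected_rss_homo, expected_rss_hetero)
```

In the homoscedastic case the cross-covariance is zero up to rounding, so under normality the fitted values and residuals are independent; in the heteroscedastic case it is not zero, so the argument for the F-distribution no longer goes through.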

Standardized References.

Seber, G. A. F., & Lee, A. J. (2003). Linear Regression Analysis. John Wiley & Sons.

Rencher, A. C., & Schaalje, G. B. (2008). Linear Models in Statistics. John Wiley & Sons.

Institutional Citation

Reference this proof in your academic research or publications.

NICEFA Visual Mathematics. (2026). Proof of the Distribution of Sums of Squares under Normality Assumptions (Chi-squared and F distributions): Visual Proof & Intuition. Retrieved from https://www.nicefa.org/library/general-linear-models-/proof-of-the-distribution-of-sums-of-squares-under-normality-assumptions--chi-squared-and-f-distributions-
