Q: Why are $SSR$ and $SSE$ divided by $ \sigma^2 $ to get chi-squared distributions?

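A quadratic form $ \mathbf{z}^T \mathbf{A} \mathbf{z} $ with $ \mathbf{z} \sim N(\mathbf{0}, \mathbf{I}) $ and $ \mathbf{A} $ a symmetric idempotent matrix of rank $ r $ follows a $ \chi^2(r) $ distribution. The errors satisfy $ \boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I}) $, so it is the standardized vector $ \boldsymbol{\epsilon} / \sigma $, not $ \boldsymbol{\epsilon} $ itself, that is $ N(\mathbf{0}, \mathbf{I}) $. Since $ SSE = \boldsymbol{\epsilon}^T (\mathbf{I} - \mathbf{H}) \boldsymbol{\epsilon} $ and, under $ H_0 $, $ SSR = \boldsymbol{\epsilon}^T (\mathbf{H} - \frac{1}{n}\mathbf{J}) \boldsymbol{\epsilon} $, dividing by $ \sigma^2 $ rewrites each as a quadratic form in $ \boldsymbol{\epsilon} / \sigma $. For example, $ SSE/\sigma^2 = (\boldsymbol{\epsilon}/\sigma)^T (\mathbf{I} - \mathbf{H}) (\boldsymbol{\epsilon}/\sigma) \sim \chi^2(n-p) $, and likewise $ SSR/\sigma^2 \sim \chi^2(p-1) $ under $ H_0 $, where $ p $ counts the columns of $ \mathbf{X} $ including the intercept.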
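The matrix properties this answer relies on can be checked numerically. The sketch below assumes NumPy, a randomly generated full-rank design matrix, and arbitrary illustrative values for `n`, `p`, `sigma`, and the intercept; none of these come from the original derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 4          # p counts the intercept column (assumed convention)
sigma = 2.0           # illustrative error standard deviation

# Design matrix: intercept column plus p-1 random predictors (made-up data)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix H = X (X'X)^{-1} X'
J_over_n = np.full((n, n), 1.0 / n)       # (1/n) J, projection onto the ones vector

A_sse = np.eye(n) - H                     # matrix of the SSE quadratic form
A_ssr = H - J_over_n                      # matrix of the SSR quadratic form


def is_sym_idempotent(A):
    """True if A is symmetric and A @ A equals A up to rounding."""
    return np.allclose(A, A.T) and np.allclose(A @ A, A)


# For an idempotent matrix the rank equals the trace
rank_sse = int(round(np.trace(A_sse)))    # expected n - p
rank_ssr = int(round(np.trace(A_ssr)))    # expected p - 1

# Under H0 the response is a constant plus noise: y = beta0 * 1 + eps
eps = sigma * rng.normal(size=n)
y = 5.0 + eps
sse_y, sse_eps = y @ A_sse @ y, eps @ A_sse @ eps
ssr_y, ssr_eps = y @ A_ssr @ y, eps @ A_ssr @ eps
```

Both matrices annihilate the ones vector, which is why the quadratic forms in `y` and in `eps` agree under $ H_0 $.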

Q: How does Cochran's Theorem apply here?

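Cochran's Theorem states that if $ \mathbf{z} \sim N(\mathbf{0}, \mathbf{I}) $ and $ \sum_{i=1}^k \mathbf{A}_i = \mathbf{I} $ where each $ \mathbf{A}_i $ is a symmetric idempotent matrix of rank $ r_i $, then $ \mathbf{z}^T \mathbf{A}_i \mathbf{z} \sim \chi^2(r_i) $ and these chi-squared variables are mutually independent. In our context, $ \mathbf{y} - \mathbf{X}\boldsymbol{\beta} = \boldsymbol{\epsilon} $, and under $ H_0 $ this reduces to $ \mathbf{y} = \beta_0 \mathbf{1} + \boldsymbol{\epsilon} $. The identity $ \mathbf{I} = \frac{1}{n}\mathbf{J} + (\mathbf{H} - \frac{1}{n}\mathbf{J}) + (\mathbf{I} - \mathbf{H}) $ splits $ \mathbf{I} $ into three orthogonal projection matrices of ranks $ 1 $, $ p-1 $, and $ n-p $. The last two, $ \mathbf{P}_{SSR} = \mathbf{H} - \frac{1}{n}\mathbf{J} $ and $ \mathbf{P}_{SSE} = \mathbf{I} - \mathbf{H} $, generate $ SSR $ and $ SSE $ and give the decomposition $ SST = SSR + SSE $. Both annihilate $ \mathbf{1} $, so under $ H_0 $ the quadratic forms in $ \mathbf{y} $ equal the same forms in $ \boldsymbol{\epsilon} $. Applying Cochran's Theorem to $ \mathbf{z} = \boldsymbol{\epsilon}/\sigma $ then yields $ SSR/\sigma^2 \sim \chi^2(p-1) $ and $ SSE/\sigma^2 \sim \chi^2(n-p) $, independent of each other.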
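A small Monte Carlo experiment can make the theorem's conclusions concrete. The following is a sketch under assumed settings (NumPy, a random design matrix, arbitrary `n`, `p`, `sigma`, and replication count). It checks that the scaled sums of squares have means close to their degrees of freedom and are nearly uncorrelated, which is consistent with, though weaker than, independence.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, sigma = 30, 4, 1.5     # illustrative sizes; p includes the intercept
reps = 20_000                # number of simulated error vectors

X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.solve(X.T @ X, X.T)
J_over_n = np.full((n, n), 1.0 / n)
P_ssr = H - J_over_n         # rank p - 1
P_sse = np.eye(n) - H        # rank n - p

# The three projections sum to the identity, as Cochran's Theorem requires
sums_to_identity = np.allclose(J_over_n + P_ssr + P_sse, np.eye(n))

# Each row of E is one draw of eps ~ N(0, sigma^2 I)
E = sigma * rng.normal(size=(reps, n))


def scaled_quadratic_form(E, P, sigma):
    """Row-wise eps' P eps / sigma^2 for every simulated eps in E."""
    return ((E @ P) * E).sum(axis=1) / sigma**2


ssr_scaled = scaled_quadratic_form(E, P_ssr, sigma)
sse_scaled = scaled_quadratic_form(E, P_sse, sigma)

mean_ssr = ssr_scaled.mean()                          # should be near p - 1
mean_sse = sse_scaled.mean()                          # should be near n - p
corr = np.corrcoef(ssr_scaled, sse_scaled)[0, 1]      # should be near 0
```

The mean of a $ \chi^2(r) $ variable is $ r $, so comparing `mean_ssr` with `p - 1` and `mean_sse` with `n - p` is a quick sanity check rather than a proof.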

Q: What is the role of the centering matrix $ \mathbf{P}_{\mathbf{1}} = \mathbf{I} - \mathbf{1}\mathbf{1}^T/n $, and of $ \frac{1}{n}\mathbf{J} = \mathbf{1}\mathbf{1}^T/n $, in $ SSR $?

The centering matrix accounts for the intercept: it subtracts the sample mean from every observation. $ SST = \mathbf{y}^T \mathbf{P}_{\mathbf{1}} \mathbf{y} $ measures total variability around the mean $ \bar{y} $. $ SSR = \mathbf{y}^T (\mathbf{H} - \frac{1}{n}\mathbf{J}) \mathbf{y} $ measures the variability explained by the predictors *beyond* what is explained by the mean alone. If the model had only an intercept, $ \mathbf{H} $ would reduce to $ \frac{1}{n}\mathbf{J} $, making $ SSR = 0 $. Subtracting $ \frac{1}{n}\mathbf{J} $ therefore ensures that the degrees of freedom for $ SSR $ reflect the number of *additional* parameters introduced by the predictors, namely $ p-1 $.
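The intercept-only case is easy to verify directly. This sketch assumes NumPy and a made-up response vector. It builds the hat matrix for a design consisting of a single column of ones and compares it with $ \frac{1}{n}\mathbf{J} $.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
y = rng.normal(size=n)                    # illustrative response values

ones = np.ones((n, 1))
J_over_n = ones @ ones.T / n              # (1/n) J with J = 1 1^T
C = np.eye(n) - J_over_n                  # centering matrix I - (1/n) J

# Intercept-only model: the design matrix is the single column of ones
H0 = ones @ np.linalg.solve(ones.T @ ones, ones.T)

sst = y @ C @ y                           # total variability around the mean
ssr_intercept_only = y @ (H0 - J_over_n) @ y
```

Here `H0` coincides with `J_over_n`, so `ssr_intercept_only` is zero up to rounding, and `sst` matches the familiar sum of squared deviations from the mean.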

Q: What happens to the F-statistic if the model does not include an intercept?

If the model does not include an intercept, the sums of squares are no longer centered at the mean. $ SST $ is typically taken to be $ \mathbf{y}^T \mathbf{y} $ (the total uncorrected sum of squares), with $ SSR = \mathbf{y}^T \mathbf{H} \mathbf{y} $ and $ SSE = \mathbf{y}^T (\mathbf{I} - \mathbf{H}) \mathbf{y} $, so that $ SST = SSR + SSE $ still holds. The degrees of freedom also change: $ df_{regression} = p $ (the number of predictors) and $ df_{residual} = n-p $. The statistic $ F = \frac{SSR/p}{SSE/(n-p)} $ then tests $ H_0: \beta_1 = \dots = \beta_p = 0 $ (all coefficients are zero) and follows an $ F(p, n-p) $ distribution under $ H_0 $.
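The no-intercept computation can be sketched as follows. The data, the coefficient vector `beta`, and the sizes `n` and `p` are hypothetical and chosen only for illustration; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 40, 3                               # p predictors, no intercept column
X = rng.normal(size=(n, p))                # made-up design matrix
beta = np.array([1.0, -0.5, 0.25])         # hypothetical true coefficients
y = X @ beta + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix without a ones column

sst_uncorrected = y @ y                    # y'y, not centered at the mean
ssr = y @ H @ y                            # y'Hy
sse = y @ (np.eye(n) - H) @ y              # y'(I - H)y

df_regression, df_residual = p, n - p
F = (ssr / df_regression) / (sse / df_residual)
```

The decomposition `sst_uncorrected = ssr + sse` holds because $ \mathbf{H} $ and $ \mathbf{I} - \mathbf{H} $ are complementary projections. No centering matrix appears, since there is no intercept to account for.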

Derivation of the F-statistic for Overall Model Significance and its Distribution
