The t-statistic for Individual Regression Coefficients: Derivation and its Distribution

The Formal Theorem

Consider the linear model $Y = X\beta + \epsilon$, where $Y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$ is a full-rank matrix, and $\epsilon \sim N(0, \sigma^2 I_n)$. The ordinary least squares estimator $\hat{\beta}$ satisfies $\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1})$. Let $\hat{\sigma}^2 = \frac{e^T e}{n-p}$ be the unbiased estimator of $\sigma^2$, where $e = Y - X\hat{\beta}$. For any component $\hat{\beta}_j$, the statistic

$$ t = \frac{\hat{\beta}_j - \beta_j}{\sqrt{\hat{\sigma}^2 [(X^T X)^{-1}]_{jj}}} \sim t_{n-p} $$

follows a Student's t-distribution with $n-p$ degrees of freedom.
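The result can be traced to three standard facts about the Gaussian linear model, sketched below. The shorthand $c_{jj}$, $Z$ and $V$ is introduced here for convenience and is not part of the theorem statement.

```latex
\begin{aligned}
Z &= \frac{\hat{\beta}_j - \beta_j}{\sigma\sqrt{c_{jj}}} \sim N(0,1),
  \qquad c_{jj} = [(X^T X)^{-1}]_{jj}, \\
V &= \frac{(n-p)\,\hat{\sigma}^2}{\sigma^2} = \frac{e^T e}{\sigma^2} \sim \chi^2_{n-p}, \\
t &= \frac{Z}{\sqrt{V/(n-p)}}
   = \frac{\hat{\beta}_j - \beta_j}{\sqrt{\hat{\sigma}^2\, c_{jj}}} \sim t_{n-p}.
\end{aligned}
```

The last line relies on $Z$ and $V$ being independent. This holds because $\hat{\beta}$ and $e$ are both linear functions of the Gaussian vector $\epsilon$ and $\text{Cov}(\hat{\beta}, e) = 0$. The unknown $\sigma$ cancels in the ratio, which is why the statistic can be computed from the data alone.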

Analytical Intuition.

In the vast multidimensional space of our data, $\hat{\beta}$ is our best estimate of the truth $\beta$, but it is inherently noisy. Imagine peering through a lens that vibrates due to the underlying variance $\sigma^2$. To judge whether a specific variable $X_j$ actually influences the outcome $Y$, we must quantify how far our estimate $\hat{\beta}_j$ deviates from the value claimed by a null hypothesis (typically $\beta_j = 0$) relative to the 'noise' we perceive. The numerator $\hat{\beta}_j - \beta_j$ captures the signal deviation, while the denominator acts as a scaling factor, normalizing this deviation by the estimated uncertainty. By dividing a Gaussian variable by the square root of an independent scaled $\chi^2$ variable, we transition from the rigid world of the Normal distribution to the fatter-tailed Student's t-distribution. This reflects the reality that our estimate of the noise, $\hat{\sigma}^2$, is itself uncertain, requiring us to be more conservative in our claims of statistical significance.
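The numerator and denominator described above can be computed directly. The following is a minimal sketch using NumPy on simulated data; the sample size, the "true" coefficients, the noise level and all variable names are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only

# Hypothetical simulated data: n observations, p columns (intercept + 2 predictors).
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, 0.0])  # assumed "true" coefficients
sigma_true = 1.5                       # assumed noise standard deviation
Y = X @ beta_true + rng.normal(scale=sigma_true, size=n)

# OLS estimate via the explicit inverse (acceptable for a small, well-conditioned example).
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y

# Residuals and the unbiased variance estimate e^T e / (n - p).
e = Y - X @ beta_hat
sigma2_hat = (e @ e) / (n - p)

# Standard errors and t-statistics for the null hypothesis beta_j = 0.
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
t_stats = beta_hat / se

print("beta_hat:", beta_hat)
print("se:      ", se)
print("t_stats: ", t_stats)
```

Each entry of `t_stats` would then be compared against a $t_{n-p}$ reference distribution, here with $n - p = 47$ degrees of freedom.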

CAUTION

Institutional Warning.

Students frequently confuse the standard error of the coefficient, $\text{SE}(\hat{\beta}_j) = \hat{\sigma}\sqrt{[(X^TX)^{-1}]_{jj}}$, with the residual standard error $\hat{\sigma}$. The former is specific to coefficient $j$ and depends on the layout of the design matrix $X$, while the latter is a single number representing the global noise level of the model.
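One way to see the difference is to change the measurement units of a single predictor. The sketch below, on assumed simulated data with a hypothetical helper `fit_ols`, multiplies one column of $X$ by 10: the residual standard error should stay the same, while the standard error of that one coefficient shrinks by a factor of 10.

```python
import numpy as np

def fit_ols(X, Y):
    """Return (beta_hat, sigma_hat, se) for OLS on design X and response Y."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    e = Y - X @ beta_hat
    sigma_hat = np.sqrt((e @ e) / (n - p))      # residual standard error: one scalar
    se = sigma_hat * np.sqrt(np.diag(XtX_inv))  # one standard error per coefficient
    return beta_hat, sigma_hat, se

rng = np.random.default_rng(1)  # fixed seed, illustrative data only
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

# Multiply the column of predictor 1 by 10 (a change of measurement units).
X_scaled = X.copy()
X_scaled[:, 1] *= 10.0

_, sigma_hat_a, se_a = fit_ols(X, Y)
_, sigma_hat_b, se_b = fit_ols(X_scaled, Y)

print("sigma_hat:", sigma_hat_a, sigma_hat_b)  # unchanged by the rescaling
print("SE before:", se_a)
print("SE after: ", se_b)                      # only entry 1 changes
```

The fitted values, and hence the residuals, do not change under the rescaling, which is why $\hat{\sigma}$ is unaffected while $\text{SE}(\hat{\beta}_1)$ is not.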

Academic Inquiries.

Why is the t-distribution used instead of the Normal distribution?

Because $\sigma^2$ is unknown, we must estimate it using $\hat{\sigma}^2$. Replacing $\sigma$ with the random quantity $\hat{\sigma}$ in the denominator introduces extra uncertainty, requiring the heavier tails of the t-distribution.

What happens as $n-p \to \infty$?

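By the Law of Large Numbers, $\hat{\sigma}^2 \to \sigma^2$ in probability, since $\hat{\sigma}^2/\sigma^2$ is the average of $n-p$ independent $\chi^2_1$ variables. The t-distribution therefore converges to the Standard Normal distribution $N(0,1)$.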
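The convergence can be checked numerically. The sketch below uses only the Python standard library: it evaluates the Student's t density through `math.lgamma` and integrates it with Simpson's rule to obtain $P(|T| > 1.96)$ for a few degrees of freedom. The cutoff 1.96, the chosen degrees of freedom and the function names are illustrative assumptions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_tail(c, df, steps=2000):
    """P(|T| > c) using Simpson's rule on [0, c]; steps must be even."""
    h = c / steps
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # P(-c < T < c), by symmetry
    return 1 - central

normal_tail = math.erfc(1.96 / math.sqrt(2))  # P(|Z| > 1.96) for Z ~ N(0, 1)
tails = {df: two_sided_tail(1.96, df) for df in (3, 30, 1000)}

for df, tail in tails.items():
    print(f"df={df:5d}  P(|T| > 1.96) = {tail:.4f}")
print(f"normal    P(|Z| > 1.96) = {normal_tail:.4f}")
```

The tail probability should fall toward the Normal value of roughly 0.05 as the degrees of freedom grow, which is the practical meaning of the convergence.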

Does the t-test for $\beta_j$ depend on other coefficients?

Yes, through the matrix $(X^TX)^{-1}$. Multicollinearity increases the diagonal elements $[(X^TX)^{-1}]_{jj}$, thereby inflating the standard error and shrinking the magnitude of the t-statistic.
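The effect of multicollinearity on the diagonal of $(X^TX)^{-1}$ can be illustrated with a small constructed design. In the sketch below, two centered, unit-norm predictors are built so that their sample correlation is exactly `rho`; under that assumed construction the diagonal entry for a slope equals $1/(1-\rho^2)$, the quantity usually called the variance inflation factor. The function name and the chosen values of `rho` are hypothetical.

```python
import numpy as np

def slope_diag(rho, n=100, seed=0):
    """Return [(X^T X)^{-1}]_{11} for a design with an intercept and two
    predictors whose sample correlation is exactly rho."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x1 -= x1.mean()
    x1 /= np.linalg.norm(x1)        # centered, unit norm
    z = rng.normal(size=n)
    z -= z.mean()
    z -= (z @ x1) * x1              # remove the component along x1
    z /= np.linalg.norm(z)          # centered, unit norm, orthogonal to x1
    x2 = rho * x1 + np.sqrt(1 - rho**2) * z
    X = np.column_stack([np.ones(n), x1, x2])
    return np.linalg.inv(X.T @ X)[1, 1]

diag_by_rho = {rho: slope_diag(rho) for rho in (0.0, 0.9, 0.99)}

for rho, d in diag_by_rho.items():
    print(f"rho={rho:4.2f}  diagonal entry = {d:8.3f}  1/(1-rho^2) = {1 / (1 - rho**2):8.3f}")
```

Since $\text{SE}(\hat{\beta}_j)$ is proportional to the square root of this diagonal entry, a correlation of 0.99 between the two predictors would inflate the standard error by a factor of about 7 relative to uncorrelated predictors, with $\hat{\sigma}$ held fixed.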

Standardized References.

Definitive Institutional Source: Rencher, A. C., & Schaalje, G. B., Linear Models in Statistics.


Institutional Citation

Reference this proof in your academic research or publications.

NICEFA Visual Mathematics. (2026). The t-statistic for Individual Regression Coefficients: Derivation and its Distribution: Visual Proof & Intuition. Retrieved from https://www.nicefa.org/library/general-linear-models-/the-t-statistic-for-individual-regression-coefficients--derivation-and-its-distribution
