The Matrix Formulation of the General Linear Model: Y = Xβ + ϵ and its Fundamental Assumptions

The Formal Theorem

The General Linear Model (GLM) posits a linear relationship between a dependent variable \( \mathbf{Y} \) and a set of independent variables \( \mathbf{X} \) through a vector of unknown coefficients \( \boldsymbol{\beta} \), corrupted by an additive error term \( \boldsymbol{\epsilon} \). Formally, for \( n \) observations and \( p \) regressors (including an intercept):
\[
\begin{aligned}
\mathbf{Y} &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \\
\text{where } \mathbf{Y} &\in \mathbb{R}^{n \times 1} \text{ is the vector of responses,} \\
\mathbf{X} &\in \mathbb{R}^{n \times p} \text{ is the design matrix,} \\
\boldsymbol{\beta} &\in \mathbb{R}^{p \times 1} \text{ is the vector of unknown coefficients,} \\
\boldsymbol{\epsilon} &\in \mathbb{R}^{n \times 1} \text{ is the vector of random errors.}
\end{aligned}
\]
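
Written out entry by entry, the compact matrix equation stacks one scalar equation per observation. The display below is a sketch under an assumed indexing convention that is not fixed by the text above: the first column of \( \mathbf{X} \) is the intercept column of ones, and the coefficients are labelled \( \beta_1, \dots, \beta_p \).
\[
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_{12} & \cdots & x_{1p} \\
1 & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n2} & \cdots & x_{np}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}
+
\begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}
\]
Row \( i \) of this system reads \( Y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i \).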
The fundamental assumptions governing the GLM for valid inference are:

1. Linearity in Parameters: The model is linear in \( \boldsymbol{\beta} \): \( E[\mathbf{Y} \mid \mathbf{X}] = \mathbf{X}\boldsymbol{\beta} \).
2. Full Rank Design Matrix: The design matrix \( \mathbf{X} \) has full column rank, i.e., \( \text{rank}(\mathbf{X}) = p \). This ensures that \( \mathbf{X}^T\mathbf{X} \) is invertible.
3. Exogeneity of Errors (Zero Conditional Mean): \( E[\boldsymbol{\epsilon} \mid \mathbf{X}] = \mathbf{0}_n \). This implies that the errors have zero mean and are uncorrelated with the regressors.
4. Homoscedasticity and No Autocorrelation (Spherical Errors): \( \text{Var}(\boldsymbol{\epsilon} \mid \mathbf{X}) = E[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T \mid \mathbf{X}] = \sigma^2 \mathbf{I}_n \), where \( \sigma^2 \) is a finite positive scalar.
5. (Optional) Normality of Errors: \( \boldsymbol{\epsilon} \mid \mathbf{X} \sim N(\mathbf{0}_n, \sigma^2 \mathbf{I}_n) \). This assumption is often added for exact finite-sample inference, particularly hypothesis testing and confidence interval construction.
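
To see where each assumption is used, it can help to trace the OLS estimator \( \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y} \) through them. The derivation below is a standard sketch rather than a full proof. The full rank assumption makes the inverse exist, exogeneity gives unbiasedness, and spherical errors give the covariance matrix.
\[
\begin{aligned}
\hat{\boldsymbol{\beta}}
&= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon})
= \boldsymbol{\beta} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\epsilon}, \\
E[\hat{\boldsymbol{\beta}} \mid \mathbf{X}]
&= \boldsymbol{\beta} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T E[\boldsymbol{\epsilon} \mid \mathbf{X}]
= \boldsymbol{\beta}, \\
\text{Var}(\hat{\boldsymbol{\beta}} \mid \mathbf{X})
&= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \, \text{Var}(\boldsymbol{\epsilon} \mid \mathbf{X}) \, \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}
= \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}.
\end{aligned}
\]
Normality is not needed for any of these three lines. It only enters when the exact sampling distribution of \( \hat{\boldsymbol{\beta}} \) is required.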

Analytical Intuition.

Imagine the universe of your data as a vast, complex digital simulation, much like the Matrix. Your goal is to uncover the hidden code, the underlying rules that govern how everything interacts. \( \mathbf{Y} \) is the observed reality: the patterns, phenomena, and outcomes you witness. \( \mathbf{X} \) represents the input parameters, the variables you believe influence reality, like agents' actions or environmental conditions. The elusive \( \boldsymbol{\beta} \) is the secret programming, the set of fundamental coefficients that dictate the true relationships between inputs and outputs. And \( \boldsymbol{\epsilon} \)? That’s the 'glitch in the Matrix,' the irreducible randomness, measurement errors, or unobserved factors that prevent our perfect understanding. The General Linear Model is our Oracle, attempting to decipher \( \boldsymbol{\beta} \) from the observed \( \mathbf{Y} \) and \( \mathbf{X} \), accepting that \( \boldsymbol{\epsilon} \) ensures no model is ever truly perfect. We're trying to find the core truth amidst the noise, assuming the universe operates on certain predictable, linear principles, and that the 'glitches' behave in a well-defined, albeit random, manner.

Institutional Warning.

Students often confuse "linear in variables" with "linear in parameters," mistakenly believing the GLM cannot include polynomial or transformed independent variables. Another common error is underestimating the impact of endogeneity, which leads to fundamentally biased and inconsistent coefficient estimates.

Academic Inquiries.

01

What are the practical consequences if the design matrix \( \mathbf{X} \) is not full rank?

If \( \mathbf{X} \) is not full rank, at least one column is an exact linear combination of the others, i.e., there is perfect multicollinearity among the independent variables. This means \( \mathbf{X}^T\mathbf{X} \) is singular and therefore not invertible, making it impossible to compute the unique OLS estimator \( \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y} \). The model parameters become unidentified: infinitely many values of \( \boldsymbol{\beta} \) produce exactly the same fitted values. In practice, statistical software will typically issue a warning, drop one of the redundant columns, or fail to produce estimates.
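
The loss of identification can be checked by hand on a tiny example. The sketch below uses only the Python standard library and made-up numbers: a design matrix whose third column is exactly twice its second. The helper names (`matvec`, `gram`, `det3`) are illustrative and not taken from any library.

```python
# Illustrative design matrix: intercept, a regressor x, and a redundant
# third column equal to 2 * x (perfect multicollinearity). Values are made up.
x = [1.0, 2.0, 3.0, 4.0]
X = [[1.0, xi, 2.0 * xi] for xi in x]


def matvec(A, v):
    """Return the matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def gram(A):
    """Return the Gram matrix A^T A."""
    cols = list(zip(*A))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


XtX = gram(X)
det_XtX = det3(XtX)  # zero, because column 3 is a multiple of column 2

# Two different coefficient vectors that produce identical fitted values,
# so the data cannot distinguish between them.
beta_a = [1.0, 2.0, 0.0]
beta_b = [1.0, 0.0, 1.0]
fitted_a = matvec(X, beta_a)
fitted_b = matvec(X, beta_b)

print("det(X^T X) =", det_XtX)
print("identical fitted values:", fitted_a == fitted_b)
```

Because both coefficient vectors reproduce the same \( \mathbf{X}\boldsymbol{\beta} \), no amount of data on \( \mathbf{Y} \) can favour one over the other, which is what "unidentified" means here.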

02

How does violating the exogeneity assumption \( E[\boldsymbol{\epsilon} \mid \mathbf{X}] = \mathbf{0} \) affect the OLS estimator?

Violating \( E[\boldsymbol{\epsilon} \mid \mathbf{X}] = \mathbf{0} \), often referred to as endogeneity, is a severe problem. It means that the error term is systematically related to one or more predictors. This leads to biased and inconsistent OLS estimators \( \hat{\boldsymbol{\beta}} \). Unlike heteroscedasticity or autocorrelation, which primarily affect efficiency and standard errors, endogeneity invalidates the estimates themselves: they do not converge to the true population parameters even as the sample size grows without bound.
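
A small simulation makes the inconsistency visible. The sketch below is an illustration under invented assumptions, not a general result: a single regressor, a true slope of 2, and an unobserved factor `z` that drives both the regressor and (in the endogenous case) the error. With these made-up numbers the OLS slope should settle near \( 2 + \text{Cov}(x, \epsilon)/\text{Var}(x) = 2 + 3/2 = 3.5 \) rather than 2.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_slope(n, endogenous, seed):
    """Simulate y = 1 + 2 x + error and return the OLS slope estimate."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)        # unobserved factor (hypothetical)
        x = z + rng.gauss(0.0, 1.0)    # the regressor depends on z
        noise = rng.gauss(0.0, 1.0)
        # Endogenous case: z also enters the error, so E[error | x] != 0.
        error = 3.0 * z + noise if endogenous else noise
        xs.append(x)
        ys.append(1.0 + 2.0 * x + error)
    return ols_slope(xs, ys)


slope_exogenous = simulate_slope(20000, endogenous=False, seed=0)
slope_endogenous = simulate_slope(20000, endogenous=True, seed=0)

print("exogenous errors, slope estimate: ", round(slope_exogenous, 3))
print("endogenous errors, slope estimate:", round(slope_endogenous, 3))
```

Increasing `n` tightens both estimates around their limits, but the endogenous one tightens around the wrong value, which is the practical meaning of inconsistency.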

03

Does the normality assumption for \( \boldsymbol{\epsilon} \) impact the unbiasedness or consistency of the OLS estimator?

No, the normality assumption is not required for OLS estimators to be unbiased (under \( E[\boldsymbol{\epsilon} \mid \mathbf{X}] = \mathbf{0} \)) or consistent (given standard regularity conditions on the regressors). The Gauss-Markov theorem states that OLS is BLUE (Best Linear Unbiased Estimator) under the first four assumptions, without requiring normality. Normality becomes crucial for deriving exact finite-sample distributions for hypothesis tests (t-tests, F-tests) and confidence intervals. Without it, these procedures are only approximately valid in large samples, justified asymptotically by the Central Limit Theorem.
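
Unbiasedness without normality can be checked with a short Monte Carlo experiment. The sketch below is illustrative only: it assumes a fixed single-regressor design, a true slope of 2, and skewed errors drawn as a centered exponential (mean zero, variance one), then averages the OLS slope over many simulated samples.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(1)
true_intercept, true_slope = 1.0, 2.0
xs = [i / 10 for i in range(30)]  # fixed design, reused in every replication

slopes = []
for _ in range(2000):
    # Centered exponential errors: mean zero but clearly non-normal (skewed).
    errors = [rng.expovariate(1.0) - 1.0 for _ in xs]
    ys = [true_intercept + true_slope * x + e for x, e in zip(xs, errors)]
    slopes.append(ols_slope(xs, ys))

mean_slope = sum(slopes) / len(slopes)
print("average slope estimate over replications:", round(mean_slope, 3))
```

The average sits close to the true slope even though the errors are far from Gaussian. What the simulation does not show is the exact t-distribution of the test statistic, which is the part that relies on normality in small samples.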

04

Is it permissible to include non-linear transformations of independent variables (e.g., \( X^2 \), \( \log(X) \)) in a General Linear Model?

Absolutely. The term 'linear' in GLM refers to the model's linearity in its parameters \( \boldsymbol{\beta} \), not necessarily in the independent variables themselves. You can transform your independent variables (e.g., \( x_2 = x_1^2 \) or \( x_3 = \log(x_1) \)) and include them in the design matrix \( \mathbf{X} \). As long as the coefficients \( \boldsymbol{\beta} \) multiply these (possibly transformed) variables linearly, it remains a GLM. For example, \( Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon \) is a GLM.
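
The quadratic example can be fitted with exactly the same matrix machinery as a straight line, because the squared term is just another column of the design matrix. The sketch below uses only the Python standard library and noise-free, made-up data generated from \( y = 1 + 2x - 0.5x^2 \), so the estimate should recover those coefficients up to rounding. It solves the normal equations \( \mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{Y} \) directly for transparency. Numerical libraries usually prefer a QR or similar decomposition for stability.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        if abs(M[pivot][k]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


# Made-up, noise-free data from y = 1 + 2 x - 0.5 x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x - 0.5 * x ** 2 for x in xs]

# Design matrix with columns 1, x, x^2: non-linear in x, linear in beta.
X = [[1.0, x, x ** 2] for x in xs]
Xt = [list(col) for col in zip(*X)]

XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in Xt] for ci in Xt]
Xty = [sum(a * b for a, b in zip(ci, ys)) for ci in Xt]

beta_hat = solve(XtX, Xty)
print("estimated coefficients:", [round(b, 6) for b in beta_hat])
```

Replacing the third column with \( \log(x) \) (for positive \( x \)) would work the same way, since only the entries of \( \mathbf{X} \) change and the estimator's form does not.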

Institutional Citation

Reference this proof in your academic research or publications.

NICEFA Visual Mathematics. (2026). The Matrix Formulation of the General Linear Model: Y = Xβ + ϵ and its Fundamental Assumptions: Visual Proof & Intuition. Retrieved from https://www.nicefa.org/library/general-linear-models-/the-matrix-formulation-of-the-general-linear-model--y---x------and-its-fundamental-assumptions
