Question 1

What are the practical consequences if the design matrix $ \mathbf{X} $ is not full rank?

Accepted Answer

If $ \mathbf{X} $ does not have full column rank, at least one column is an exact linear combination of the others, i.e. there is perfect multicollinearity among the independent variables. Then $ \mathbf{X}^T\mathbf{X} $ is singular and not invertible, so the OLS estimator $ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y} $ cannot be computed. The model parameters are not identified: infinitely many values of $ \boldsymbol{\beta} $ minimize the sum of squared residuals, although they all produce the same fitted values. In practice, statistical software will usually raise an error or warning, drop a redundant column, or return one arbitrary solution.
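The non-identification described above can be illustrated with a small NumPy sketch. The data, seed, and variable names are assumptions made up for the illustration; `np.linalg.lstsq` is used only because it returns one particular (minimum-norm) solution when the matrix is rank deficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = 2.0 * x1  # exact multiple of x1, so the columns are perfectly collinear
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 3.0 * x1 + rng.normal(size=n)

# X has 3 columns but only 2 linearly independent ones
rank = np.linalg.matrix_rank(X)

# lstsq still returns an answer: the minimum-norm least-squares solution
beta_min_norm, *_ = np.linalg.lstsq(X, y, rcond=None)

# (0, 2, -1) lies in the null space of X because 2*x1 - x2 = 0,
# so adding any multiple of it gives another least-squares solution
beta_other = beta_min_norm + 5.0 * np.array([0.0, 2.0, -1.0])

# Different coefficient vectors, identical fitted values
same_fit = np.allclose(X @ beta_min_norm, X @ beta_other)

print("rank of X:", rank)
print("solution A:", beta_min_norm)
print("solution B:", beta_other)
print("same fitted values:", same_fit)
```

The two coefficient vectors differ, yet they fit the data equally well, which is why no unique estimate can be reported.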

Question 2

How does violating the exogeneity assumption $ E[\boldsymbol{\epsilon} | \mathbf{X}] = \mathbf{0} $ affect the OLS estimator?

Accepted Answer

Violating $ E[\boldsymbol{\epsilon} | \mathbf{X}] = \mathbf{0} $, often referred to as endogeneity, is a severe problem. It means that the error term is systematically related to one or more predictors, as happens with omitted variables, measurement error in a predictor, or simultaneity. The OLS estimator $ \hat{\boldsymbol{\beta}} $ is then biased and, in general, inconsistent. Unlike heteroscedasticity or autocorrelation, which leave OLS unbiased and affect only its efficiency and the validity of the usual standard errors, endogeneity invalidates the estimates themselves: they do not converge to the true population parameters no matter how large the sample.
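A quick simulation can make the inconsistency concrete. This is a minimal sketch under assumed values: the true model $ y = 1 + 2x + \epsilon $, the shared component `u`, and the helper `ols_slope` are all hypothetical choices for the illustration. With `rho = 1`, the covariance between `x` and the error is 1 and the variance of `x` is 2, so the slope estimate should settle near $ 2 + 1/2 = 2.5 $ rather than 2.

```python
import numpy as np

def ols_slope(n, rho, rng):
    """Simulate y = 1 + 2x + eps and return the OLS slope estimate.

    A shared component u enters both x (scaled by rho) and eps,
    so x and eps are correlated whenever rho != 0.
    """
    u = rng.normal(size=n)
    x = rng.normal(size=n) + rho * u
    eps = rng.normal(size=n) + u
    y = 1.0 + 2.0 * x + eps
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(42)
n = 200_000  # large sample: remaining error is bias, not noise

slope_exog = ols_slope(n, rho=0.0, rng=rng)   # exogeneity holds
slope_endog = ols_slope(n, rho=1.0, rng=rng)  # exogeneity violated

print("true slope: 2.0")
print("estimate with exogenous x: ", round(slope_exog, 3))
print("estimate with endogenous x:", round(slope_endog, 3))
```

Increasing `n` further tightens both estimates around their limits, but the endogenous one stays away from the true slope.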

Question 3

Does the normality assumption for $ \boldsymbol{\epsilon} $ impact the unbiasedness or consistency of the OLS estimator?

Accepted Answer

No, the normality assumption is not required for OLS estimators to be unbiased (under $ E[\boldsymbol{\epsilon} | \mathbf{X}] = \mathbf{0} $) or consistent. The Gauss-Markov theorem states that OLS is BLUE (Best Linear Unbiased Estimator) under the other classical assumptions (linearity in parameters, full column rank, exogeneity, and homoscedastic, uncorrelated errors), without requiring normality. Normality matters for deriving exact finite-sample distributions for hypothesis tests (t-tests, F-tests) and confidence intervals. Without it, these are only asymptotically valid, via the Central Limit Theorem, for large sample sizes.
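As a rough check of the unbiasedness claim, the following Monte Carlo sketch draws strongly skewed, mean-zero errors (a centered exponential distribution) and averages the OLS estimates over many replications. The sample size, number of replications, true coefficients, and design are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 30, 20_000
beta_true = np.array([1.0, 2.0])

# Fixed design: intercept plus one regressor
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])

# Skewed, non-normal errors with mean zero: Exponential(1) minus 1
eps = rng.exponential(scale=1.0, size=(n, reps)) - 1.0

# Each column of Y is one simulated sample of size n
Y = (X @ beta_true)[:, None] + eps

# Solve the normal equations for all replications at once
beta_hats = np.linalg.solve(X.T @ X, X.T @ Y)  # shape (2, reps)
mean_beta_hat = beta_hats.mean(axis=1)

print("true beta:          ", beta_true)
print("average of estimates:", mean_beta_hat)
```

The average of the estimates should sit close to the true coefficients even at $ n = 30 $, although the sampling distribution of each individual estimate is not exactly normal.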

Question 4

Is it permissible to include non-linear transformations of independent variables (e.g., $ X^2 $, $ \log(X) $) in a General Linear Model?

Accepted Answer

Absolutely. The term 'linear' in GLM refers to the model's linearity in its parameters $ \boldsymbol{\beta} $, not necessarily in the independent variables themselves. You can transform your independent variables (e.g., $ x_2 = x_1^2 $ or $ x_3 = \log(x_1) $, the latter requiring $ x_1 > 0 $) and include them as additional columns of the design matrix $ \mathbf{X} $. As long as the coefficients $ \boldsymbol{\beta} $ enter linearly, multiplying these (possibly transformed) variables, it remains a GLM. For example, $ Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon $ is a GLM.
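The quadratic example can be fitted by ordinary least squares simply by adding a squared column to the design matrix. The sketch below uses simulated data with assumed coefficients (1.0, 0.5, -0.3) and an assumed noise level; none of these values come from the text.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(0.5, 5.0, size=n)

# Data generated from a model that is quadratic in x but linear in beta
y = 1.0 + 0.5 * x - 0.3 * x**2 + rng.normal(scale=0.1, size=n)

# Design matrix with columns 1, x, x^2: the transformation is just a new column
X = np.column_stack([np.ones(n), x, x**2])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated (beta0, beta1, beta2):", beta_hat)
```

The same pattern applies to other transformations, for instance replacing `x**2` with `np.log(x)` when all values of `x` are positive.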

The Matrix Formulation of the General Linear Model: Y = Xβ + ϵ and its Fundamental Assumptions
