Method of Least Squares: Minimizing Deviations

Exploring the cinematic intuition behind the Method of Least Squares: Minimizing Deviations.

The Formal Theorem

Let $Y = X\beta + \epsilon$ be a linear model, where $Y$ is an $n \times 1$ vector of observations, $X$ is an $n \times p$ matrix of known constants with full column rank $p$, $\beta$ is a $p \times 1$ vector of unknown parameters, and $\epsilon$ is an $n \times 1$ vector of random errors. The Ordinary Least Squares (OLS) estimator $\hat{\beta}$ is the vector that minimizes the residual sum of squares $S(\beta) = (Y - X\beta)^T (Y - X\beta)$. The unique solution is given by the normal equations:
$$\hat{\beta} = (X^T X)^{-1} X^T Y$$
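
For completeness, the minimizer can be derived in one calculus step. Expanding $S(\beta)$ and setting its gradient to zero gives

$$S(\beta) = Y^T Y - 2\beta^T X^T Y + \beta^T X^T X \beta, \qquad \nabla_\beta S(\beta) = -2X^T Y + 2X^T X \beta = 0 \;\Longrightarrow\; X^T X \hat{\beta} = X^T Y.$$

Since $X$ has full column rank, $X^T X$ is positive definite, hence invertible, and the stationary point is the unique global minimum.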

Analytical Intuition.

Imagine the response vector $Y$ as a beam of light in an $n$-dimensional space, seeking its reflection in the reality we can measure. The columns of the design matrix $X$ span a lower-dimensional subspace, a plane of possibility. Most often, the observed truth $Y$ does not lie on this plane; it is suspended in the void, displaced by the chaotic turbulence of stochastic noise $\epsilon$. The Method of Least Squares acts as a mathematical gravity, pulling $Y$ down to its closest point on the plane. That point of impact is the orthogonal projection, $\hat{Y}$. By minimizing the squared Euclidean distance (the sum of squared deviations), we ensure that the error vector, the gap between model and reality, is perpendicular to every column of $X$. This geometric purity guarantees that we have extracted every ounce of linear signal, leaving behind residuals that are uncorrelated with the predictors. It is the cinematic process of collapsing a complex, high-dimensional observation into its most efficient linear shadow.
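
To make the projection picture concrete, here is a minimal numerical sketch (NumPy, with synthetic data of our own choosing; none of the variable names below come from the text). It fits $\hat{\beta}$ via the normal equations and verifies that the residual vector is orthogonal to every column of $X$:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 100, 3

# Design matrix: an intercept column plus two random predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)   # observations pushed off the plane by noise

# Solve the normal equations: (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat      # orthogonal projection of y onto the column space of X
e = y - y_hat             # residual vector, perpendicular to that subspace

print(X.T @ e)            # ~ [0, 0, 0]: no linear signal left in the residuals
```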
CAUTION: Institutional Warning.

Students often confuse the unobservable population error $\epsilon$ with the observable sample residual $e$. While $\epsilon = Y - X\beta$ measures the true deviation from the population regression surface (the conditional mean $X\beta$), $e = Y - X\hat{\beta}$ is merely the deviation from the estimated projection, constrained by the geometry of the specific sample data.
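
One line of algebra makes the distinction concrete. Writing $H = X(X^T X)^{-1} X^T$ for the projection (hat) matrix, and using $HX = X$,

$$e = Y - \hat{Y} = (I - H)Y = (I - H)\epsilon,$$

so the residuals are a collapsed image of the true errors: they satisfy $X^T e = 0$ by construction and retain only $n - p$ degrees of freedom, while the components of $\epsilon$ are unconstrained.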

Academic Inquiries.

01

Why do we minimize squared deviations instead of absolute deviations?

Squaring yields a continuously differentiable objective function with a closed-form analytical solution. Under the Gauss-Markov assumptions, the resulting estimator is also the Best Linear Unbiased Estimator (BLUE): among all linear unbiased estimators, it has the smallest variance.
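
To make the contrast tangible, here is a minimal sketch (synthetic data of our own devising; SciPy's general-purpose optimizer stands in for a dedicated least-absolute-deviations solver). The squared-loss fit drops out of a single linear solve, while the absolute-loss fit has no closed form and must be found by iterative search:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Squared deviations: differentiable objective, closed-form normal equations
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Absolute deviations: non-differentiable at zero residuals, needs an iterative solver
lad = minimize(lambda b: np.sum(np.abs(y - X @ b)), x0=beta_ols, method="Nelder-Mead")

print("OLS:", beta_ols, "LAD:", lad.x)
```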

02

What happens if the matrix $X^T X$ is not invertible?

This occurs under perfect multicollinearity (rank deficiency). In that case the minimizer of $S(\beta)$ is not unique, and one must resort to a generalized inverse (Moore-Penrose) or to regularization techniques such as Ridge regression.
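
A hedged illustration of the rank-deficient case (hypothetical, perfectly collinear data): NumPy's Moore-Penrose pseudoinverse returns the minimum-norm least-squares solution, and a small ridge penalty restores invertibility.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x1 = rng.normal(size=n)
x2 = 2.0 * x1                              # exact linear dependence: rank(X) < p
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + x1 + rng.normal(scale=0.1, size=n)

# np.linalg.inv(X.T @ X) would fail here: X^T X is singular.

# Moore-Penrose pseudoinverse: picks the minimum-norm solution among the many minimizers
beta_pinv = np.linalg.pinv(X) @ y

# Ridge regression: X^T X + lam * I is invertible for any lam > 0
lam = 1e-2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```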

03

Is the OLS estimator biased if the errors are not normally distributed?

No. The OLS estimator $\hat{\beta}$ remains unbiased as long as the errors have expectation zero, regardless of the distribution's shape; normality is only required for exact finite-sample inference (t-tests and F-tests).
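
The two-line proof requires only $E[\epsilon] = 0$ and a fixed design matrix $X$: substituting $Y = X\beta + \epsilon$,

$$\hat{\beta} = (X^T X)^{-1} X^T (X\beta + \epsilon) = \beta + (X^T X)^{-1} X^T \epsilon, \qquad E[\hat{\beta}] = \beta + (X^T X)^{-1} X^T E[\epsilon] = \beta.$$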


