
The General Linear Model

I. Introduction

The General Linear Model (GLM) stands as one of the most versatile statistical frameworks in modern quantitative analysis. From its classical formulation in multiple regression to its extension as the Generalized Linear Model, this framework underpins everything from psychological research to insurance pricing, clinical trials to financial risk assessment.

At its core, the GLM addresses a fundamental question: how does a set of predictor variables influence a response variable? Whether the question is if news consumption increases depression, if driving history predicts claim frequency, or if a new drug outperforms a placebo, the underlying framework is the same.

The genius of the GLM lies in its unification of seemingly disparate techniques. Linear regression, ANOVA, ANCOVA, and the t-test are not separate methods—they are all manifestations of the same underlying model. This unification provides:

  1. Conceptual simplicity: Understand the GLM and you understand a vast array of techniques
  2. Computational efficiency: The same matrix algebra underpins all GLM applications
  3. Interpretability: Coefficients have clear, direct interpretations

II. Mathematical Foundations

II.1 The Standard Formulation

The General Linear Model assumes a response variable Y can be expressed as a linear combination of p predictors plus an error term:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

In matrix notation, for n observations:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where:

  • Y is an n x 1 vector of responses
  • X is an n x (p+1) design matrix
  • β is a (p+1) x 1 vector of unknown parameters
  • ε is an n x 1 vector of errors

II.2 The Five Key Assumptions

| Assumption | Mathematical Formulation | Intuitive Meaning |
|------------|--------------------------|-------------------|
| Linearity | E[Y] = Xβ | Relationship is linear |
| Independence | Cov(εi, εj) = 0 for i ≠ j | Observations are independent |
| Homoscedasticity | Var(εi) = σ² for all i | Constant variance |
| Normality | εi ~ N(0, σ²) | Errors are normal |
| No perfect multicollinearity | X has full column rank | Predictors not perfectly correlated |

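When linearity, independence, homoscedasticity, and full column rank hold, the Gauss-Markov theorem makes the Ordinary Least Squares (OLS) estimator the Best Linear Unbiased Estimator (BLUE). Normality is not needed for this result, but it justifies exact t- and F-based inference in small samples. The estimator is: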

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{X}^{\prime}\mathbf{Y}$$
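As a minimal sketch of the estimator above, the following R snippet solves the normal equations directly and compares the result with `lm()`. The choice of the built-in `mtcars` data and of `wt` and `hp` as predictors is an assumption made purely for illustration.

```r
# Minimal sketch: OLS via the normal equations, checked against lm().
data(mtcars)

# Design matrix with an intercept column (n x (p+1)) and response vector.
X <- cbind(Intercept = 1, wt = mtcars$wt, hp = mtcars$hp)
Y <- mtcars$mpg

# Solve (X'X) beta = X'Y rather than forming the inverse explicitly.
beta_hat <- solve(crossprod(X), crossprod(X, Y))

# Reference fit from lm() for comparison.
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(cbind(normal_equations = drop(beta_hat), lm = coef(fit)))
```

In practice `lm()` uses a QR decomposition, which is numerically more stable than the normal equations when predictors are nearly collinear.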

II.3 The Hat Matrix and Residuals

The hat matrix is:

$$\mathbf{H} = \mathbf{X}(\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{X}^{\prime}$$

It maps observed responses Y to fitted values Ŷ:

$$\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}$$

Residuals can be expressed as:

$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{H}\mathbf{Y} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$$

The sum of squared errors (SSE) follows:

$$SSE = \mathbf{e}^{\prime}\mathbf{e} = \mathbf{Y}^{\prime}(\mathbf{I} - \mathbf{H})\mathbf{Y}$$

The variance-covariance matrix of residuals reveals diagnostic information:

$$Var(\mathbf{e}) = \sigma^{2}(\mathbf{I} - \mathbf{H})$$

This implies:

  • Var(ei) = σ²(1 - hii): residuals at high-leverage points have smaller variance
  • Cov(ei, ej) = -σ²hij: residuals are correlated even when the errors are not
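The identities in this subsection can be checked numerically. The sketch below builds the hat matrix explicitly for a small regression; the use of `mtcars` with predictors `wt` and `hp` is an illustrative assumption, and forming H as a dense n x n matrix is only sensible for small n.

```r
# Minimal sketch: properties of the hat matrix on a small example.
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit)          # n x (p+1) design matrix
Y <- mtcars$mpg
n <- nrow(X)

H <- X %*% solve(crossprod(X)) %*% t(X)

# H is a projection: idempotent, with trace equal to p + 1.
idempotent <- isTRUE(all.equal(H %*% H, H))
trace_H <- sum(diag(H))

# Residuals as (I - H) Y, and SSE as Y'(I - H)Y.
e <- drop((diag(n) - H) %*% Y)
sse <- drop(t(Y) %*% (diag(n) - H) %*% Y)

# Leverages h_ii are the diagonal elements of H.
leverage <- diag(H)
print(c(idempotent = idempotent, trace = trace_H, SSE = sse))
```

For larger data sets, `hatvalues(fit)` returns the diagonal of H without constructing the full matrix.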

II.4 Types of Residuals

Standardised (internally studentised) residuals:

$$r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}}$$

Studentised residuals (externally studentised, also called jackknifed residuals) replace σ̂ with σ̂(i), the estimate obtained without the i-th observation:

$$t_i = \frac{e_i}{\hat{\sigma}_{(i)}\sqrt{1 - h_{ii}}}$$

When the model assumptions hold, each t_i follows a t-distribution with n - p - 2 degrees of freedom, which makes these residuals well suited to outlier detection.
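A short sketch of both residual types, computed by hand and compared with R's `rstandard()` (internally studentised) and `rstudent()` (externally studentised, or jackknifed). The `mtcars` model is an illustrative assumption; the leave-one-out variance uses the standard deletion identity, so no model is refitted.

```r
# Minimal sketch: internally vs externally studentised residuals.
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)

e <- residuals(fit)
h <- hatvalues(fit)
n <- length(e)
k <- length(coef(fit))          # p + 1 estimated coefficients

# Internally studentised: sigma-hat comes from the full fit.
sigma_hat <- sqrt(sum(e^2) / (n - k))
r_internal <- e / (sigma_hat * sqrt(1 - h))

# Externally studentised: sigma-hat with observation i deleted, using
# (n - k - 1) * sigma_(i)^2 = SSE - e_i^2 / (1 - h_ii).
sigma_loo <- sqrt((sum(e^2) - e^2 / (1 - h)) / (n - k - 1))
t_external <- e / (sigma_loo * sqrt(1 - h))

print(head(cbind(r_internal, t_external)))
```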

II.5 Leverage and Influence

| Type | Definition | Detection |
|------|------------|-----------|
| Leverage point | Unusual in predictor space | hii > 2(p+1)/n |
| Influential point | Disproportionately affects estimates | Cook's distance Di > 1 |

Example: In a motor portfolio, an 80-year-old policyholder is a leverage point (an unusual age), while a young driver with a catastrophic claim is a candidate influential point (the claim can shift the fitted coefficients).
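The two detection rules can be applied as follows. This is a sketch on the built-in `mtcars` data (an assumption for illustration, not the insurance example), and the thresholds are the rules of thumb from the table rather than formal tests.

```r
# Minimal sketch: flagging leverage and influence with rule-of-thumb cutoffs.
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nobs(fit)
p <- length(coef(fit)) - 1      # number of predictors

h <- hatvalues(fit)             # leverage h_ii
d <- cooks.distance(fit)        # Cook's distance D_i

# Cook's distance by hand from the standardised residuals and leverage.
r <- rstandard(fit)
d_manual <- r^2 * h / ((p + 1) * (1 - h))

high_leverage <- names(h)[h > 2 * (p + 1) / n]
influential <- names(d)[d > 1]

print(high_leverage)
print(influential)
```

A point can have high leverage without being influential: leverage depends only on the predictors, whereas Cook's distance also depends on how far the response lies from the fitted surface.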


III. The Generalized Linear Model

While the classical GLM is powerful, many real-world problems violate its assumptions. Response variables may be counts, binary, proportions, or positive and skewed.

The Generalized Linear Model extends the framework through three components:

III.1 The Three Components

1. Random Component: Y follows a distribution from the exponential family:

$$f(y; \theta, \phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right\}$$

Common distributions: Normal, Binomial, Poisson, Gamma, Tweedie.

2. Systematic Component: Linear predictor:

$$\eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p = \mathbf{X}\boldsymbol{\beta}$$

3. Link Function: Monotonic differentiable function g connecting μ to η:

$$g(\mu) = \eta \quad \text{where} \quad \mu = E[Y]$$

Common links: Identity (g(μ) = μ), Log (g(μ) = ln(μ)), Logit (g(μ) = ln(μ/(1-μ))), Inverse (g(μ) = 1/μ).
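To make the link functions concrete, the sketch below evaluates each one at an arbitrary mean of 0.25 (an illustrative value) and maps the result back with the inverse link, using base R's `make.link()`.

```r
# Minimal sketch: applying each link g and its inverse to one mean value.
links <- c("identity", "log", "logit", "inverse")
mu <- 0.25                       # illustrative mean, valid for all four links

for (name in links) {
  link <- make.link(name)
  eta <- link$linkfun(mu)        # eta = g(mu)
  back <- link$linkinv(eta)      # g^{-1}(eta) recovers mu
  cat(sprintf("%-8s eta = %8.4f  recovered mu = %.4f\n", name, eta, back))
}
```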

III.2 Estimation

GLMs are estimated via Iteratively Reweighted Least Squares (IRLS), maximizing:

$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^n \ln f(y_i; \mu_i, \phi)$$

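Starting from initial fitted values, the IRLS algorithm repeats the following steps until the estimates converge:

  1. Computes fitted values μ̂i and linear predictor η̂i = g(μ̂i)
  2. Constructs the working response zi = η̂i + (yi - μ̂i)g'(μ̂i)
  3. Computes weights wi = 1 / [Var(Yi)(g'(μ̂i))²], with Var(Yi) evaluated at μ̂i
  4. Re-estimates β by weighted least squares of z on X with weights w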
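The steps above can be sketched for a Poisson GLM with log link, where g'(μ) = 1/μ and Var(Y) = μ, so the weights reduce to wi = μ̂i. The simulated data, the starting values, and the convergence tolerance are all assumptions made for illustration; this is not how `glm()` is implemented internally, only a comparison against its output.

```r
# Minimal sketch: IRLS for a Poisson GLM with log link, compared with glm().
set.seed(1)
n <- 200
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))   # simulated counts (illustrative)
X <- cbind(1, x)

mu <- y + 0.5                  # start away from zero so log(mu) is defined
eta <- log(mu)
beta <- c(0, 0)

for (iter in 1:25) {
  z <- eta + (y - mu) / mu     # working response: eta + (y - mu) * g'(mu)
  w <- mu                      # weights: 1 / (mu * (1 / mu)^2) = mu
  # Weighted least squares: solve (X'WX) beta = X'Wz.
  beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
  eta <- drop(X %*% beta_new)
  mu <- exp(eta)
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

fit <- glm(y ~ x, family = poisson(link = "log"))
print(rbind(irls = beta, glm = coef(fit)))
```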

IV. Real-World Applications

IV.1 Insurance Pricing

The GLM is the actuarial profession's workhorse for non-life insurance pricing.

Problem: Given risk characteristics (age, driving history, vehicle type), what premium should be charged?

Worked Example: Auto Insurance Claim Frequency

Model specification (Poisson GLM with log link):

$$\log(\mu_i) = \beta_0 + \beta_1 \cdot \text{Age}_i + \beta_2 \cdot \text{VehicleType}_i + \beta_3 \cdot \text{DrivingScore}_i + \log(\text{Exposure}_i)$$

Results:

| Predictor | Coefficient | Exp(Coef) | p-value |
|-----------|-------------|-----------|---------|
| Intercept | -1.245 | 0.288 | <0.001 |
| Age (per year) | -0.032 | 0.969 | <0.001 |
| Vehicle: SUV | 0.156 | 1.169 | 0.012 |
| Vehicle: Sports | 0.423 | 1.527 | <0.001 |
| Driving Score | -0.041 | 0.960 | <0.001 |

Interpretation:

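  • Age: Each additional year of age multiplies expected claim frequency by 0.969, a reduction of about 3.1%, holding the other predictors fixed
  • Sports cars: Expected claim frequency is 52.7% higher than for the baseline vehicle type (sedans)
  • Driving Score: Each additional point reduces expected claim frequency by about 4.0%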
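The percentages above follow from exponentiating the coefficients, because the log link makes effects multiplicative on the claim frequency. The sketch below reproduces the arithmetic from the coefficients in the results table; the 10-year comparison is an added illustration.

```r
# Minimal sketch: from log-link coefficients to rate ratios and % changes.
coefs <- c(Age = -0.032, SUV = 0.156, Sports = 0.423, DrivingScore = -0.041)

rate_ratio <- exp(coefs)                 # multiplicative effect on frequency
pct_change <- 100 * (rate_ratio - 1)     # percentage change per unit
print(round(cbind(rate_ratio, pct_change), 3))

# Effects compound multiplicatively: a 10-year age difference scales the
# expected frequency by exp(10 * -0.032), not by 10 times 3.1%.
ten_year_ratio <- exp(10 * coefs[["Age"]])
print(round(ten_year_ratio, 3))
```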

Claim Severity: Gamma GLM

Claim amounts are positive, skewed, and heteroscedastic:

$$\log(E[S_i]) = \beta_0 + \beta_1 \cdot \text{Age}_i + \beta_2 \cdot \text{VehicleType}_i + \cdots$$

The Gamma distribution has Var(Y) ∝ μ², matching claim severity.

Regularised GLMs

Modern practice uses regularised GLMs (Lasso, Ridge, Elastic Net):

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \left\{ -\ell(\boldsymbol{\beta}) + \lambda \cdot P(\boldsymbol{\beta}) \right\}$$

Benefits:

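  1. Reduces overfitting by shrinking coefficients towards zero
  2. Copes with high-dimensional settings that have many candidate predictors
  3. Performs variable selection by setting some coefficients exactly to zero (Lasso, Elastic Net)
  4. Improves model stability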
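For concreteness, the penalty P(β) in the objective above usually takes one of the following forms. The elastic-net form is written in the parameterisation used by the glmnet package, with a mixing parameter α between 0 and 1 (not to be confused with the ANOVA effects αi used elsewhere); the intercept β0 is conventionally left unpenalised.

$$
\begin{aligned}
P_{\text{ridge}}(\boldsymbol{\beta}) &= \tfrac{1}{2}\sum_{j=1}^{p} \beta_j^2 \\
P_{\text{lasso}}(\boldsymbol{\beta}) &= \sum_{j=1}^{p} |\beta_j| \\
P_{\text{elastic net}}(\boldsymbol{\beta}) &= \frac{1-\alpha}{2}\sum_{j=1}^{p} \beta_j^2 + \alpha \sum_{j=1}^{p} |\beta_j|
\end{aligned}
$$

Setting α = 1 recovers the Lasso and α = 0 recovers Ridge.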

IV.2 Psychology

Example 1: Depression and News Consumption:

$$\text{Depression} = \beta_0 + \beta_1 \cdot \text{News Minutes} + \varepsilon$$

Example 2: Group Comparisons (ANOVA):

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

In dummy-coded regression form, with Group A as the reference level: Y = β0 + β1·GroupB + β2·GroupC + ε

Example 3: Gender Differences:

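Researchers test whether men score higher than women on a psychopathy measure. Because the two groups are independent, this is an independent-samples t-test, that is, a GLM with one dummy-coded predictor, rather than a paired-differences test.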

Example 4: Stress Reactivity:

Comparing nicotine-deprived versus non-deprived smokers on startle response.

IV.3 Finance

Credit Scoring (Logistic regression):

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot \text{Income} + \beta_2 \cdot \text{CreditScore} + \beta_3 \cdot \text{DebtToIncome} + \cdots$$

Such models are used to set credit limits and interest rates and to support compliance with Basel III.

IV.4 Clinical Trials

GLMs are the backbone of clinical trial analysis:

  • Logistic regression for binary outcomes
  • Poisson regression for count outcomes
  • Linear regression for continuous outcomes

These models enable testing drug efficacy, adjusting for confounders, and meeting regulatory standards.


V. Model Diagnostics

V.1 Detecting Violations Through Residual Analysis

| Assumption | Diagnostic Tool |
|------------|-----------------|
| Linearity | Plot residuals vs. fitted values |
| Independence | Plot residuals vs. time/order |
| Homoscedasticity | Plot residuals vs. fitted values (fan-shaped pattern) |
| Normality | Q-Q plot of residuals |
| Multicollinearity | Variance Inflation Factors (VIF > 10 is problematic) |
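The VIF for predictor j is 1 / (1 - R²j), where R²j comes from regressing that predictor on all the others. A minimal base-R sketch follows; the three `mtcars` predictors are an illustrative assumption.

```r
# Minimal sketch: variance inflation factors without extra packages.
data(mtcars)
predictors <- c("wt", "hp", "disp")      # illustrative predictor set

vif_manual <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  # Auxiliary regression of one predictor on the remaining ones.
  aux <- lm(reformulate(others, response = v), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})
print(round(vif_manual, 2))

# Equivalent shortcut: the diagonal of the inverse correlation matrix.
vif_corr <- diag(solve(cor(mtcars[, predictors])))
```

For models with only numeric predictors, `car::vif()` should return the same values.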

V.2 Data Transformations

| Violation | Remedy | Example |
|-----------|--------|---------|
| Non-normality | Log, square root, Box-Cox | ln(Y), √Y, (Y^λ - 1)/λ |
| Heteroscedasticity | Variance-stabilising transformation or weighted least squares | ln(Y), √Y; weights wi = 1/Var(εi) |
| Non-linearity | Polynomial, log, or interaction terms | X², ln(X), X1·X2 |

V.3 Correcting Multicollinearity

  1. Remove one correlated predictor
  2. Combine correlated predictors (e.g., principal components)
  3. Use Ridge regression
  4. Collect more data

V.4 Sources of Multicollinearity

| Source | Example | Correction |
|--------|---------|------------|
| Data collection | Restricted predictor range | Collect full range |
| Model specification | Correlated polynomial terms | Use orthogonal polynomials |
| Population constraints | Income and education correlation | Combine variables |
| Overdefined model | More predictors than observations | Reduce predictors |


VI. Variable Selection Methods

VI.1 Stepwise Regression

Automated procedure for selecting the "best" subset of predictors:

  1. Start with a model (empty or full)
  2. At each step, add or remove a predictor based on a criterion
  3. Stop when no further improvement is possible

VI.2 Forward Selection

Steps:

  1. Start with intercept-only model
  2. For each predictor not in model, compute F-statistic for adding it
  3. Add predictor with largest F-statistic (smallest p-value)
  4. Repeat until no predictor meets entry criterion
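The forward-selection steps can be sketched with base R's `add1()`, which reports the partial F-statistic for each candidate term. The candidate pool, the entry threshold of F ≥ 4, and the `mtcars` data are illustrative assumptions.

```r
# Minimal sketch: forward selection by partial F-statistic.
data(mtcars)
candidates <- c("wt", "hp", "qsec", "drat")   # illustrative candidate pool
f_to_enter <- 4                               # assumed entry criterion

model <- lm(mpg ~ 1, data = mtcars)           # step 1: intercept only
repeat {
  in_model <- attr(terms(model), "term.labels")
  remaining <- setdiff(candidates, in_model)
  if (length(remaining) == 0) break

  # Step 2: partial F-statistic for adding each remaining predictor.
  tab <- add1(model, scope = reformulate(remaining), test = "F")
  tab <- tab[rownames(tab) != "<none>", , drop = FALSE]

  # Step 3: take the largest F; step 4: stop if it misses the criterion.
  best <- which.max(tab[["F value"]])
  if (tab[["F value"]][best] < f_to_enter) break
  model <- update(model, as.formula(paste(". ~ . +", rownames(tab)[best])))
}

selected <- attr(terms(model), "term.labels")
print(selected)
```

Backward elimination follows the same pattern with `drop1()`, starting from the full model and removing the term with the smallest F-statistic.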

VI.3 Backward Elimination

Steps:

  1. Start with full model containing all predictors
  2. For each predictor, compute F-statistic for removing it
  3. Remove predictor with smallest F-statistic (largest p-value)
  4. Repeat until no predictor meets removal criterion

VI.4 Comparison

| Feature | Forward Selection | Backward Elimination |
|---------|-------------------|---------------------|
| Starting point | Intercept only | Full model |
| Direction | Add predictors | Remove predictors |
| Advantages | Computationally efficient | Considers joint effects |
| Disadvantages | Can miss joint effects | Computationally intensive |
| Common criterion | Entry F ≥ 4 | Removal F < 4 |


VII. Time Series Models (Box-Jenkins)

VII.1 The Box-Jenkins Procedure

Three steps for ARIMA modelling:

| Step | Description | Activities |
|------|-------------|------------|
| 1. Model Specification | Tentatively identify model class | Examine ACF, PACF; test stationarity |
| 2. Model Estimation | Estimate parameters | Maximum likelihood estimation |
| 3. Model Diagnostic Checking | Assess adequacy | Residual analysis; modify if needed |

VII.2 Model Diagnostic Checking

Before forecasting, verify model adequacy:

1. Check residual autocorrelation

  • Ljung-Box test: H0: Residuals are uncorrelated
  • ACF and PACF of residuals
  • If significant autocorrelation remains, re-specify model

2. Check residual normality

  • Q-Q plot of residuals
  • Shapiro-Wilk test
  • Consider transformations if non-normal

3. Check residual heteroscedasticity

  • Plot residuals vs. fitted values
  • ARCH-LM test
  • Consider GARCH models if present

4. Check model stability

  • Parameters should be constant over time
  • CUSUM plots for parameter stability
  • Recursive residuals for model adequacy

5. Check forecasting performance

  • Out-of-sample forecast errors
  • Root Mean Squared Error (RMSE)
  • Mean Absolute Percentage Error (MAPE)
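Several of these checks are available in base R. The sketch below simulates an AR(1) series (an assumption for illustration), fits the matching model, and runs the Ljung-Box, Shapiro-Wilk, and out-of-sample RMSE checks. MAPE is omitted because the simulated series crosses zero, where percentage errors are undefined.

```r
# Minimal sketch: diagnostic checks for a fitted ARIMA model.
set.seed(42)
y <- arima.sim(model = list(ar = 0.6), n = 300)   # simulated AR(1) series

fit <- arima(y, order = c(1, 0, 0))
res <- residuals(fit)

# 1. Residual autocorrelation: Ljung-Box, fitdf = number of ARMA terms.
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)

# 2. Residual normality: Shapiro-Wilk.
sw <- shapiro.test(as.numeric(res))

# 5. Forecasting performance: fit on the first 280 points, forecast 20.
train <- window(y, end = 280)
test <- window(y, start = 281)
fc <- predict(arima(train, order = c(1, 0, 0)), n.ahead = 20)$pred
rmse <- sqrt(mean((test - fc)^2))

print(c(ljung_box_p = lb$p.value, shapiro_p = sw$p.value, rmse = rmse))
```

A large Ljung-Box p-value is consistent with uncorrelated residuals; a small one suggests the model should be re-specified.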

VIII. Hypothesis Testing

VIII.1 Testing Overall Significance

Global F-test evaluates whether any predictor has non-zero coefficient:

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$$

Test statistic:

$$F = \frac{(SSE_R - SSE_F)/(p_F - p_R)}{SSE_F/(n - p_F - 1)}$$

For full model vs. intercept-only:

$$F = \frac{(SST - SSE)/p}{SSE/(n - p - 1)} = \frac{MSR}{MSE}$$
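The global F-statistic can be reproduced from the two sums of squared errors. The sketch below uses `mtcars` with `wt` and `hp` as an illustrative assumption and compares the hand calculation with the value reported by `summary()`.

```r
# Minimal sketch: global F-test from the full and intercept-only models.
data(mtcars)
full <- lm(mpg ~ wt + hp, data = mtcars)
reduced <- lm(mpg ~ 1, data = mtcars)

n <- nobs(full)
p <- length(coef(full)) - 1
sse_f <- sum(residuals(full)^2)
sse_r <- sum(residuals(reduced)^2)   # equals SST for the intercept-only model

f_stat <- ((sse_r - sse_f) / p) / (sse_f / (n - p - 1))
p_value <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

print(c(F = f_stat, p_value = p_value))
print(summary(full)$fstatistic)      # reference value from summary()
```

`anova(reduced, full)` performs the same comparison and extends to any pair of nested models.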

VIII.2 Testing Individual Coefficients

For testing βi = 0:

$$t = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)}$$

Under H0, t ~ t(n-p-1).

VIII.3 Confidence Intervals

A 100(1-α)% confidence interval for βi:

$$\hat{\beta}_i \pm t_{\alpha/2, n-p-1} \cdot SE(\hat{\beta}_i)$$
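Both the t-statistics and the confidence intervals can be computed from the coefficient estimates and their standard errors. The following sketch (again on `mtcars`, an illustrative assumption) does this by hand at α = 0.05 and can be checked against `summary()` and `confint()`.

```r
# Minimal sketch: t-tests and 95% confidence intervals for coefficients.
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)

est <- coef(fit)
se <- sqrt(diag(vcov(fit)))              # standard errors
df <- df.residual(fit)                   # n - p - 1

t_stat <- est / se
p_val <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

alpha <- 0.05
t_crit <- qt(1 - alpha / 2, df = df)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

print(round(cbind(est, se, t_stat, p_val, ci), 4))
```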

VIII.4 The Coefficient of Determination

R² measures proportion of variance explained:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

Interpretation:

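  • R² lies between 0 and 1 for an OLS fit that includes an intercept
  • Higher values indicate a better fit to the sample, but R² never decreases when predictors are added
  • R² = 0.75 means 75% of the variance in the response is explained
  • Adjusted R² penalises model complexity by using degrees of freedom: 1 - [SSE/(n-p-1)] / [SST/(n-1)]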
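A short sketch of R² and adjusted R² computed from the sums of squares, using the same illustrative `mtcars` model as an assumption.

```r
# Minimal sketch: R-squared and adjusted R-squared by hand.
data(mtcars)
fit <- lm(mpg ~ wt + hp, data = mtcars)

y <- mtcars$mpg
n <- length(y)
p <- length(coef(fit)) - 1

sse <- sum(residuals(fit)^2)
sst <- sum((y - mean(y))^2)

r2 <- 1 - sse / sst
adj_r2 <- 1 - (sse / (n - p - 1)) / (sst / (n - 1))

print(c(R2 = r2, adjusted_R2 = adj_r2))
```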

VIII.5 The ANOVA Table for Multiple Regression

| Source | df | SS | MS | F |
|--------|----|----|----|----|
| Regression | p | SSR | MSR | MSR/MSE |
| Residual | n-p-1 | SSE | MSE | |
| Total | n-1 | SST | | |


IX. One-Way ANOVA

IX.1 Model Formulation

One-way ANOVA compares the means of a groups, where a denotes the number of treatments:

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

where:

  • Yij is response for j-th observation in group i
  • μ is overall mean
  • αi is effect of treatment i (i = 1, ..., a)
  • εij are independent N(0, σ²)

IX.2 The ANOVA Table

| Source | df | SS | MS | F |
|--------|----|----|----|----|
| Treatments | a-1 | SSTreatment | MSTreatment | MSTreatment/MSError |
| Error | n-a | SSError | MSError | |
| Total | n-1 | SSTotal | | |

IX.3 Post-Hoc Analysis: LSD Method

Least Significant Difference determines which group means differ:

$$LSD = t_{\alpha/2, n-a} \cdot \sqrt{MSE \cdot \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Two means differ if |Ȳi - Ȳj| > LSD.

IX.4 Worked Example

Tensile strength of synthetic fibre by three methods (A, B, C):

| Method | Replicates | Total |
|--------|------------|-------|
| A | 7, 7, 7, 15, 9 | 45 |
| B | 12, 12, 17, 18, 18 | 77 |
| C | 14, 18, 18, 19, 19 | 88 |

  • Number of treatments: a = 3
  • Replicates per treatment: ni = 5
  • Total observations: n = 15
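The ANOVA table and the LSD comparison for these data can be completed as follows. The data frame name `fibre_data` and the column names `method` and `strength` are chosen to match the R example later in the document; α = 0.05 is an assumption.

```r
# Minimal sketch: one-way ANOVA and LSD for the fibre-strength data.
fibre_data <- data.frame(
  method = factor(rep(c("A", "B", "C"), each = 5)),
  strength = c(7, 7, 7, 15, 9,
               12, 12, 17, 18, 18,
               14, 18, 18, 19, 19)
)

a <- nlevels(fibre_data$method)          # number of treatments
n_total <- nrow(fibre_data)              # total observations
group_mean <- tapply(fibre_data$strength, fibre_data$method, mean)
group_n <- tapply(fibre_data$strength, fibre_data$method, length)
grand_mean <- mean(fibre_data$strength)

ss_treat <- sum(group_n * (group_mean - grand_mean)^2)
ss_total <- sum((fibre_data$strength - grand_mean)^2)
ss_error <- ss_total - ss_treat

ms_treat <- ss_treat / (a - 1)
ms_error <- ss_error / (n_total - a)
f_stat <- ms_treat / ms_error
p_value <- pf(f_stat, a - 1, n_total - a, lower.tail = FALSE)

# LSD at alpha = 0.05 with five replicates in each group.
lsd <- qt(0.975, n_total - a) * sqrt(ms_error * (1 / 5 + 1 / 5))
diffs <- abs(outer(group_mean, group_mean, "-"))

print(c(SS_treat = ss_treat, SS_error = ss_error, F = f_stat, LSD = lsd))
print(diffs > lsd)
```

For these data the group means are 9.0, 15.4, and 17.6, giving SSTreatment = 199.6, SSError = 104.4, and F ≈ 11.47 on (2, 12) degrees of freedom. The LSD is about 4.06, so method A differs from both B and C, while B and C (difference 2.2) do not differ significantly.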

X. Implementing GLMs in R

X.1 Basic Syntax

# General Linear Model
model <- lm(y ~ x1 + x2 + x3, data = my_data)

# Generalized Linear Model
model <- glm(y ~ x1 + x2 + x3, 
             family = binomial(link = "logit"), 
             data = my_data)

X.2 Common Family Arguments

| family | link | Use Case |
| --- | --- | --- |
| gaussian | identity | Continuous, normal data |
| binomial | logit | Binary outcomes |
| poisson | log | Count data |
| Gamma | inverse or log | Positive, skewed data |

X.3 Model Diagnostics

# Residuals
residuals(model)
rstandard(model)   # Standardised (internally studentised)
rstudent(model)    # Studentised (externally studentised / jackknifed)

# Influence
hatvalues(model)   # Leverage
cooks.distance(model)   # Cook's distance

# Multicollinearity
library(car)
vif(model)

# Diagnostic plots
plot(model)

X.4 Step-by-Step Example

# 1. Load data
data(mtcars)

# 2. Fit model
model <- lm(mpg ~ wt + hp, data = mtcars)

# 3. View results
summary(model)

# 4. Diagnostics
plot(model)

# 5. Predictions
predict(model, newdata = data.frame(wt = 3.5, hp = 150))

X.5 Advanced Features

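# Regularised GLMs (glmnet needs a numeric predictor matrix)
library(glmnet)
x_mat <- model.matrix(y ~ ., data = my_data)[, -1]  # drop the intercept column
cv_model <- cv.glmnet(x = x_mat,
                      y = my_data$y,
                      family = "poisson",
                      alpha = 1)  # alpha = 1: Lasso; alpha = 0: Ridge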

# Offset (insurance exposure)
model_offset <- glm(Claims ~ Age + Vehicle_Type + offset(log(Exposure)),
                    family = poisson(link = "log"),
                    data = insurance_data)

# Interactions
model_interaction <- lm(mpg ~ wt * hp, data = mtcars)

# Polynomial terms
model_poly <- lm(mpg ~ wt + I(wt^2) + hp, data = mtcars)

X.6 One-Way ANOVA in R

# ANOVA using lm()
model <- lm(strength ~ method, data = fibre_data)
anova(model)

# LSD post-hoc
library(agricolae)
LSD.test(model, "method", alpha = 0.05)


XI. Frequently Asked Questions

XI.1 What is the General Linear Model?

The GLM is a unifying statistical framework encompassing linear regression, ANOVA, ANCOVA, and t-tests. It assumes:

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon$$

Key assumptions: Linearity, Independence, Homoscedasticity, Normality, and no perfect multicollinearity.

XI.2 GLM vs. Generalized Linear Model?

| Feature | General Linear Model | Generalized Linear Model |
| --- | --- | --- |
| Distribution | Normal only | Any exponential family |
| Link function | Identity | Any (log, logit, probit) |
| Variance | Constant | Can vary |
| Examples | Linear regression, ANOVA | Logistic, Poisson regression |

When to use: the General Linear Model for continuous responses with approximately normal errors; the Generalized Linear Model for counts, binary outcomes, and positive skewed data.

XI.3 Psychology Examples?

Depression and news consumption:

$$\text{Depression} = \beta_0 + \beta_1 \cdot \text{News Minutes} + \varepsilon$$

Group comparisons (ANOVA):

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

Gender differences in psychopathy: Test with an independent-samples comparison (a GLM with a single dummy-coded predictor), since men and women form independent groups.

Stress reactivity in smokers: Compare deprived vs. non-deprived groups.

XI.4 The GLM Formula?

Simple regression:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

Multiple regression:

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$$

Matrix form:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

OLS estimator:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{X}^{\prime}\mathbf{Y}$$

XI.5 GLM vs. ANOVA?

ANOVA is a special case of the GLM where all predictors are categorical. The GLM encodes categorical variables as dummy variables:

$$Y = \beta_0 + \beta_1 \cdot \text{GroupB} + \beta_2 \cdot \text{GroupC} + \varepsilon$$

ANOVA, t-tests, and regression are all manifestations of the same GLM.

XI.6 Implementing GLM in R?

# General Linear Model
model <- lm(y ~ x1 + x2 + x3, data = my_data)
summary(model)
plot(model)
predict(model, newdata = data.frame(x1 = 5, x2 = 10, x3 = 15))

# Generalized Linear Model
model <- glm(y ~ x1 + x2 + x3, 
             family = binomial(link = "logit"), 
             data = my_data)

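XI.7 Why Normality Test?

Normality testing is important because violations, particularly in small samples, can lead to:

  1. Incorrect Type I error rates
  2. Reduced power
  3. Invalid confidence intervals
  4. Inefficient estimates when the errors are heavy-tailed

Key point: Normality applies to the errors (assessed through the residuals), not to the raw data. In large samples, the Central Limit Theorem keeps inference on the coefficients approximately valid even when the errors are not normal.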


XII. Conclusion

The General Linear Model stands as one of the most important statistical frameworks ever developed. Its elegance lies in simplicity: a unified approach to modelling relationships between a response and multiple predictors. Yet its power extends far beyond simple linear regression.

Key insights:

  1. Unification: Linear regression, ANOVA, ANCOVA, and t-tests are all special cases
  2. Generalization: The Generalized Linear Model handles counts, binary outcomes, and skewed data
  3. Practical impact: GLMs are the workhorse of insurance pricing, credit scoring, clinical trials, and psychological research
  4. Diagnostics: Residual analysis is essential for validating assumptions
  5. Modern extensions: Regularisation techniques (Lasso, Ridge) improve performance in high-dimensional settings

As data grows in volume and complexity, the GLM will continue to evolve. Machine learning integration, adaptive methods, and automated variable selection are extending its capabilities. Yet the core principles remain unchanged: the GLM is a tool for understanding relationships, making predictions, and testing hypotheses. It is, in the truest sense, a framework for scientific discovery.

Whether you are a psychologist studying depression, an actuary pricing insurance, a data scientist building predictive models, or a researcher designing clinical trials, the General Linear Model provides the rigorous, interpretable foundation you need.
