The General Linear Model
I. Introduction
The General Linear Model (GLM) stands as one of the most versatile statistical frameworks in modern quantitative analysis. From its classical formulation in multiple regression to its extension as the Generalized Linear Model, this framework underpins everything from psychological research to insurance pricing, clinical trials to financial risk assessment.
At its core, the GLM addresses a fundamental question: How does a set of predictor variables influence a response variable? Whether investigating whether news consumption increases depression, whether driving history predicts claim frequency, or whether a new drug outperforms a placebo, the underlying framework is the same.
The genius of the GLM lies in its unification of seemingly disparate techniques. Linear regression, ANOVA, ANCOVA, and the t-test are not separate methods—they are all manifestations of the same underlying model. This unification provides:
- →Conceptual simplicity: Understand the GLM and you understand a vast array of techniques
- →Computational efficiency: The same matrix algebra underpins all GLM applications
- →Interpretability: Coefficients have clear, direct interpretations
II. Mathematical Foundations
II.1 The Standard Formulation
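The General Linear Model assumes that a response variable Y can be expressed as a linear combination of p predictors plus an error term:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

In matrix notation, for n observations:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$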
where:
- →Y is an n x 1 vector of responses
- →X is an n x (p+1) design matrix
- →β is a (p+1) x 1 vector of unknown parameters
- →ε is an n x 1 vector of errors
II.2 The Five Key Assumptions
| Assumption | Mathematical Formulation | Intuitive Meaning |
|------------|--------------------------|-------------------|
| Linearity | E[Y] = Xβ | Relationship is linear |
| Independence | Cov(εi, εj) = 0 for i ≠ j | Observations are independent |
| Homoscedasticity | Var(εi) = σ² for all i | Constant variance |
| Normality | εi ~ N(0, σ²) | Errors are normal |
| No perfect multicollinearity | X has full column rank | Predictors not perfectly correlated |
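When linearity, independence, and homoscedasticity hold and X has full column rank, the Gauss-Markov theorem guarantees that the Ordinary Least Squares (OLS) estimator is the Best Linear Unbiased Estimator (BLUE). Normality is not needed for this result, but it is needed for exact t- and F-based inference in small samples. The OLS estimator is:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y}$$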
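As a minimal sketch of the OLS estimator, the R code below solves the normal equations directly and compares the result with `lm()`. The built-in `mtcars` data and the choice of `wt` and `hp` as predictors are assumptions made purely for illustration.

```r
# Design matrix with an explicit intercept column (n x (p + 1))
X <- cbind(Intercept = 1, wt = mtcars$wt, hp = mtcars$hp)
Y <- mtcars$mpg

# Solve (X'X) beta = X'Y instead of inverting X'X explicitly
beta_hat <- solve(crossprod(X), crossprod(X, Y))

# Reference fit from lm() for comparison
fit <- lm(mpg ~ wt + hp, data = mtcars)
cbind(normal_equations = drop(beta_hat), lm = coef(fit))
```

In practice `lm()` uses a QR decomposition, which is numerically more stable than forming X'X; the normal equations are shown here only because they mirror the formula.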
II.3 The Hat Matrix and Residuals
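The hat matrix is:

$$\mathbf{H} = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top$$

It maps observed responses Y to fitted values Ŷ:

$$\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}$$

Residuals can be expressed as:

$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$$

The sum of squared errors (SSE) follows:

$$SSE = \mathbf{e}^\top \mathbf{e} = \mathbf{Y}^\top(\mathbf{I} - \mathbf{H})\mathbf{Y}$$

The variance-covariance matrix of residuals reveals diagnostic information:

$$\mathrm{Var}(\mathbf{e}) = \sigma^2(\mathbf{I} - \mathbf{H})$$

Writing hij for the (i, j) element of H, this implies: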
- →Var(ei) = σ²(1 - hii) — high-leverage residuals have smaller variance
- →Cov(ei, ej) = -σ²hij — residuals are correlated
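The hat-matrix identities can be checked numerically. The sketch below assumes the built-in `mtcars` data and an illustrative model `mpg ~ wt + hp`; it builds H explicitly, which is fine for small n but not something to do on large data sets.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit)              # n x (p + 1) design matrix
Y <- mtcars$mpg

# H = X (X'X)^{-1} X'
H <- X %*% solve(crossprod(X)) %*% t(X)

fitted_from_H <- drop(H %*% Y)                    # Y-hat = H Y
resid_from_H  <- drop((diag(nrow(X)) - H) %*% Y)  # e = (I - H) Y

# The diagonal of H gives the leverages h_ii; its trace equals p + 1
leverage  <- diag(H)
trace_H   <- sum(leverage)

# SSE = Y'(I - H)Y
sse_from_H <- drop(t(Y) %*% (diag(nrow(X)) - H) %*% Y)
```

Because H is a projection matrix it is idempotent (HH = H), which is what makes Var(e) = σ²(I - H) follow from Var(Y) = σ²I.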
II.4 Types of Residuals
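Standardised residuals (internally studentised, using the full-sample estimate σ̂):

$$r_i = \frac{e_i}{\hat{\sigma}\sqrt{1 - h_{ii}}}$$

Studentised residuals (using σ̂(i), estimated without the i-th observation):

$$t_i = \frac{e_i}{\hat{\sigma}_{(i)}\sqrt{1 - h_{ii}}}$$

Jackknifed residuals (externally studentised) are the same quantity, and can be obtained from the standardised residuals without refitting the model:

$$t_i = r_i\sqrt{\frac{n - p - 2}{n - p - 1 - r_i^2}}$$

Under the model assumptions these follow a t-distribution with n - p - 2 degrees of freedom, which makes them well suited to outlier detection.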
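A minimal sketch of how these residuals can be computed by hand and checked against R's `rstandard()` and `rstudent()`. The `mtcars` model is an illustrative assumption, and the leave-one-out variance uses the standard identity σ̂²(i) = (SSE - ei²/(1 - hii)) / (n - p - 2).

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
e <- residuals(fit)
h <- hatvalues(fit)
n <- nrow(mtcars)
p <- 2                                   # number of predictors (excluding intercept)

# Internally studentised ("standardised") residuals
sigma_hat <- sqrt(sum(e^2) / (n - p - 1))
r_std <- e / (sigma_hat * sqrt(1 - h))

# Externally studentised residuals via the leave-one-out variance estimate
sigma_loo <- sqrt((sum(e^2) - e^2 / (1 - h)) / (n - p - 2))
t_ext <- e / (sigma_loo * sqrt(1 - h))

# Same quantity from the standardised residuals, without any refitting
t_jack <- r_std * sqrt((n - p - 2) / (n - p - 1 - r_std^2))

# Flag potential outliers against a two-sided 5% t critical value
outliers <- names(t_ext)[abs(t_ext) > qt(0.975, df = n - p - 2)]
```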
II.5 Leverage and Influence
| Type | Definition | Detection |
|------|------------|-----------|
| Leverage point | Unusual in predictor space | hii > 2(p+1)/n |
| Influential point | Disproportionately affects estimates | Cook's distance Di > 1 |

Example: in a motor insurance portfolio, an 80-year-old policyholder is a leverage point (unusual in predictor space), whereas a young driver with a catastrophic claim may be an influential point (the observation pulls the fitted coefficients towards itself).
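The two rules of thumb in the table can be applied as follows. This is a sketch on the built-in `mtcars` data with an illustrative model; the thresholds are conventions rather than formal tests.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars)
p <- length(coef(fit)) - 1               # predictors, excluding the intercept

h <- hatvalues(fit)                      # leverage h_ii
d <- cooks.distance(fit)                 # Cook's distance D_i

leverage_cutoff <- 2 * (p + 1) / n
high_leverage <- names(h)[h > leverage_cutoff]
influential   <- names(d)[d > 1]

high_leverage
influential
```

A point can have high leverage without being influential: if its response lies close to the fitted surface, removing it changes the estimates very little.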
III. The Generalized Linear Model
While the classical GLM is powerful, many real-world problems violate its assumptions. Response variables may be counts, binary, proportions, or positive and skewed.
The Generalized Linear Model extends the framework through three components:
III.1 The Three Components
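1. Random Component: Y follows a distribution from the exponential family, with natural parameter θ and dispersion parameter φ:

$$f(y; \theta, \phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right\}$$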
Common distributions: Normal, Binomial, Poisson, Gamma, Tweedie.
2. Systematic Component: Linear predictor:

$$\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta}$$

3. Link Function: Monotonic differentiable function g connecting the mean μ = E[Y] to η:

$$g(\mu) = \eta$$
Common links: Identity (g(μ) = μ), Log (g(μ) = ln(μ)), Logit (g(μ) = ln(μ/(1-μ))), Inverse (g(μ) = 1/μ).
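In R, each family object carries its link and inverse link as functions, which gives a quick way to see how μ and η map onto each other. The numeric values below are arbitrary illustrations.

```r
mu <- c(0.1, 0.5, 0.9)

# Logit link: eta = log(mu / (1 - mu)), inverse: mu = 1 / (1 + exp(-eta))
logit_family <- binomial(link = "logit")
eta <- logit_family$linkfun(mu)
mu_back <- logit_family$linkinv(eta)

# Log and inverse links evaluated at single points
log_eta <- poisson(link = "log")$linkfun(2)      # log(2)
inv_eta <- Gamma(link = "inverse")$linkfun(4)    # 1 / 4

rbind(mu = mu, eta = eta, mu_back = mu_back)
```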
III.2 Estimation
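GLMs are estimated via Iteratively Reweighted Least Squares (IRLS), which maximises the log-likelihood (θi is the natural parameter of observation i, determined by β through the link function, and φ is the dispersion parameter):

$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n}\left[\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi)\right]$$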
The IRLS algorithm iteratively:
- →Computes fitted values μ̂i and linear predictor η̂i
- →Constructs working dependent variable zi = η̂i + (yi - μ̂i)g'(μ̂i)
- →Computes weights wi = 1 / [Var(Yi)(g'(μ̂i))²], with Var(Yi) evaluated at the current μ̂i
- →Estimates β via weighted least squares
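The four steps above can be sketched for a Poisson model with log link, where g'(μ) = 1/μ and Var(Y) = μ, so the weights reduce to wi = μ̂i. The data are simulated and the true coefficients (0.5 and 1.2) are arbitrary assumptions; this is a teaching sketch, not a replacement for `glm()`.

```r
set.seed(1)
n <- 500
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))   # simulated counts
X <- cbind(1, x)                             # design matrix with intercept

beta <- c(log(mean(y)), 0)                   # start from the intercept-only fit
for (iter in 1:25) {
  eta <- drop(X %*% beta)                    # linear predictor
  mu  <- exp(eta)                            # fitted means (inverse log link)
  g_prime <- 1 / mu                          # derivative of the log link
  z <- eta + (y - mu) * g_prime              # working dependent variable
  w <- 1 / (mu * g_prime^2)                  # weights; equals mu for Poisson/log
  # Weighted least squares: solve (X'WX) beta = X'Wz
  beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

fit <- glm(y ~ x, family = poisson(link = "log"))
cbind(irls = beta, glm = coef(fit))
```

For canonical links such as the log link for Poisson, IRLS coincides with Newton-Raphson, which is why it typically converges in a handful of iterations.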
IV. Real-World Applications
IV.1 Insurance Pricing
The GLM is the actuarial profession's workhorse for non-life insurance pricing.
Problem: Given risk characteristics (age, driving history, vehicle type), what premium should be charged?
Worked Example: Auto Insurance Claim Frequency
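Model specification (Poisson GLM with log link, with sedan as the reference vehicle type):

$$\ln\big(E[\text{Claims}]\big) = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{SUV} + \beta_3\,\text{Sports} + \beta_4\,\text{DrivingScore}$$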
Results:
| Predictor | Coefficient | Exp(Coef) | p-value |
|-----------|-------------|-----------|---------|
| Intercept | -1.245 | 0.288 | <0.001 |
| Age (per year) | -0.032 | 0.969 | <0.001 |
| Vehicle: SUV | 0.156 | 1.169 | 0.012 |
| Vehicle: Sports | 0.423 | 1.527 | <0.001 |
| Driving Score | -0.041 | 0.960 | <0.001 |
Interpretation:
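- →Age: Each additional year multiplies expected claim frequency by 0.969, a reduction of approximately 3.1%, holding the other predictors fixed
- →Sports cars: Expected claim frequency is 52.7% higher than for sedans
- →Driving Score: Each additional point reduces expected claim frequency by about 4.0%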
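Because the model uses a log link, coefficients act multiplicatively on the expected claim frequency. The short sketch below reproduces the Exp(Coef) column and the percentage effects from the coefficients in the table; the vector names are labels chosen here for readability.

```r
coefs <- c(Intercept = -1.245, Age = -0.032, SUV = 0.156,
           Sports = 0.423, DrivingScore = -0.041)

rate_ratio <- exp(coefs)                 # multiplicative effect on frequency
pct_change <- 100 * (rate_ratio - 1)     # percentage change per unit increase

round(cbind(rate_ratio, pct_change), 3)

# Effects compound multiplicatively: ten additional years of age
ten_year_ratio <- exp(10 * coefs["Age"])
```

Note that the intercept's exponent (0.288) is the expected frequency when all predictors are zero, which is rarely a meaningful profile on its own.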
Claim Severity: Gamma GLM
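Claim amounts are positive, right-skewed, and heteroscedastic, so a Gamma GLM (commonly with a log link) is a natural choice. The Gamma distribution has Var(Y) ∝ μ², so the standard deviation grows in proportion to the mean, matching the behaviour of claim severity.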
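A minimal sketch of a Gamma severity model on simulated data. The rating factor `vehicle_value`, the true coefficients, and the shape parameter are all hypothetical values chosen for illustration; the log link is an assumption, since the text does not state which link the severity model uses.

```r
set.seed(42)
n <- 5000
vehicle_value <- runif(n)                  # hypothetical rating factor on [0, 1]
shape <- 2                                 # Gamma shape; dispersion = 1 / shape
mu <- exp(7 + 0.5 * vehicle_value)         # true mean severity (assumed)

# Gamma draws with mean mu and variance mu^2 / shape
severity <- rgamma(n, shape = shape, rate = shape / mu)

fit <- glm(severity ~ vehicle_value, family = Gamma(link = "log"))
coef(fit)
dispersion <- summary(fit)$dispersion      # estimate of 1 / shape
```

With a log link, exp(coefficient) is again a multiplicative effect, this time on expected claim size rather than claim frequency.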
Regularised GLMs
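Modern practice often uses regularised GLMs (Lasso, Ridge, Elastic Net), which add a penalty on the size of the coefficients to the negative log-likelihood:

$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}}\left\{-\ell(\boldsymbol{\beta}) + \lambda\left[\alpha\lVert\boldsymbol{\beta}\rVert_1 + \tfrac{1-\alpha}{2}\lVert\boldsymbol{\beta}\rVert_2^2\right]\right\}$$

Here λ ≥ 0 controls the amount of shrinkage; α = 1 gives the Lasso, α = 0 gives Ridge, and 0 < α < 1 gives the Elastic Net.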
Benefits:
- →Prevents overfitting
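- →Copes with many, possibly correlated, predictors (non-linear effects still require explicit terms such as polynomials or splines)
- →Integrates variable selection (Lasso and Elastic Net can shrink coefficients exactly to zero)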
- →Improves model stability
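To make the shrinkage effect concrete without extra packages, the sketch below implements Ridge regression for a Gaussian response in closed form, β̂ = (X'X + λI)⁻¹X'y, on standardised predictors. The `mtcars` variables and λ = 10 are illustrative assumptions, and this λ is not on the same scale as the one used by `glmnet`.

```r
# Standardise predictors and centre the response so no intercept is needed
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp")]))
y <- mtcars$mpg - mean(mtcars$mpg)

# Closed-form ridge estimator; lambda = 0 recovers OLS
ridge <- function(lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

beta_ols   <- ridge(0)
beta_ridge <- ridge(10)

rbind(ols = beta_ols, ridge = beta_ridge)
```

The ridge coefficients are pulled towards zero relative to OLS, trading a little bias for lower variance when predictors are correlated, as `wt`, `hp`, and `disp` are here.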
IV.2 Psychology
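Example 1: Depression and News Consumption:

$$\text{Depression}_i = \beta_0 + \beta_1\,\text{NewsConsumption}_i + \varepsilon_i$$

Example 2: Group Comparisons (ANOVA):

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

In dummy-coded regression form, with Group A as the reference level: Y = β0 + β1·GroupB + β2·GroupC + ε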
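The dummy-coded form can be inspected directly in R. The sketch below uses a small made-up data set (three groups of four scores, invented for illustration) to show the design matrix and how the coefficients relate to the group means.

```r
group <- factor(rep(c("A", "B", "C"), each = 4))
y <- c(10, 12, 11, 13,   14, 15, 16, 15,   20, 19, 21, 20)  # made-up scores

# Design matrix: intercept plus indicator columns for groups B and C
design <- model.matrix(~ group)
head(design)

fit <- lm(y ~ group)
coef(fit)                       # beta0 = mean of A; beta1, beta2 = differences from A
group_means <- tapply(y, group, mean)
group_means
```

The intercept equals the mean of the reference group, and each dummy coefficient is the difference between that group's mean and the reference mean.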
Example 3: Gender Differences:
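Researchers test whether men score higher on psychopathy than women. Because men and women form two independent groups, the appropriate test is an independent-samples t-test, which is equivalent to a GLM with a single dummy-coded predictor; a paired-differences analysis applies only when participants are matched in pairs.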
Example 4: Stress Reactivity:
Comparing nicotine-deprived versus non-deprived smokers on startle response.
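Two-group comparisons such as these are a special case of the GLM. The sketch below shows the equivalence between a pooled-variance t-test and a regression on a dummy variable, using `mtcars` with `am` (0/1) as a stand-in grouping variable, since the psychology data are not available here.

```r
# Independent-samples t-test with pooled variance
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# Same comparison as a linear model with one dummy-coded predictor
fit <- lm(mpg ~ am, data = mtcars)
lm_row <- summary(fit)$coefficients["am", ]

c(t_test = unname(tt$statistic), lm = unname(lm_row["t value"]))
c(t_test = tt$p.value, lm = unname(lm_row["Pr(>|t|)"]))
```

The two t statistics differ only in sign, because `t.test()` reports group 0 minus group 1 while the regression slope is group 1 minus group 0.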
IV.3 Finance
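Credit Scoring (Logistic regression), where π is the probability of default and X1, ..., Xp are borrower characteristics:

$$\ln\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$

Such models are used to set credit limits and interest rates, and to support compliance with Basel III.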
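A minimal sketch of a logistic credit-scoring model on simulated data. The predictors `debt_ratio` and `late_payments`, and the true coefficients used to simulate defaults, are hypothetical and chosen only to illustrate the workflow.

```r
set.seed(7)
n <- 2000
debt_ratio    <- runif(n)              # hypothetical: debt-to-income ratio
late_payments <- rpois(n, lambda = 1)  # hypothetical: count of late payments

# Simulate defaults from an assumed logistic model
p_default <- plogis(-3 + 2 * debt_ratio + 0.6 * late_payments)
default   <- rbinom(n, size = 1, prob = p_default)

fit <- glm(default ~ debt_ratio + late_payments,
           family = binomial(link = "logit"))

odds_ratios <- exp(coef(fit))          # multiplicative effect on the odds of default

# Predicted default probability for a new applicant
new_applicant <- data.frame(debt_ratio = 0.8, late_payments = 3)
pd <- predict(fit, newdata = new_applicant, type = "response")
```

Using `type = "response"` returns a probability; the default `type = "link"` would return the log-odds instead.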
IV.4 Clinical Trials
GLMs are the backbone of clinical trial analysis:
- →Logistic regression for binary outcomes
- →Poisson regression for count outcomes
- →Linear regression for continuous outcomes
Together these models make it possible to test drug efficacy, adjust for confounders, and meet regulatory standards.
V. Model Diagnostics
V.1 Detecting Violations Through Residual Analysis
| Assumption | Diagnostic Tool |
|------------|-----------------|
| Linearity | Plot residuals vs. fitted values |
| Independence | Plot residuals vs. time/order |
| Homoscedasticity | Plot residuals vs. fitted values (fan-shaped pattern) |
| Normality | Q-Q plot of residuals |
| Multicollinearity | Variance Inflation Factors (VIF > 10 is problematic) |
V.2 Data Transformations
| Violation | Transformation or Remedy | Example |
|-----------|--------------------------|---------|
| Non-normality | Log, square root, Box-Cox | ln(Y), √Y, (Y^λ - 1)/λ |
| Heteroscedasticity | Variance-stabilising transformation or weighted least squares | ln(Y), √Y, or weights proportional to 1/Var(εi) |
| Non-linearity | Polynomial, log, or interaction terms | X², ln(X), X1·X2 |
V.3 Correcting Multicollinearity
- →Remove one correlated predictor
- →Combine correlated predictors (e.g., principal components)
- →Use Ridge regression
- →Collect more data
V.4 Sources of Multicollinearity
| Source | Example | Correction |
|--------|---------|------------|
| Data collection | Restricted predictor range | Collect full range |
| Model specification | Correlated polynomial terms | Use orthogonal polynomials |
| Population constraints | Income and education correlation | Combine variables |
| Overdefined model | More predictors than observations | Reduce predictors |
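The Variance Inflation Factor for predictor j is 1 / (1 - R²j), where R²j comes from regressing that predictor on all the others. The base-R sketch below computes it by hand; the choice of `wt`, `hp`, and `disp` from `mtcars` is an illustrative assumption.

```r
predictors <- mtcars[, c("wt", "hp", "disp")]

# Regress each predictor on the remaining ones and convert R^2 to a VIF
vif_manual <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  aux_fit <- lm(reformulate(others, response = v), data = predictors)
  1 / (1 - summary(aux_fit)$r.squared)
})

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_from_cor <- diag(solve(cor(predictors)))

round(rbind(vif_manual, vif_from_cor), 3)
```

A VIF of 1 means the predictor is uncorrelated with the others; values above 10 correspond to an auxiliary R² above 0.9.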
VI. Variable Selection Methods
VI.1 Stepwise Regression
Automated procedure for selecting the "best" subset of predictors:
- →Start with a model (empty or full)
- →At each step, add or remove a predictor based on a criterion
- →Stop when no further improvement is possible
VI.2 Forward Selection
Steps:
- →Start with intercept-only model
- →For each predictor not in model, compute F-statistic for adding it
- →Add predictor with largest F-statistic (smallest p-value)
- →Repeat until no predictor meets entry criterion
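The steps above can be sketched in base R using partial F-tests from `anova()`. The function name `forward_select`, the entry threshold of 4, and the `mtcars` candidates are assumptions for illustration; this handles only main effects of numeric or factor predictors.

```r
forward_select <- function(data, response, candidates, f_enter = 4) {
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # Current model: intercept plus everything selected so far
    current <- lm(reformulate(c("1", selected), response = response), data = data)
    # Partial F-statistic for adding each remaining predictor on its own
    f_values <- sapply(remaining, function(term) {
      bigger <- lm(reformulate(c("1", selected, term), response = response),
                   data = data)
      anova(current, bigger)$F[2]
    })
    if (max(f_values) < f_enter) break          # entry criterion not met
    selected <- c(selected, names(which.max(f_values)))
  }
  selected
}

selected_terms <- forward_select(mtcars, "mpg", c("wt", "hp", "qsec", "drat"))
selected_terms
```

Predictors are returned in the order they entered the model, so the first element is the single strongest predictor of the response.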
VI.3 Backward Elimination
Steps:
- →Start with full model containing all predictors
- →For each predictor, compute F-statistic for removing it
- →Remove predictor with smallest F-statistic (largest p-value)
- →Repeat until no predictor meets removal criterion
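A matching base-R sketch of backward elimination, again using partial F-tests. The function name `backward_eliminate`, the removal threshold of 4, and the `mtcars` candidates are illustrative assumptions.

```r
backward_eliminate <- function(data, response, candidates, f_remove = 4) {
  kept <- candidates
  while (length(kept) > 0) {
    # Current model with every predictor that is still in play
    full <- lm(reformulate(c("1", kept), response = response), data = data)
    # Partial F-statistic for dropping each predictor on its own
    f_values <- sapply(kept, function(term) {
      smaller <- lm(reformulate(c("1", setdiff(kept, term)), response = response),
                    data = data)
      anova(smaller, full)$F[2]
    })
    if (min(f_values) >= f_remove) break        # nothing qualifies for removal
    kept <- setdiff(kept, names(which.min(f_values)))
  }
  kept
}

kept_terms <- backward_eliminate(mtcars, "mpg", c("wt", "hp", "qsec", "drat"))
kept_terms
```

Forward and backward procedures need not agree on the final model, which is one reason automated selection is usually treated as exploratory.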
VI.4 Comparison
| Feature | Forward Selection | Backward Elimination |
|---------|-------------------|---------------------|
| Starting point | Intercept only | Full model |
| Direction | Add predictors | Remove predictors |
| Advantages | Computationally efficient | Considers joint effects |
| Disadvantages | Can miss joint effects | Computationally intensive |
| Common criterion | Entry F ≥ 4 | Removal F < 4 |
VII. Time Series Models (Box-Jenkins)
VII.1 The Box-Jenkins Procedure
Three steps for ARIMA modelling:
| Step | Description | Activities |
|------|-------------|------------|
| 1. Model Specification | Tentatively identify model class | Examine ACF, PACF; test stationarity |
| 2. Model Estimation | Estimate parameters | Maximum likelihood estimation |
| 3. Model Diagnostic Checking | Assess adequacy | Residual analysis; modify if needed |
VII.2 Model Diagnostic Checking
Before forecasting, verify model adequacy:
1. Check residual autocorrelation
- →Ljung-Box test: H0: Residuals are uncorrelated
- →ACF and PACF of residuals
- →If significant autocorrelation remains, re-specify model
2. Check residual normality
- →Q-Q plot of residuals
- →Shapiro-Wilk test
- →Consider transformations if non-normal
3. Check residual heteroscedasticity
- →Plot residuals vs. fitted values
- →ARCH-LM test
- →Consider GARCH models if present
4. Check model stability
- →Parameters should be constant over time
- →CUSUM plots for parameter stability
- →Recursive residuals for model adequacy
5. Check forecasting performance
- →Out-of-sample forecast errors
- →Root Mean Squared Error (RMSE)
- →Mean Absolute Percentage Error (MAPE)
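Several of these checks can be run with base R's `stats` package. The sketch below simulates an AR(1) series (the coefficient 0.6 and the 280/20 train/holdout split are arbitrary assumptions), fits the model, and applies the residual and forecast checks. MAPE is omitted because the simulated series is centred on zero, where percentage errors are unstable.

```r
set.seed(123)
y <- arima.sim(model = list(ar = 0.6), n = 300)   # simulated AR(1) series

train   <- window(y, end = 280)
holdout <- window(y, start = 281)

fit <- arima(train, order = c(1, 0, 0))
res <- residuals(fit)

# 1. Residual autocorrelation: Ljung-Box test, adjusting df for 1 AR parameter
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)

# 2. Residual normality: Shapiro-Wilk test
sw <- shapiro.test(as.numeric(res))

# 5. Out-of-sample forecast accuracy: RMSE on the holdout period
fc <- predict(fit, n.ahead = length(holdout))
rmse <- sqrt(mean((as.numeric(holdout) - as.numeric(fc$pred))^2))

c(ljung_box_p = lb$p.value, shapiro_p = sw$p.value, rmse = rmse)
```

A large Ljung-Box p-value means there is no evidence of remaining autocorrelation; a small one suggests the model should be re-specified.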
VIII. Hypothesis Testing
VIII.1 Testing Overall Significance
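The global F-test evaluates whether any predictor has a non-zero coefficient:

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0 \text{ for at least one } j$$

Test statistic, comparing a reduced model (R) nested within a full model (F):

$$F = \frac{(SSE_R - SSE_F)/(df_R - df_F)}{SSE_F/df_F}$$

For full model vs. intercept-only:

$$F = \frac{SSR/p}{SSE/(n - p - 1)} = \frac{MSR}{MSE}$$

Under H0, F ~ F(p, n-p-1).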
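The global F-statistic can be computed from the sums of squares and checked against `summary()`. The model `mpg ~ wt + hp` on the built-in `mtcars` data is an illustrative assumption.

```r
full <- lm(mpg ~ wt + hp, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)       # intercept-only model

n <- nrow(mtcars)
p <- 2                                   # number of predictors

sse <- sum(residuals(full)^2)            # residual sum of squares, full model
sst <- sum(residuals(null)^2)            # total sum of squares
ssr <- sst - sse                         # regression sum of squares

f_manual <- (ssr / p) / (sse / (n - p - 1))
p_value  <- pf(f_manual, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

# The same comparison as a nested-model test
anova(null, full)
```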
VIII.2 Testing Individual Coefficients
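For testing βi = 0:

$$t = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)}, \qquad SE(\hat{\beta}_i) = \hat{\sigma}\sqrt{\left[(\mathbf{X}^\top\mathbf{X})^{-1}\right]_{ii}}$$

Under H0, t ~ t(n-p-1).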
VIII.3 Confidence Intervals
A 100(1-α)% confidence interval for βi:

$$\hat{\beta}_i \pm t_{\alpha/2,\, n-p-1}\, SE(\hat{\beta}_i)$$
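A sketch of how the t-statistics and 95% confidence intervals can be computed by hand and compared with `summary()` and `confint()`. The `mtcars` model is assumed for illustration.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
X <- model.matrix(fit)
n <- nrow(X)
p <- ncol(X) - 1                              # predictors, excluding intercept

# Residual variance estimate and coefficient standard errors
sigma2_hat <- sum(residuals(fit)^2) / (n - p - 1)
se <- sqrt(diag(sigma2_hat * solve(crossprod(X))))

# t-statistics and two-sided p-values for H0: beta_i = 0
t_stat <- coef(fit) / se
p_val  <- 2 * pt(abs(t_stat), df = n - p - 1, lower.tail = FALSE)

# 95% confidence intervals
t_crit <- qt(0.975, df = n - p - 1)
ci <- cbind(lower = coef(fit) - t_crit * se,
            upper = coef(fit) + t_crit * se)
ci
```

An interval that excludes zero corresponds to rejecting H0: βi = 0 at the matching significance level.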
VIII.4 The Coefficient of Determination
R² measures the proportion of variance in the response explained by the model:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
Interpretation:
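- →R² lies between 0 and 1 for models that include an intercept
- →Higher values indicate a closer fit to the observed data, but R² never decreases when predictors are added
- →R² = 0.75 means 75% of the variance in the response is explained
- →Adjusted R² accounts for model complexity by penalising the number of predictors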
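Both quantities follow directly from the sums of squares. The sketch below uses the standard adjusted R² formula, 1 - [SSE/(n-p-1)] / [SST/(n-1)], and an illustrative `mtcars` model.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
y <- mtcars$mpg
n <- length(y)
p <- 2                                        # number of predictors

sse <- sum(residuals(fit)^2)                  # unexplained variation
sst <- sum((y - mean(y))^2)                   # total variation

r2     <- 1 - sse / sst
adj_r2 <- 1 - (sse / (n - p - 1)) / (sst / (n - 1))

c(r2 = r2, adj_r2 = adj_r2)
```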
VIII.5 The ANOVA Table for Multiple Regression
| Source | df | SS | MS | F |
|--------|----|----|----|----|
| Regression | p | SSR | MSR | MSR/MSE |
| Residual | n-p-1 | SSE | MSE | |
| Total | n-1 | SST | | |
IX. One-Way ANOVA
IX.1 Model Formulation
One-way ANOVA compares the means of a groups (treatments):

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
where:
- →Yij is response for j-th observation in group i
- →μ is overall mean
- →αi is effect of treatment i (i = 1, ..., a)
- →εij are independent N(0, σ²)
IX.2 The ANOVA Table
With N denoting the total number of observations across all a treatments:

| Source | df | SS | MS | F |
|--------|----|----|----|----|
| Treatments | a-1 | SSTreatment | MSTreatment | MSTreatment/MSError |
| Error | N-a | SSError | MSError | |
| Total | N-1 | SSTotal | | |
IX.3 Post-Hoc Analysis: LSD Method
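The Least Significant Difference (LSD) method determines which pairs of group means differ. With ni and nj observations in groups i and j, and N total observations:

$$LSD = t_{\alpha/2,\, N-a}\sqrt{MS_{Error}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

For a balanced design with n replicates per group this reduces to t(α/2, N-a)·√(2·MSError/n). Two means differ significantly if |Ȳi - Ȳj| > LSD.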
IX.4 Worked Example
Tensile strength of synthetic fibre by three methods (A, B, C):
| Method | Replicates | Total |
|--------|------------|-------|
| A | 7, 7, 7, 15, 9 | 45 |
| B | 12, 12, 17, 18, 18 | 77 |
| C | 14, 18, 18, 19, 19 | 88 |
- →Number of treatments: a = 3
- →Replicates: n = 5
- →Total observations: N = 15
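The ANOVA table and LSD comparisons for these data can be computed step by step. The sketch below enters the fifteen observations from the table into a data frame named `fibre_data` (the column names `method` and `strength` are chosen here) and uses α = 0.05 as an assumed significance level.

```r
fibre_data <- data.frame(
  method   = factor(rep(c("A", "B", "C"), each = 5)),
  strength = c(7, 7, 7, 15, 9,   12, 12, 17, 18, 18,   14, 18, 18, 19, 19)
)

a <- nlevels(fibre_data$method)   # number of treatments
n <- 5                            # replicates per treatment
N <- nrow(fibre_data)             # total observations

group_means <- tapply(fibre_data$strength, fibre_data$method, mean)
grand_mean  <- mean(fibre_data$strength)

# Sums of squares
ss_treat <- n * sum((group_means - grand_mean)^2)
ss_total <- sum((fibre_data$strength - grand_mean)^2)
ss_error <- ss_total - ss_treat

# Mean squares, F-statistic, and p-value
ms_treat <- ss_treat / (a - 1)
ms_error <- ss_error / (N - a)
f_stat   <- ms_treat / ms_error
p_value  <- pf(f_stat, a - 1, N - a, lower.tail = FALSE)

# LSD for a balanced design, and which pairs of means differ
lsd <- qt(0.975, df = N - a) * sqrt(2 * ms_error / n)
differs <- abs(outer(group_means, group_means, "-")) > lsd
differs
```

The group means are 9.0, 15.4, and 17.6, giving SSTreatment = 199.6, SSError = 104.4, and F ≈ 11.47 on (2, 12) degrees of freedom. With LSD ≈ 4.06, method A differs from both B and C, while B and C do not differ significantly from each other.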
X. Implementing GLMs in R
X.1 Basic Syntax
```r
# General Linear Model
model <- lm(y ~ x1 + x2 + x3, data = my_data)

# Generalized Linear Model
model <- glm(y ~ x1 + x2 + x3,
             family = binomial(link = "logit"),
             data = my_data)
```
X.2 Common Family Arguments
| family | link | Use Case |
| --- | --- | --- |
| gaussian | identity | Continuous, normal data |
| binomial | logit | Binary outcomes |
| poisson | log | Count data |
| Gamma | inverse or log | Positive, skewed data |
X.3 Model Diagnostics
```r
# Residuals
residuals(model)
rstandard(model)        # Standardised (internally studentised)
rstudent(model)         # Studentised (externally studentised, i.e. jackknifed)

# Influence
hatvalues(model)        # Leverage
cooks.distance(model)   # Cook's distance

# Multicollinearity
library(car)            # provides vif()
vif(model)

# Diagnostic plots
plot(model)
```
X.4 Step-by-Step Example
```r
# 1. Load data
data(mtcars)

# 2. Fit model
model <- lm(mpg ~ wt + hp, data = mtcars)

# 3. View results
summary(model)

# 4. Diagnostics
plot(model)

# 5. Predictions
predict(model, newdata = data.frame(wt = 3.5, hp = 150))
```
X.5 Advanced Features
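```r
# Regularised GLMs
# (assumes the response y is the first column of my_data and the
#  remaining columns are numeric predictors)
library(glmnet)
cv_model <- cv.glmnet(x = as.matrix(my_data[, -1]),
                      y = my_data$y,
                      family = "poisson",
                      alpha = 1)  # alpha = 1: Lasso; alpha = 0: Ridge

# Offset (insurance exposure): models claims per unit of exposure
model_offset <- glm(Claims ~ Age + Vehicle_Type + offset(log(Exposure)),
                    family = poisson(link = "log"),
                    data = insurance_data)

# Interactions: wt * hp expands to wt + hp + wt:hp
model_interaction <- lm(mpg ~ wt * hp, data = mtcars)

# Polynomial terms: I() protects the arithmetic inside a formula
model_poly <- lm(mpg ~ wt + I(wt^2) + hp, data = mtcars)
```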
X.6 One-Way ANOVA in R
```r
# ANOVA using lm()
# (fibre_data is assumed to hold a factor `method` and a numeric `strength`)
model <- lm(strength ~ method, data = fibre_data)
anova(model)

# LSD post-hoc
library(agricolae)
LSD.test(model, "method", alpha = 0.05)
```
XI. Frequently Asked Questions
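XI.1 What is the General Linear Model?
The GLM is a unifying statistical framework encompassing linear regression, ANOVA, ANCOVA, and t-tests. It assumes:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$

Key assumptions: Linearity, Independence, Homoscedasticity, Normality.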
XI.2 General Linear Model vs. Generalized Linear Model?

| Feature | General Linear Model | Generalized Linear Model |
| --- | --- | --- |
| Distribution | Normal only | Any exponential family |
| Link function | Identity | Any (log, logit, probit) |
| Variance | Constant | Can vary with the mean |
| Examples | Linear regression, ANOVA | Logistic, Poisson regression |

When to use: the General Linear Model for continuous, approximately normal responses; the Generalized Linear Model for counts, binary outcomes, and skewed data.
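XI.3 Psychology Examples?
Depression and news consumption:

$$\text{Depression}_i = \beta_0 + \beta_1\,\text{NewsConsumption}_i + \varepsilon_i$$

Group comparisons (ANOVA):

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$

Gender differences in psychopathy: Test using an independent-samples t-test (a GLM with one dummy-coded predictor); paired differences apply only to matched pairs.
Stress reactivity in smokers: Compare deprived vs. non-deprived groups.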
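XI.4 The GLM Formula?
Simple regression:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Multiple regression:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

Matrix form:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

OLS estimator:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{Y}$$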
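XI.5 GLM vs. ANOVA?
ANOVA is a special case of the GLM where all predictors are categorical. The GLM encodes a categorical variable with a levels as a - 1 dummy variables. For three groups with the first as the reference level:

$$Y = \beta_0 + \beta_1 D_1 + \beta_2 D_2 + \varepsilon$$

where D1 and D2 equal 1 for observations in the second and third groups respectively, and 0 otherwise.
ANOVA, t-tests, and regression are all manifestations of the same GLM.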
XI.6 Implementing GLM in R?

```r
# General Linear Model
model <- lm(y ~ x1 + x2 + x3, data = my_data)
summary(model)
plot(model)
predict(model, newdata = data.frame(x1 = 5, x2 = 10, x3 = 15))

# Generalized Linear Model
model <- glm(y ~ x1 + x2 + x3,
             family = binomial(link = "logit"),
             data = my_data)
```
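XI.7 Why Test for Normality?
Checking normality matters because violations, particularly in small samples, can lead to:
- →Incorrect Type I error rates
- →Reduced power
- →Invalid confidence intervals
- →Inefficient estimates when errors are heavy-tailed (OLS remains unbiased, but is no longer the most efficient estimator)
Key point: Normality applies to the errors, not the raw data. For large samples, the Central Limit Theorem makes t- and F-based inference approximately valid even when the errors are not normal.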
XII. Conclusion
The General Linear Model stands as one of the most important statistical frameworks ever developed. Its elegance lies in simplicity: a unified approach to modelling relationships between a response and multiple predictors. Yet its power extends far beyond simple linear regression.
Key insights:
- →Unification: Linear regression, ANOVA, ANCOVA, and t-tests are all special cases
- →Generalization: The Generalized Linear Model handles counts, binary outcomes, and skewed data
- →Practical impact: GLMs are the workhorse of insurance pricing, credit scoring, clinical trials, and psychological research
- →Diagnostics: Residual analysis is essential for validating assumptions
- →Modern extensions: Regularisation techniques (Lasso, Ridge) improve performance in high-dimensional settings
As data grows in volume and complexity, the GLM will continue to evolve. Machine learning integration, adaptive methods, and automated variable selection are extending its capabilities. Yet the core principles remain unchanged: the GLM is a tool for understanding relationships, making predictions, and testing hypotheses. It is, in the truest sense, a framework for scientific discovery.
Whether you are a psychologist studying depression, an actuary pricing insurance, a data scientist building predictive models, or a researcher designing clinical trials, the General Linear Model provides the rigorous, interpretable foundation you need.