Theoretical Basis of Influence Diagnostics: Cook's Distance, DFFITS, and DFBETAS

Master influence diagnostics: Cook's Distance, DFFITS, and DFBETAS. Learn the geometric and theoretical basis for detecting influential data in linear models.


The Formal Theorem

Let the model be defined by $Y = X\beta + \epsilon$, where $\epsilon \sim N(0, \sigma^2 I)$. Let $\hat{\beta}$ be the OLS estimator and $\hat{\beta}_{(-i)}$ the estimator computed without the $i$-th observation. The influence measures are defined as follows:
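$$
\begin{aligned}
D_i &= \frac{(\hat{\beta}_{(-i)} - \hat{\beta})^T (X^T X) (\hat{\beta}_{(-i)} - \hat{\beta})}{p\, s^2} = \frac{r_i^2}{p} \left( \frac{h_{ii}}{1 - h_{ii}} \right) \\
\text{DFFITS}_i &= \frac{\hat{y}_i - \hat{y}_{i(-i)}}{s_{(-i)} \sqrt{h_{ii}}} = t_i \sqrt{\frac{h_{ii}}{1 - h_{ii}}} \\
\text{DFBETAS}_{ij} &= \frac{\hat{\beta}_j - \hat{\beta}_{j(-i)}}{s_{(-i)} \sqrt{(X^T X)^{-1}_{jj}}}
\end{aligned}
$$

where $p$ is the number of columns of $X$, $e_i = y_i - \hat{y}_i$ is the residual, $h_{ii} = x_i^T (X^T X)^{-1} x_i$ is the leverage, $s^2 = \sum_k e_k^2 / (n - p)$ is the MSE, $s_{(-i)}^2$ is the MSE of the fit without observation $i$, and $\hat{y}_{i(-i)} = x_i^T \hat{\beta}_{(-i)}$. The two closed forms use different residuals: $r_i = e_i / (s \sqrt{1 - h_{ii}})$ is the internally studentized residual, while $t_i = e_i / (s_{(-i)} \sqrt{1 - h_{ii}})$ is the externally studentized residual, which matches the $s_{(-i)}$ scaling in DFFITS.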
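The definitions can be checked numerically. The sketch below, assuming NumPy and a full-column-rank design matrix `X` (intercept column included, so `p` counts it), computes every measure by literally refitting the model without each observation, then confirms the closed forms for $D_i$ and DFFITS using only full-fit quantities. The helper name `influence_measures` and the simulated data are hypothetical, not a library API or real results.

```python
import numpy as np

def influence_measures(X, y):
    """Leave-one-out influence measures for OLS (hypothetical helper).

    X: (n, p) full-rank design matrix, intercept column included.
    y: (n,) response vector.
    Returns leverages h, Cook's D, DFFITS and the (n, p) DFBETAS matrix.
    """
    n, p = X.shape
    XtX = X.T @ X
    XtX_inv = np.linalg.inv(XtX)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    s2 = e @ e / (n - p)                          # MSE of the full fit
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # h_ii = x_i^T (X^T X)^{-1} x_i

    cooks_d, dffits = np.empty(n), np.empty(n)
    dfbetas = np.empty((n, p))
    for i in range(n):
        keep = np.arange(n) != i
        beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]  # beta_hat_(-i)
        e_i = y[keep] - X[keep] @ beta_i
        s_i = np.sqrt(e_i @ e_i / (n - 1 - p))                     # s_(-i)
        d = beta_i - beta
        cooks_d[i] = d @ XtX @ d / (p * s2)
        dffits[i] = X[i] @ (beta - beta_i) / (s_i * np.sqrt(h[i]))  # y_hat_i - y_hat_i(-i)
        dfbetas[i] = (beta - beta_i) / (s_i * np.sqrt(np.diag(XtX_inv)))
    return h, cooks_d, dffits, dfbetas

# Simulated straight-line data (illustrative only).
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
h, D, dffits, dfbetas = influence_measures(X, y)

# Check the closed forms without any refitting.
p = X.shape[1]
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
s2 = e @ e / (n - p)
r = e / np.sqrt(s2 * (1 - h))                             # internally studentized
s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)    # s_(-i)^2 via the deletion identity
t = e / np.sqrt(s2_del * (1 - h))                         # externally studentized
assert np.allclose(D, r**2 / p * h / (1 - h))
assert np.allclose(dffits, t * np.sqrt(h / (1 - h)))
```

The brute-force loop costs $n$ refits and is only meant to make the definitions concrete; the closed forms give the same numbers from a single fit.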

Analytical Intuition.

Imagine you are constructing a bridge from a set of coordinate data points $X$. Most points act as steady pillars, grounding the structure in statistical reality. A single rogue observation $(x_i, y_i)$, however, can act like a hidden fault line. If $x_i$ sits far from the bulk of the design, it carries high leverage $h_{ii}$, and then even a modest shift in $y_i$ tilts the entire structure. Cook's distance $D_i$ is the seismic sensor: it measures how far the whole parameter vector $\hat{\beta}$ moves, in the metric of $X^T X$, when that observation is removed. DFFITS focuses on the shift in the observation's own fitted value $\hat{y}_i$, while DFBETAS tracks how each individual coefficient $\hat{\beta}_j$ moves under the strain. We are not just fitting a line; we are identifying the 'opinionated' points that hold the model hostage. A large $D_i$ does not prove that an observation is wrong, but it does mean that conclusions drawn from the fit rest heavily on a single, possibly erroneous, data input and should be checked.
CAUTION

Institutional Warning.

Students often conflate 'leverage' with 'influence.' Leverage $h_{ii}$ is a function of $X$ alone and measures potential impact, whereas influence (like Cook's $D_i$) incorporates the response $Y$. An observation can have high leverage but negligible influence if it aligns with the overall trend.
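The distinction can be seen directly. In this sketch (NumPy; data simulated purely for illustration, helper name `leverage_and_cooks` hypothetical), one far-out point at $x = 5$ is appended to twenty points with $x$ in $[0, 1]$, first exactly on the line fitted to the other points and then 4 units below it. Only $y$ changes between the two cases, so the leverage is identical, but Cook's distance is essentially zero for the aligned point and large for the off-trend one.

```python
import numpy as np

def leverage_and_cooks(X, y):
    """Return leverages h_ii and Cook's distances via the closed form (hypothetical helper)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # depends on X only
    e = y - X @ (XtX_inv @ X.T @ y)
    s2 = e @ e / (n - p)
    r2 = e**2 / (s2 * (1 - h))                    # squared internally studentized residuals
    return h, r2 / p * h / (1 - h)

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=20)
y = 3.0 + 2.0 * x + rng.normal(scale=0.1, size=20)

# Line fitted to the 20 "ordinary" points.
X_rest = np.column_stack([np.ones(20), x])
b0, b1 = np.linalg.lstsq(X_rest, y, rcond=None)[0]

# Append a high-leverage point at x = 5: once on that line, once 4 units below it.
X = np.column_stack([np.ones(21), np.append(x, 5.0)])
y_aligned = np.append(y, b0 + b1 * 5.0)
y_offtrend = np.append(y, b0 + b1 * 5.0 - 4.0)

h_a, d_a = leverage_and_cooks(X, y_aligned)
h_o, d_o = leverage_and_cooks(X, y_offtrend)
print(f"leverage: {h_a[-1]:.3f} vs {h_o[-1]:.3f}")
print(f"Cook's D: {d_a[-1]:.2e} vs {d_o[-1]:.2f}")
```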

Academic Inquiries.

01

Why is the threshold for Cook's distance often set at 4/n?

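The 4/n rule is a heuristic scaled to the size of a typical observation's influence. For an ordinary point, $r_i^2 \approx 1$ and $h_{ii} \approx p/n$ (the average leverage, since the leverages sum to $p$), so $D_i \approx \frac{1}{p} \cdot \frac{p/n}{1 - p/n} \approx 1/n$. The 4/n cutoff therefore flags observations whose removal moves the parameter vector roughly four times as much as a typical observation's would. It is not a derived critical value; other conventions compare $D_i$ with 1 or with the median of the $F_{p,\,n-p}$ distribution.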
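Applying the cutoff is a one-liner once $D_i$ is available. The sketch below (NumPy; the helper name `flag_influential` and the planted outlier are hypothetical) uses the closed form $D_i = \frac{r_i^2}{p}\,\frac{h_{ii}}{1 - h_{ii}}$ and returns the indices with $D_i > 4/n$; the `cutoff` argument lets you substitute another convention.

```python
import numpy as np

def flag_influential(X, y, cutoff=None):
    """Indices whose Cook's distance exceeds `cutoff` (default: the 4/n rule of thumb)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    e = y - X @ (XtX_inv @ X.T @ y)
    s2 = e @ e / (n - p)
    D = e**2 / (s2 * (1 - h)) / p * h / (1 - h)   # r_i^2 / p * h_ii / (1 - h_ii)
    if cutoff is None:
        cutoff = 4.0 / n
    return np.flatnonzero(D > cutoff), D

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)
x[0], y[0] = 4.0, 4.0   # planted point: far out in x and 5 units below the true line
X = np.column_stack([np.ones(n), x])

flagged, D = flag_influential(X, y)
print(f"cutoff 4/n = {4.0 / n}")
print(f"flagged: {flagged}, largest D: {D.max():.2f}")
```

Ordinary points can also cross 4/n by chance, so the flagged set is a list of candidates for inspection rather than a verdict.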

02

Can influence diagnostics be used in models with non-constant variance?

Standard diagnostics assume $\text{Var}(\epsilon) = \sigma^2 I$. In heteroscedastic cases, weighted least squares or robust covariance estimators must be employed to avoid misinterpreting variance patterns as influence.
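For the weighted-least-squares route, one standard approach, assuming the weights $w_i$ (for example $1/\sigma_i^2$) are known, is to rescale each row by $\sqrt{w_i}$ and apply the ordinary formulas to the transformed model, since WLS is OLS on $(W^{1/2}X,\ W^{1/2}y)$. This sketch covers only that known-weights case; robust covariance estimators are not shown, and the helper name `wls_cooks_distance` and the simulated data are hypothetical.

```python
import numpy as np

def wls_cooks_distance(X, y, w):
    """Leverages and Cook's D for a WLS fit with known weights (hypothetical helper).

    WLS with weights w equals OLS on rows scaled by sqrt(w), so the usual
    closed forms are applied to the transformed design and response.
    """
    sw = np.sqrt(w)
    Xw, yw = X * sw[:, None], y * sw
    n, p = Xw.shape
    A_inv = np.linalg.inv(Xw.T @ Xw)                 # (X^T W X)^{-1}
    h = np.einsum("ij,jk,ik->i", Xw, A_inv, Xw)      # diagonal of the weighted hat matrix
    e = yw - Xw @ (A_inv @ Xw.T @ yw)                # weighted residuals sqrt(w_i) * e_i
    s2 = e @ e / (n - p)                             # weighted MSE
    r2 = e**2 / (s2 * (1 - h))
    return h, r2 / p * h / (1 - h)

# Heteroscedastic example: error standard deviation grows with x.
rng = np.random.default_rng(3)
n = 40
x = rng.uniform(1.0, 5.0, size=n)
sigma = 0.2 * x
y = 1.0 + 0.5 * x + rng.normal(scale=sigma)
X = np.column_stack([np.ones(n), x])
w = 1.0 / sigma**2
h, D = wls_cooks_distance(X, y, w)
print(f"max weighted Cook's D: {D.max():.3f}")
```

Deleting a row of the transformed system is the same as deleting that observation from the weighted fit, so these values agree with a brute-force leave-one-out WLS refit.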

03

What happens to DFBETAS when the design matrix is collinear?

Severe multicollinearity inflates $(X^T X)^{-1}_{jj}$, and with it the standard error of $\hat{\beta}_j$. This makes the denominator of DFBETAS large and can mask the influence of specific observations.
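The effect on the denominator can be isolated. This sketch (NumPy, simulated data, hypothetical helper name `dfbetas_design_factor`) builds a design with an intercept and two predictors of population correlation `rho`, and reports $\sqrt{(X^T X)^{-1}_{11}}$, the design factor in the DFBETAS denominator for the first slope, as `rho` approaches 1.

```python
import numpy as np

def dfbetas_design_factor(rho, n=100, seed=4):
    """sqrt((X^T X)^{-1}_{11}) for a design whose two predictors have correlation rho."""
    rng = np.random.default_rng(seed)
    z1 = rng.normal(size=n)
    u = rng.normal(size=n)
    z2 = rho * z1 + np.sqrt(1.0 - rho**2) * u   # population corr(z1, z2) = rho
    X = np.column_stack([np.ones(n), z1, z2])
    return float(np.sqrt(np.linalg.inv(X.T @ X)[1, 1]))

for rho in (0.0, 0.9, 0.99):
    print(f"rho = {rho:4.2f}: design factor = {dfbetas_design_factor(rho):.4f}")
```

Because the same seed is reused, the draws are shared across `rho` values, so the growth in the printed factor reflects the correlation alone.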


Institutional Citation

Reference this proof in your academic research or publications.

NICEFA Visual Mathematics. (2026). Theoretical Basis of Influence Diagnostics: Cook's Distance, DFFITS, and DFBETAS: Visual Proof & Intuition. Retrieved from https://www.nicefa.org/library/general-linear-models-/theoretical-basis-of-influence-diagnostics--cook-s-distance--dffits--and-dfbetas
