Derivation of the Chi-Square Test Statistic for Goodness-of-Fit and Independence

The Formal Theorem

Let $X = (O_1, O_2, \dots, O_k)$ be a random vector following a Multinomial distribution with parameters $n$ and $p = (p_1, p_2, \dots, p_k)$. Let the expected frequencies be $E_i = n p_i$. The Pearson test statistic is defined as:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
As $n \to \infty$, the statistic converges in distribution:
$$\chi^2 \xrightarrow{d} \chi^2_{k-m-1}$$
where $k$ is the number of cells and $m$ is the number of parameters estimated from the data. When the null hypothesis specifies $p$ completely, $m = 0$ and the limit has $k-1$ degrees of freedom; when parameters are estimated, $E_i$ is computed from the fitted probabilities.
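
The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `pearson_chi_square`, the argument `n_estimated` (the count of parameters estimated from the data), and the die-roll counts are all assumptions introduced here.

```python
def pearson_chi_square(observed, probs, n_estimated=0):
    """Return (statistic, degrees_of_freedom) for a goodness-of-fit test.

    observed    -- list of cell counts O_1..O_k
    probs       -- hypothesized cell probabilities p_1..p_k (must sum to 1)
    n_estimated -- number of parameters estimated from the data (assumed name)
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probs]  # E_i = n * p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - n_estimated
    return stat, df


# Hypothetical data: 120 rolls of a die hypothesized to be fair (E_i = 20).
rolls = [18, 22, 16, 25, 19, 20]
stat, df = pearson_chi_square(rolls, [1 / 6] * 6)
print(stat, df)  # statistic is approximately 2.5, with 5 degrees of freedom
```

The returned statistic would then be compared against the Chi-Square distribution with `df` degrees of freedom to obtain a $p$-value.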

Analytical Intuition.

Imagine a high-dimensional stage where the Multinomial distribution describes the movement of data points across various categories. Under the null hypothesis, each category has a 'target' weight defined by $E_i$. As the sample size $n$ grows toward infinity, the discrete jumps of the cell counts smooth out into a continuous Multivariate Normal distribution. However, because all counts must sum to $n$, the variables are not independent; they are locked in a geometric dance upon a hyperplane of dimension $k-1$. The Chi-Square statistic is essentially the squared Mahalanobis distance from the observed counts to their expected center: the deviation vector measured in units set by the covariance structure rather than in raw counts. Because the limiting Gaussian lives on only $k-1$ dimensions, this squared distance behaves like a sum of $k-1$ independent squared standard normals, which is exactly a Chi-Square variable with $k-1$ degrees of freedom. It is the mathematical lens that focuses the chaotic vibrations of random sampling into a single scalar signal of whether our model holds or fractures under the weight of evidence.
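
One way to see this picture numerically is a small simulation. The sketch below draws repeated Multinomial samples under a fully specified null and averages the Pearson statistic; a Chi-Square variable with $k-1$ degrees of freedom has mean $k-1$, so the average should land near that value. The function name `simulate_statistics`, the probabilities, the sample size, and the seed are all illustrative assumptions.

```python
import random


def simulate_statistics(probs, n, replicates, seed=0):
    """Simulate the Pearson statistic under the null hypothesis.

    Each replicate draws n observations from the categorical distribution
    given by probs, tallies the cell counts, and computes the statistic.
    """
    rng = random.Random(seed)
    k = len(probs)
    cells = list(range(k))
    expected = [n * p for p in probs]
    stats = []
    for _ in range(replicates):
        counts = [0] * k
        for cell in rng.choices(cells, weights=probs, k=n):
            counts[cell] += 1
        stats.append(sum((o - e) ** 2 / e for o, e in zip(counts, expected)))
    return stats


# Hypothetical setup: k = 4 cells, so the limiting mean is k - 1 = 3.
probs = [0.1, 0.2, 0.3, 0.4]
stats = simulate_statistics(probs, n=200, replicates=2000)
mean_stat = sum(stats) / len(stats)
print(mean_stat)  # should be close to 3
```

Comparing a histogram of `stats` against the Chi-Square density with 3 degrees of freedom would give a fuller check than the mean alone.
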
CAUTION

Institutional Warning.

Students often conflate the degrees of freedom for Goodness-of-Fit with those for Independence. In an Independence test on an $r \times c$ table there are $k = rc$ cells, and the marginal probabilities are estimated from the data. Each estimated parameter imposes an additional linear constraint and removes one degree of freedom, reducing the dimension from $k-1$ to $(r-1)(c-1)$.
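
The count can be written out explicitly. With $k = rc$ cells, only $r-1$ row probabilities and $c-1$ column probabilities are free (each set must sum to one), so the subtraction works out as follows. The $2 \times 3$ table is an illustrative example, not taken from the source.

```latex
\[
\underbrace{(rc - 1)}_{k-1}
\;-\; \underbrace{\bigl[(r-1) + (c-1)\bigr]}_{\text{estimated parameters}}
\;=\; rc - r - c + 1
\;=\; (r-1)(c-1).
\]
% Example: a 2 x 3 table has k = 6 cells, so k - 1 = 5;
% it needs 1 + 2 = 3 estimated marginals, leaving 5 - 3 = 2 = (2-1)(3-1).
```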

Academic Inquiries.

01

Why is the denominator $E_i$ and not the variance $np_i(1-p_i)$?

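While the variance of a single cell count (a Binomial component) is $np_i(1-p_i)$, the cells are negatively correlated, with covariance $-np_ip_j$ between cells $i$ and $j$, so they cannot be standardized one at a time. The derivation instead uses the joint Multivariate Normal limit. The $E_i$ denominator emerges naturally when simplifying the quadratic form built from the inverse covariance matrix of the Multinomial distribution (taken after dropping one redundant cell, since the full covariance matrix is singular).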
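
One standard route to this simplification is sketched below. It assumes we drop the last cell to obtain an invertible covariance matrix and apply the Sherman-Morrison formula; the symbols $d_i$, $\tilde{d}$, $\tilde{p}$, $D$, and $\Sigma$ are introduced here for illustration and do not appear in the original text.

```latex
% Deviations and the reduced (k-1)-dimensional quantities:
\[
d_i = O_i - E_i, \qquad
\tilde{d} = (d_1, \dots, d_{k-1})^\top, \qquad
\tilde{p} = (p_1, \dots, p_{k-1})^\top, \qquad
D = \operatorname{diag}(\tilde{p}).
\]
% Covariance of the first k-1 counts and its inverse (Sherman-Morrison):
\[
\Sigma = n\bigl(D - \tilde{p}\,\tilde{p}^\top\bigr),
\qquad
\Sigma^{-1} = \frac{1}{n}\Bigl(D^{-1} + \frac{1}{p_k}\,\mathbf{1}\mathbf{1}^\top\Bigr).
\]
% Quadratic form, using d_1 + ... + d_k = 0, i.e. the first k-1 deviations sum to -d_k:
\[
\tilde{d}^\top \Sigma^{-1} \tilde{d}
= \sum_{i=1}^{k-1} \frac{d_i^2}{n p_i}
  + \frac{1}{n p_k}\Bigl(\sum_{i=1}^{k-1} d_i\Bigr)^{2}
= \sum_{i=1}^{k} \frac{d_i^2}{n p_i}
= \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}.
\]
```

The cross terms introduced by the negative correlations are exactly what convert the naive denominator $np_i(1-p_i)$ into $E_i = np_i$.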

02

What happens if $E_i < 5$?

The Chi-Square distribution is an asymptotic result (a limit theorem). When expected counts are small, the discrete nature of the data is not sufficiently 'smoothed' into a Gaussian shape, making the $p$-values derived from the continuous Chi-Square curve unreliable. Common remedies are to pool sparse categories or to use an exact test instead.
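
A simple pre-check along these lines can be sketched as follows. The helper name `small_expected_cells` and the example inputs are assumptions for illustration; the default threshold of 5 is the conventional rule of thumb from the question above, not a hard boundary.

```python
def small_expected_cells(n, probs, threshold=5.0):
    """Return (index, expected_count) for each cell with n * p_i below threshold."""
    return [(i, n * p) for i, p in enumerate(probs) if n * p < threshold]


# Hypothetical design: 40 observations over four cells.
flagged = small_expected_cells(40, [0.05, 0.10, 0.35, 0.50])
for index, expected in flagged:
    print(f"cell {index}: expected count {expected:.1f} is below the threshold")
```

Here the first two cells are flagged, which suggests pooling them with a neighbor or collecting more data before trusting the asymptotic $p$-value.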

03

How does the 'Independence' test relate to the 'Goodness-of-Fit' derivation?

Independence is a specific case of Goodness-of-Fit where the hypothesized probabilities $p_{ij}$ are products of marginals, $p_{i.} \times p_{.j}$. The derivation remains the same, but the degrees of freedom are adjusted for estimated parameters.
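
To make the connection concrete, the sketch below computes the same Pearson sum for a two-way table, with expected counts built from the estimated marginals. The function name `independence_chi_square` and the example table are illustrative assumptions.

```python
def independence_chi_square(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts.

    table is a list of rows, each a list of non-negative counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence with estimated marginals:
            # n * (row_i / n) * (col_j / n) = row_i * col_j / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2 x 3 table; expected counts are 15, 20, 25 in each row.
table = [[10, 20, 30], [20, 20, 20]]
stat, df = independence_chi_square(table)
print(stat, df)  # statistic is about 5.33, with 2 degrees of freedom
```

The loop is the Goodness-of-Fit sum over all $rc$ cells; only the source of the expected counts and the degrees of freedom differ.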

Standardized References.

  • Definitive Institutional Source: Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, Series 5, 50(302), 157-175.

Institutional Citation

Reference this proof in your academic research or publications.

NICEFA Visual Mathematics. (2026). Derivation of the Chi-Square Test Statistic for Goodness-of-Fit and Independence: Visual Proof & Intuition. Retrieved from https://nicefa.org/library/applied-statistics/derivation-of-the-chi-square-test-statistic-for-goodness-of-fit-and-independence
