Lecture Notes: Chi-Square (χ²) Tests

Welcome to this comprehensive guide on Chi‑Square (χ²) tests, essential tools in statistics for analyzing categorical data. These notes cover the fundamental concepts, the core formula, manual calculation steps, different types of Chi-Square tests (goodness-of-fit, independence, homogeneity), practical examples with R code snippets, and an interactive visualization to help you understand the Chi-Square distribution itself. We'll explore how to compare observed frequencies with expected frequencies to perform hypothesis testing.

The Chi-Square (χ²) Test Statistic Formula

χ² = Σ [ (O - E)² / E ]
  Where:
  • O represents the Observed frequencies (the actual counts in your data).
  • E represents the Expected frequencies (the counts you would anticipate if your null hypothesis were true).
  • Σ means to sum the values calculated for each category or cell in your table.

This formula quantifies the discrepancy between your observed data and what you expected under the null hypothesis. A larger χ² value indicates a greater difference, suggesting the observed data may not fit the expected pattern.
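As a quick sanity check, the formula translates directly into one line of R. This is a minimal sketch using hypothetical observed counts for four categories and a hypothetical null of equal proportions:

```r
# Hypothetical observed counts for four categories (100 observations total)
O <- c(30, 20, 25, 25)

# Expected counts under a hypothetical null of equal proportions: 100 / 4 = 25 each
E <- rep(sum(O) / length(O), length(O))

# Chi-square statistic: sum (O - E)^2 / E over all categories
chi_sq <- sum((O - E)^2 / E)
print(chi_sq)  # (25 + 25 + 0 + 0) / 25 = 2
```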

Manual Calculation Example: Goodness-of-Fit (Fair Dice Roll)

Let's walk through calculating the χ² statistic by hand. This helps solidify understanding before relying on software.

Scenario

We roll a standard six-sided die 120 times to test if it's fair. A fair die means each face (1 to 6) should appear with equal probability (1/6).

Null Hypothesis (H₀): The die is fair (P(1)=P(2)=...=P(6)=1/6).

Alternative Hypothesis (H₁): The die is not fair (the probabilities are different).

Observed Data (O): Suppose we observed the following counts after 120 rolls:

  • Face 1: 18 times
  • Face 2: 25 times
  • Face 3: 15 times
  • Face 4: 22 times
  • Face 5: 28 times
  • Face 6: 12 times
  • (Total Rolls = 18+25+15+22+28+12 = 120)

Step-by-Step Calculation

  1. Calculate Expected Frequencies (E): If the die is fair (H₀ is true), we expect each face to appear:
    Total Rolls * P(Face) = 120 * (1/6) = 20 times. So, E = 20 for all faces.
  2. Set up the Calculation Table:

     Face  | Observed (O) | Expected (E) | O - E | (O - E)² | (O - E)² / E
     ------|--------------|--------------|-------|----------|--------------
       1   |      18      |      20      |  -2   |     4    | 4 / 20 = 0.20
       2   |      25      |      20      |   5   |    25    | 25 / 20 = 1.25
       3   |      15      |      20      |  -5   |    25    | 25 / 20 = 1.25
       4   |      22      |      20      |   2   |     4    | 4 / 20 = 0.20
       5   |      28      |      20      |   8   |    64    | 64 / 20 = 3.20
       6   |      12      |      20      |  -8   |    64    | 64 / 20 = 3.20
     Total |              |              |       |          | χ² = 9.30

  3. Calculate the χ² Statistic: Sum the values in the last column:
    χ² = 0.20 + 1.25 + 1.25 + 0.20 + 3.20 + 3.20 = 9.30
  4. Determine Degrees of Freedom (df): For a Goodness-of-Fit test, df = k - 1, where k is the number of categories (faces).
    df = 6 - 1 = 5

Next Step (Interpretation)

We would compare our calculated χ² value (9.30) with a critical value from the Chi-Square distribution table using df=5 and a chosen significance level (e.g., α = 0.05). The critical value for χ²(df=5, α=0.05) is approximately 11.07. Since our calculated value (9.30) is less than the critical value (11.07), we fail to reject H₀. Alternatively, the p-value associated with χ²=9.30 and df=5 is approximately 0.097, which is greater than 0.05. Conclusion: Based on this data, there is not enough statistical evidence to conclude that the die is unfair.
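These steps can be verified in R: `chisq.test` reproduces the statistic and p-value, while `qchisq` returns the critical value used above. A minimal sketch:

```r
dice_counts <- c(18, 25, 15, 22, 28, 12)  # observed rolls for faces 1-6

# Goodness-of-fit test against equal probabilities (the default)
dice_result <- chisq.test(dice_counts)
print(dice_result)  # X-squared = 9.3, df = 5, p-value ~ 0.098

# Critical value for alpha = 0.05 and df = 5
qchisq(0.95, df = 5)  # ~ 11.07

# p-value for the observed statistic (upper-tail area)
pchisq(9.3, df = 5, lower.tail = FALSE)  # ~ 0.098
```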

Interactive Chi-Square Distribution Explorer

The tool below allows you to explore the Chi‑Square distribution interactively. Understanding the shape of this distribution and how it changes with degrees of freedom (df) is key to interpreting test results and p-values. Adjust the df and critical value to see their impact.

The Chi-Square (χ²) distribution is fundamental for goodness-of-fit tests and tests of independence. Its shape is defined by the degrees of freedom (df). The distribution is always non-negative and right-skewed, especially for low df. As df increases, the curve spreads out and becomes more symmetrical, resembling a normal distribution.

Use the slider to adjust the degrees of freedom (df) and see how the curve changes. Input a Critical Chi-Square Value (often found from tables or software based on your significance level α) to visualize the corresponding p-value (the area shaded to the right). Hover over the graph to see the probability density (PDF) for specific χ² values. The distribution's mean (df) and mode (df-2, for df>2) are also marked.
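Without the interactive tool, the same quantities can be explored in R with the built-in distribution functions `dchisq` (density), `pchisq` (cumulative probability), and `qchisq` (quantiles). The settings below mirror the dice example (df = 5, critical value ≈ 11.07):

```r
deg_f <- 5
crit <- qchisq(0.95, deg_f)  # critical value for alpha = 0.05, ~ 11.07

# Density (PDF) at a few chi-square values, as shown when hovering the graph
dchisq(c(1, 3, 5, 10), deg_f)

# Area to the right of the critical value (the shaded region) is alpha = 0.05
pchisq(crit, deg_f, lower.tail = FALSE)

# Mean and mode of the distribution, as marked on the graph
mean_chisq <- deg_f      # mean = df
mode_chisq <- deg_f - 2  # mode = df - 2 (for df > 2)
```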

[Interactive explorer snapshot: df = 5, critical value = 11.07; mean = 5, mode = 3.]

Usage of Chi-Square Tests

Chi-Square tests are versatile statistical methods for analyzing categorical data. Their main applications include:

  • Goodness-of-Fit Test: Tests if the observed frequency distribution of a single categorical variable matches a specific, expected distribution. For example, are M&M colors distributed according to the company's claimed percentages, or is a die fair?
  • Test of Independence: Tests whether two categorical variables collected from a single sample are associated or independent. For example, is there a relationship between smoking status and lung cancer incidence in a population sample?
  • Test of Homogeneity: Compares the distribution of a categorical variable across two or more different populations or groups to see if the distributions are the same. For example, do different teaching methods lead to the same distribution of student grades (A, B, C, etc.)?

Types of Chi-Square Tests

The two primary types of Pearson’s chi-square (χ²) tests commonly encountered are:

  • χ² Goodness-of-Fit Test: Used when you have one categorical variable from a single population. It assesses whether the observed frequencies in each category significantly differ from the frequencies you would expect based on a specific hypothesis (e.g., equal proportions, proportions matching a known distribution). The degrees of freedom (df) are calculated as `k - 1`, where `k` is the number of categories.
  • χ² Test of Independence: Used when you have two categorical variables from a single population sample, usually presented in a contingency table. It evaluates whether there is a statistically significant association (dependency) between the two variables. The degrees of freedom (df) are calculated as `(rows - 1) * (columns - 1)`.

Note: The chi-square test of homogeneity, while conceptually distinct (comparing distributions across multiple populations), uses the same calculation method and degrees of freedom formula as the test of independence.

Example 1: Zodiac Signs (Goodness-of-Fit)

Scenario

A researcher wants to know if the birthdays of 257 executives are evenly distributed throughout the year (i.e., across the 12 zodiac signs). If evenly distributed, we'd expect an equal number of executives for each sign.

Null Hypothesis (H₀): The distribution of executive birthdays across zodiac signs is uniform (equal proportions).

Alternative Hypothesis (H₁): The distribution is not uniform.

Data:

Sign        | Observed (O) | Expected (E) | (O-E)²/E
------------|--------------|--------------|---------
Aries       |      23      |    21.42     |  0.117
Taurus      |      20      |    21.42     |  0.094
Gemini      |      18      |    21.42     |  0.545
Cancer      |      23      |    21.42     |  0.117
Leo         |      21      |    21.42     |  0.008
Virgo       |      19      |    21.42     |  0.273
Libra       |      18      |    21.42     |  0.545
Scorpio     |      21      |    21.42     |  0.008
Sagittarius |      19      |    21.42     |  0.273
Capricorn   |      22      |    21.42     |  0.016
Aquarius    |      24      |    21.42     |  0.312
Pisces      |      29      |    21.42     |  2.685
Total       |              |              | χ² ≈ 4.99

Expected Count per sign: Total executives / Number of signs = 257 / 12 ≈ 21.42

Degrees of Freedom (df): Number of categories - 1 = 12 - 1 = 11

R Code (Example): Assuming `observed_counts` is a vector `c(23, 20, ..., 29)`:

observed_counts <- c(23, 20, 18, 23, 21, 19, 18, 21, 19, 22, 24, 29)
# Expected probabilities are equal (1/12 for each)
chisq_result_zodiac <- chisq.test(observed_counts)
print(chisq_result_zodiac)

Result Interpretation: The calculated χ² ≈ 4.99. With df=11, the p-value reported by R is large (p ≈ 0.93). Since p > 0.05, we fail to reject H₀; there's no significant statistical evidence from this sample to suggest that executive birthdays are unevenly distributed across zodiac signs.

Test Function Reminder: chisq.test(observed_vector_count, p = expected_probability_vector) (p defaults to equal probabilities if omitted).

Example 2: Absenteeism (Goodness-of-Fit)

Scenario

Faculty hypothesized absenteeism rates among 100 students. They compared their expectations to actual survey data.

H₀: The observed student absenteeism matches the expected distribution (50% in 0-2, 30% in 3-5, 12% in 6-8, 8% in 9+).

H₁: The observed distribution differs from the expected one.

Data:

Number of Absences | Expected Students (E) | Observed Students (O) | (O-E)²/E
-------------------|-----------------------|-----------------------|---------
0–2                |          50           |          35           |  4.500
3–5                |          30           |          40           |  3.333
6–8                |          12           |          20           |  5.333
9+                 |           8           |           5           |  1.125
Total              |                       |                       | χ² = 14.291

Degrees of Freedom (df): k - 1 = 4 - 1 = 3

R Code (Example):

observed_absences <- c(35, 40, 20, 5)
expected_proportions_abs <- c(0.50, 0.30, 0.12, 0.08) # Use proportions
chisq_result_abs <- chisq.test(observed_absences, p = expected_proportions_abs)
print(chisq_result_abs)

Result Interpretation: The calculated χ² ≈ 14.29. With df=3, the p-value reported by R is small (p ≈ 0.0025). Since p < 0.05, we reject H₀. There is significant statistical evidence that the observed student absenteeism distribution differs from what the faculty expected.

Example 3: Televisions (Goodness-of-Fit)

Scenario

We test if the number of TVs per family in a sample of 600 families matches the claimed national distribution percentages.

H₀: The observed TV distribution matches the national percentages (10% 0, 16% 1, 55% 2, 11% 3, 8% 4+).

H₁: The observed distribution does not match.

Data:

Number of TVs | National % | Expected (E) (out of 600) | Observed (O) | (O-E)²/E
--------------|------------|---------------------------|--------------|---------
0             |    10%     |            60             |      66      |  0.600
1             |    16%     |            96             |     119      |  5.510
2             |    55%     |           330             |     340      |  0.303
3             |    11%     |            66             |      60      |  0.545
4+            |     8%     |            48             |      15      | 22.688
Total         |            |                           |              | χ² ≈ 29.646

Degrees of Freedom (df): k - 1 = 5 - 1 = 4

R Code (Example):

observed_tv <- c(66, 119, 340, 60, 15)
expected_proportions_tv <- c(0.10, 0.16, 0.55, 0.11, 0.08)
chisq_result_tv <- chisq.test(observed_tv, p = expected_proportions_tv)
print(chisq_result_tv)

Result Interpretation: The calculated χ² ≈ 29.65. With df=4, the p-value reported by R is very small (p ≈ 5.7e-6). Since p < 0.05, we reject H₀. There is strong statistical evidence that the distribution of televisions in this sample significantly differs from the claimed national distribution, with the largest discrepancy in the '4+' category.
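To see which categories drive a significant result, the Pearson residuals (O − E)/√E stored in the fitted test object can be inspected. A sketch reusing the vectors from the TV example above:

```r
observed_tv <- c(66, 119, 340, 60, 15)
expected_proportions_tv <- c(0.10, 0.16, 0.55, 0.11, 0.08)

chisq_result_tv <- chisq.test(observed_tv, p = expected_proportions_tv)

# Pearson residuals: (O - E) / sqrt(E). Large absolute values flag the
# categories contributing most to the chi-square statistic; here the
# '4+' category (last entry) stands out with (15 - 48)/sqrt(48) ~ -4.76.
round(chisq_result_tv$residuals, 2)
```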

Class Activity 1: Customer Count (Goodness-of-Fit)

Scenario

A shop owner claims customer traffic is the same every weekday. A researcher recorded counts over a week to test this claim.

H₀: Customer counts are uniformly distributed across the 5 weekdays (Mon-Fri).

H₁: Customer counts are not uniformly distributed.

Data: Mon: 50, Tue: 60, Wed: 40, Thu: 47, Fri: 53. Total Customers = 250.

Expected Count per day (if H₀ is true): Total Customers / Number of days = 250 / 5 = 50

R command to test the hypothesis:

customer_counts <- c(50, 60, 40, 47, 53)
# Test against equal proportions (default for chisq.test when p is omitted)
chisq_result_cust <- chisq.test(customer_counts)
print(chisq_result_cust)
# Explicitly with proportions: chisq.test(customer_counts, p = rep(1/5, 5))

Example Output: X-squared = 4.36, df = 4, p-value = 0.3595

Conclusion: With a p-value of 0.36 (which is > 0.05), we fail to reject H₀. Based on this sample, there isn't enough statistical evidence to conclude that the customer flow significantly differs across weekdays from what would be expected if it were uniform.

Example: M&M Color Breakdown (Goodness-of-Fit)

Scenario

Test if a sample of 712 M&Ms collected in 2016-2017 matches the color distribution percentages published by Mars in 2008.

H₀: The 2016-2017 sample distribution matches the 2008 published percentages (Blue: 24%, Orange: 20%, Green: 16%, Yellow: 14%, Red: 13%, Brown: 13%).

H₁: The distributions differ.

Data:

  • Published (Expected %): Blue: 24%, Orange: 20%, Green: 16%, Yellow: 14%, Red: 13%, Brown: 13%
  • Observed Sample Counts (n=712): Blue: 133, Orange: 133, Green: 139, Yellow: 103, Red: 108, Brown: 96 (Total 712)

R code:

# Observed counts
observed_mm <- c(133, 133, 139, 103, 108, 96)

# Expected proportions
expected_prop_mm <- c(0.24, 0.20, 0.16, 0.14, 0.13, 0.13)

# Perform the test
chisq_result_mm <- chisq.test(observed_mm, p = expected_prop_mm)
print(chisq_result_mm)

Example Output: X-squared = 17.353, df = 5, p-value ≈ 0.0039

Conclusion: Since the p-value (≈ 0.0039) is less than 0.05, we reject H₀. The color distribution in the 2016-2017 sample is significantly different from the proportions published by Mars in 2008.

Class Activity 2: Eye & Hair Color (Test of Independence)

Scenario

Analyze data on eye color and hair color from a sample of individuals to determine if these two traits are associated (dependent) or independent.

H₀: Eye color and hair color are independent.

H₁: Eye color and hair color are associated (dependent).

Data (Contingency Table - Example from `datasets::HairEyeColor` in R, summed over sex):

# Example using built-in R dataset
data(HairEyeColor)
hair_eye_table <- apply(HairEyeColor, c("Hair", "Eye"), sum) # Sum over Male/Female
print("Observed Contingency Table (Hair vs Eye Color):")
print(hair_eye_table)

R code for the chi-square test:

# Perform the test directly on the table
test_result_he <- chisq.test(hair_eye_table)
print(test_result_he)

print("Expected Counts (under independence assumption):")
print(round(test_result_he$expected, 2)) # Show expected counts rounded

Degrees of Freedom (df): (Number of Hair Colors - 1) * (Number of Eye Colors - 1) = (4 - 1) * (4 - 1) = 3 * 3 = 9

Example Output (using HairEyeColor data): X-squared = 138.29, df = 9, p-value < 2.2e-16

Conclusion: The p-value is extremely small (effectively zero). We strongly reject H₀. There is a highly significant statistical association between hair color and eye color in this dataset.

Example: Skittles Favorite Flavor (Test of Homogeneity)

Scenario

A survey asked people their favorite Skittle. One group was shown colors, another tasted blindfolded (flavors). We want to know if the distribution of favorite choices is the same for both polling methods (groups).

H₀: The distribution of favorite Skittle choices is the same for the Color Poll and Flavor Poll groups.

H₁: The distributions of favorite choices are different between the two groups.

Data (Contingency Table - Rows: Poll Type, Columns: Skittle Option):

Poll Type   | Opt 1 | Opt 2 | Opt 3 | Opt 4 | Opt 5 | Row Totals
------------|-------|-------|-------|-------|-------|-----------
Color Poll  |  18   |   9   |  15   |  13   |  11   |    66
Flavor Poll |  13   |  16   |  19   |  34   |   9   |    91
Col Totals  |  31   |  25   |  34   |  47   |  20   |   157

R code:

# Create the matrix
skittles_data <- matrix(c(18, 9, 15, 13, 11,  # Row 1: Color Poll
                          13, 16, 19, 34, 9), # Row 2: Flavor Poll
                        nrow = 2, ncol = 5, byrow = TRUE)
colnames(skittles_data) <- c("Opt1", "Opt2", "Opt3", "Opt4", "Opt5")
rownames(skittles_data) <- c("ColorPoll", "FlavorPoll")

print("Observed Data:")
print(skittles_data)

# Perform the test (Test of Independence/Homogeneity)
chisq_result_skittles <- chisq.test(skittles_data)
print(chisq_result_skittles)

Degrees of Freedom (df): (Rows - 1) * (Columns - 1) = (2 - 1) * (5 - 1) = 1 * 4 = 4

Example Output: X-squared = 9.0691, df = 4, p-value = 0.0594

Conclusion: The p-value (0.0594) is slightly above the common significance level of 0.05. Therefore, we fail to reject H₀. There isn't statistically significant evidence (at α=0.05) to conclude that the polling method (color vs. flavor) affects the distribution of favorite Skittle choices, although the result is borderline and might warrant further investigation with a larger sample.

Example: Rock-Paper-Scissors (1) - Goodness-of-Fit (Equal Proportions)

Scenario

Test if players choose Rock, Paper, or Scissors with equal frequency in a series of games.

H₀: The choices Rock, Paper, Scissors are made with equal probability (1/3 each).

H₁: The probabilities are not equal.

Data: Total plays = 88 + 74 + 66 = 228

  • Rock: 88
  • Paper: 74
  • Scissors: 66

Expected (if H₀ is true): 228 / 3 = 76 for each choice.

R code:

rps_counts1 <- c(88, 74, 66)
# Test against equal probabilities (default)
chisq_result_rps1 <- chisq.test(rps_counts1)
print(chisq_result_rps1)
# Explicitly: chisq.test(rps_counts1, p = c(1/3, 1/3, 1/3))

Degrees of Freedom (df): k - 1 = 3 - 1 = 2

Example Output: X-squared = 3.2632, df = 2, p-value = 0.1956

Conclusion: The p-value (0.196) is greater than 0.05. We fail to reject H₀. There's no significant evidence from this data that players deviated from choosing Rock, Paper, or Scissors with equal frequency.

Example: Rock-Paper-Scissors (2) - Goodness-of-Fit (Unequal Proportions)

Scenario

Test if the observed frequencies of choices match a specific, *unequal* hypothesized distribution (e.g., a player claims they throw Rock 50% of the time, Paper 30%, and Scissors 20%).

H₀: The observed choices follow the distribution P(Rock)=0.5, P(Paper)=0.3, P(Scissors)=0.2.

H₁: The observed choices do not follow this specific distribution.

Data: Total plays = 66 + 39 + 14 = 119

  • Rock: 66
  • Paper: 39
  • Scissors: 14

Expected (if H₀ is true): E(Rock)=119*0.5=59.5, E(Paper)=119*0.3=35.7, E(Scissors)=119*0.2=23.8

R code:

rps_observed2 <- c(66, 39, 14)
rps_expected_prop2 <- c(0.5, 0.3, 0.2)
chisq_result_rps2 <- chisq.test(rps_observed2, p = rps_expected_prop2)
print(chisq_result_rps2)

Degrees of Freedom (df): k - 1 = 3 - 1 = 2

Example Output: X-squared = 5.0504, df = 2, p-value = 0.08004

Conclusion: The p-value (0.08) is greater than 0.05. We fail to reject H₀. While the observed counts aren't a perfect match to the expected ones (especially for Scissors), the difference is not statistically significant at the α=0.05 level. There isn't enough evidence to disprove the player's claimed strategy based on this data.

Finding Chi-Square with a Contingency Table (Manual Steps & R)

Scenario

Illustrate the calculation steps for a Test of Independence using a simple 2x2 contingency table and verify with R.

H₀: The row variable and column variable are independent.

H₁: The variables are dependent (associated).

Observed Data (Example 2x2 Table):

# Create the contingency table in R
observed_table_cont <- matrix(c(60, 40,  # Row 1
                                20, 10), # Row 2
                              nrow = 2, byrow = TRUE)
colnames(observed_table_cont) <- c("Category A", "Category B")
rownames(observed_table_cont) <- c("Group 1", "Group 2")

print("Observed Table:")
print(observed_table_cont)

Manual Calculation Steps:

  1. Calculate Row and Column Totals: Row1=100, Row2=30, ColA=80, ColB=50, GrandTotal=130
  2. Calculate Expected Frequencies (E) for each cell: E = (Row Total * Column Total) / Grand Total
    • E(R1,C1)=(100*80)/130≈61.54
    • E(R1,C2)=(100*50)/130≈38.46
    • E(R2,C1)=(30*80)/130≈18.46
    • E(R2,C2)=(30*50)/130≈11.54
  3. Calculate the Chi-Square component for each cell: (O - E)² / E
    • Cell(1,1):(60-61.54)²/61.54≈0.0385
    • Cell(1,2):(40-38.46)²/38.46≈0.0615
    • Cell(2,1):(20-18.46)²/18.46≈0.1282
    • Cell(2,2):(10-11.54)²/11.54≈0.2051
  4. Sum the components to get the Chi-Square statistic (χ²): χ² ≈ 0.0385+0.0615+0.1282+0.2051 ≈ 0.4333
  5. Determine Degrees of Freedom (df): df = (rows-1)*(cols-1) = (2-1)*(2-1) = 1

R Code Verification:

# Perform the chi-square test using R
# correct = FALSE removes Yates' continuity correction for 2x2 tables,
# matching the manual calculation
test_result_cont <- chisq.test(observed_table_cont, correct = FALSE)
print(test_result_cont)

print("Expected values from R:")
print(round(test_result_cont$expected, 2)) # Rounded expected values

Example R Output:

	Pearson's Chi-squared test

 data:  observed_table_cont
 X-squared = 0.43333, df = 1, p-value = 0.5104

 Expected values from R:
          Category A Category B
 Group 1      61.54      38.46
 Group 2      18.46      11.54
                 

Conclusion: The manually calculated χ² statistic (0.433) matches the R output (when `correct=FALSE`). With df=1, the p-value is 0.51. Since p > 0.05, we fail to reject H₀. There is no significant statistical evidence of an association between the row variable (Group) and the column variable (Category) in this data.

Chi-Square Tests: Formula Summary

Here's a quick reference table for the key formulas used in Chi-Square testing:

Concept                   | Formula                                      | Notes / Applies To
--------------------------|----------------------------------------------|-------------------
Chi-Square Statistic (χ²) | χ² = Σ [ (O - E)² / E ]                      | Applies to all Chi-Square tests (Goodness-of-Fit, Independence, Homogeneity). Summation (Σ) is over all categories or cells.
Expected Frequency (E)    | E = n * p                                    | Goodness-of-Fit Test. n = total sample size, p = expected proportion for the category under H₀.
Expected Frequency (E)    | E = (Row Total * Column Total) / Grand Total | Test of Independence / Test of Homogeneity. Calculated for each cell in the contingency table.
Degrees of Freedom (df)   | df = k - 1                                   | Goodness-of-Fit Test. k = number of categories.
Degrees of Freedom (df)   | df = (Rows - 1) * (Columns - 1)              | Test of Independence / Test of Homogeneity. Based on the dimensions of the contingency table.
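The contingency-table formula for E can be computed for all cells at once in R with `outer`. A sketch reusing the 2×2 table from the worked example above:

```r
observed <- matrix(c(60, 40,
                     20, 10), nrow = 2, byrow = TRUE)

# E = (Row Total * Column Total) / Grand Total, evaluated for every cell
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)  # matches chisq.test(observed, correct = FALSE)$expected
```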