Lecture Notes: Chi-Square (χ²) Tests
Welcome to this comprehensive guide on Chi‑Square (χ²) tests, essential tools in statistics for analyzing categorical data. These notes cover the fundamental concepts, the core formula, manual calculation steps, different types of Chi-Square tests (goodness-of-fit, independence, homogeneity), practical examples with R code snippets, and an interactive visualization to help you understand the Chi-Square distribution itself. We'll explore how to compare observed frequencies with expected frequencies to perform hypothesis testing.
The Chi-Square (χ²) Test Statistic Formula

χ² = Σ [ (O - E)² / E ]

Where:
- O represents the Observed frequencies (the actual counts in your data).
- E represents the Expected frequencies (the counts you would anticipate if your null hypothesis were true).
- Σ means to Sum the values calculated for each category or cell in your table.
This formula quantifies the discrepancy between your observed data and what you expected under the null hypothesis. A larger χ² value indicates a greater difference, suggesting the observed data may not fit the expected pattern.
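As a minimal sketch, the formula translates directly into R; the observed counts reused here are from the fair-die example in the next section:

```r
# Observed counts (from the fair-die example: faces 1-6 over 120 rolls)
observed <- c(18, 25, 15, 22, 28, 12)
# Expected counts under H0 (a fair die): 120 * (1/6) = 20 per face
expected <- rep(20, 6)

# Chi-square statistic: sum of (O - E)^2 / E over all categories
chi_sq <- sum((observed - expected)^2 / expected)
print(chi_sq)  # 9.3
```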
Manual Calculation Example: Goodness-of-Fit (Fair Dice Roll)
Let's walk through calculating the χ² statistic by hand. This helps solidify understanding before relying on software.
Scenario
We roll a standard six-sided die 120 times to test if it's fair. A fair die means each face (1 to 6) should appear with equal probability (1/6).
Null Hypothesis (H₀): The die is fair (P(1)=P(2)=...=P(6)=1/6).
Alternative Hypothesis (H₁): The die is not fair (the probabilities are different).
Observed Data (O): Suppose we observed the following counts after 120 rolls:
- Face 1: 18 times
- Face 2: 25 times
- Face 3: 15 times
- Face 4: 22 times
- Face 5: 28 times
- Face 6: 12 times
- (Total Rolls = 18+25+15+22+28+12 = 120)
Step-by-Step Calculation
- Calculate Expected Frequencies (E): If the die is fair (H₀ is true), we expect each face to appear Total Rolls * P(Face) = 120 * (1/6) = 20 times. So, E = 20 for all faces.
- Set up the Calculation Table:

Face | Observed (O) | Expected (E) | O - E | (O - E)² | (O - E)² / E |
---|---|---|---|---|---|
1 | 18 | 20 | -2 | 4 | 4 / 20 = 0.20 |
2 | 25 | 20 | 5 | 25 | 25 / 20 = 1.25 |
3 | 15 | 20 | -5 | 25 | 25 / 20 = 1.25 |
4 | 22 | 20 | 2 | 4 | 4 / 20 = 0.20 |
5 | 28 | 20 | 8 | 64 | 64 / 20 = 3.20 |
6 | 12 | 20 | -8 | 64 | 64 / 20 = 3.20 |
Total | 120 | 120 | 0 | | χ² = 9.30 |

- Calculate the χ² Statistic: Sum the values in the last column:
χ² = 0.20 + 1.25 + 1.25 + 0.20 + 3.20 + 3.20 = 9.30
- Determine Degrees of Freedom (df): For a Goodness-of-Fit test, df = k - 1, where k is the number of categories (faces).
df = 6 - 1 = 5
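The whole manual table can be checked in one call with R's built-in `chisq.test` (equal expected proportions are the default):

```r
rolls <- c(18, 25, 15, 22, 28, 12)  # observed counts for faces 1 through 6
dice_result <- chisq.test(rolls)    # defaults to equal probabilities (1/6 each)
print(dice_result)                  # X-squared = 9.3, df = 5, p-value ≈ 0.098
```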
Next Step (Interpretation)
We would compare our calculated χ² value (9.30) with a critical value from the Chi-Square distribution table using df=5 and a chosen significance level (e.g., α = 0.05). The critical value for χ²(df=5, α=0.05) is approximately 11.07. Since our calculated value (9.30) is less than the critical value (11.07), we fail to reject H₀. Alternatively, the p-value associated with χ²=9.30 and df=5 is approximately 0.097, which is greater than 0.05. Conclusion: Based on this data, there is not enough statistical evidence to conclude that the die is unfair.
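Rather than a printed table, the critical value and p-value can be looked up with R's `qchisq` and `pchisq` functions:

```r
alpha <- 0.05
df <- 5
crit <- qchisq(1 - alpha, df)                  # critical value ≈ 11.07
pval <- pchisq(9.30, df, lower.tail = FALSE)   # p-value ≈ 0.098
print(c(critical = crit, p.value = pval))
```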
Interactive Chi-Square Distribution Explorer
The tool below allows you to explore the Chi‑Square distribution interactively. Understanding the shape of this distribution and how it changes with degrees of freedom (df) is key to interpreting test results and p-values. Adjust the df and critical value to see their impact.
The Chi-Square (χ²) distribution is fundamental for goodness-of-fit tests and tests of independence. Its shape is defined by the degrees of freedom (df). The distribution is always non-negative and right-skewed, especially for low df. As df increases, the curve spreads out and becomes more symmetrical, resembling a normal distribution.
Use the slider to adjust the degrees of freedom (df) and see how the curve changes. Input a critical Chi-Square value (often found from tables or software based on your significance level α) to visualize the upper-tail probability — the area shaded to the right of that value, which equals the p-value when the input is your test statistic. Hover over the graph to see the probability density (PDF) for specific χ² values. The distribution's mean (df) and mode (df - 2, for df > 2) are also marked.
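The quantities the explorer displays can be reproduced in R; this is a sketch using base graphics only, not the interactive tool itself:

```r
df <- 5
x <- seq(0, 20, by = 0.1)

# Probability density of the chi-square distribution (the curve in the explorer)
plot(x, dchisq(x, df), type = "l",
     xlab = "Chi-square value", ylab = "Density",
     main = paste("Chi-square PDF, df =", df))
abline(v = df, lty = 2)      # mean of the distribution equals df
abline(v = df - 2, lty = 3)  # mode equals df - 2 (for df > 2)
```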
Usage of Chi-Square Tests
Chi-Square tests are versatile statistical methods for analyzing categorical data. Their main applications include:
- Goodness-of-Fit Test: Tests if the observed frequency distribution of a single categorical variable matches a specific, expected distribution. For example, are M&M colors distributed according to the company's claimed percentages, or is a die fair?
- Test of Independence: Tests whether two categorical variables collected from a single sample are associated or independent. For example, is there a relationship between smoking status and lung cancer incidence in a population sample?
- Test of Homogeneity: Compares the distribution of a categorical variable across two or more different populations or groups to see if the distributions are the same. For example, do different teaching methods lead to the same distribution of student grades (A, B, C, etc.)?
Types of Chi-Square Tests
The two primary types of Pearson’s chi-square (χ²) tests commonly encountered are:
- χ² Goodness-of-Fit Test: Used when you have one categorical variable from a single population. It assesses whether the observed frequencies in each category significantly differ from the frequencies you would expect based on a specific hypothesis (e.g., equal proportions, proportions matching a known distribution). The degrees of freedom (df) are calculated as `k - 1`, where `k` is the number of categories.
- χ² Test of Independence: Used when you have two categorical variables from a single population sample, usually presented in a contingency table. It evaluates whether there is a statistically significant association (dependency) between the two variables. The degrees of freedom (df) are calculated as `(rows - 1) * (columns - 1)`.
Note: The chi-square test of homogeneity, while conceptually distinct (comparing distributions across multiple populations), uses the same calculation method and degrees of freedom formula as the test of independence.
Example 1: Zodiac Signs (Goodness-of-Fit)
Scenario
A researcher wants to know if the birthdays of 256 executives are evenly distributed throughout the year (i.e., across the 12 zodiac signs). If evenly distributed, we'd expect an equal number of executives for each sign.
Null Hypothesis (H₀): The distribution of executive birthdays across zodiac signs is uniform (equal proportions).
Alternative Hypothesis (H₁): The distribution is not uniform.
Data:
Sign | Observed (O) | Expected (E) | (O-E)²/E |
---|---|---|---|
Aries | 23 | 21.33 | 0.130 |
Taurus | 20 | 21.33 | 0.083 |
Gemini | 18 | 21.33 | 0.521 |
Cancer | 23 | 21.33 | 0.130 |
Leo | 20 | 21.33 | 0.083 |
Virgo | 19 | 21.33 | 0.255 |
Libra | 18 | 21.33 | 0.521 |
Scorpio | 21 | 21.33 | 0.005 |
Sagittarius | 19 | 21.33 | 0.255 |
Capricorn | 22 | 21.33 | 0.021 |
Aquarius | 24 | 21.33 | 0.333 |
Pisces | 29 | 21.33 | 2.755 |
Total | 256 | 256 | χ² ≈ 5.094 |

Expected Count per sign: Total executives / Number of signs = 256 / 12 ≈ 21.33
Degrees of Freedom (df): Number of categories - 1 = 12 - 1 = 11
R Code (Example): Assuming `observed_counts` is the vector of counts from the table above:

```r
observed_counts <- c(23, 20, 18, 23, 20, 19, 18, 21, 19, 22, 24, 29)
# Expected probabilities are equal (1/12 for each), the default for chisq.test
chisq_result_zodiac <- chisq.test(observed_counts)
print(chisq_result_zodiac)
```

Result Interpretation: The calculated χ² ≈ 5.094. With df=11, the p-value reported by R is large (p ≈ 0.93). Since p > 0.05, we fail to reject H₀; there's no significant statistical evidence from this sample to suggest that executive birthdays are unevenly distributed across zodiac signs.
Test Function Reminder: `chisq.test(observed_counts, p = expected_probability_vector)` — the argument `p` defaults to equal probabilities if omitted.
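A quick sanity check (with made-up counts) that omitting `p` is equivalent to supplying equal probabilities explicitly:

```r
counts <- c(30, 50, 40)                       # hypothetical observed counts
a <- chisq.test(counts)                       # p omitted: equal probabilities assumed
b <- chisq.test(counts, p = rep(1/3, 3))      # equal probabilities made explicit
print(all.equal(a$statistic, b$statistic))    # TRUE
```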
Example 2: Absenteeism (Goodness-of-Fit)
Scenario
Faculty members hypothesized how absenteeism would be distributed among 100 students, then compared their expected distribution to actual survey data.
H₀: The observed student absenteeism matches the expected distribution (50% in 0-2, 30% in 3-5, 12% in 6-8, 8% in 9+).
H₁: The observed distribution differs from the expected one.
Data:
Number of Absences | Expected Students (E) | Observed Students (O) | (O-E)²/E |
---|---|---|---|
0–2 | 50 | 35 | 4.500 |
3–5 | 30 | 40 | 3.333 |
6–8 | 12 | 20 | 5.333 |
9+ | 8 | 5 | 1.125 |
Total | 100 | 100 | χ² = 14.291 |
Degrees of Freedom (df): k - 1 = 4 - 1 = 3
R Code (Example):
```r
observed_absences <- c(35, 40, 20, 5)
expected_proportions_abs <- c(0.50, 0.30, 0.12, 0.08)  # use proportions
chisq_result_abs <- chisq.test(observed_absences, p = expected_proportions_abs)
print(chisq_result_abs)
```
Result Interpretation: The calculated χ² ≈ 14.29. With df=3, the p-value reported by R is small (p ≈ 0.0025). Since p < 0.05, we reject H₀. There is significant statistical evidence that the observed student absenteeism distribution differs from what the faculty expected.
Example 3: Televisions (Goodness-of-Fit)
Scenario
We test if the number of TVs per family in a sample of 600 families matches the claimed national distribution percentages.
H₀: The observed TV distribution matches the national percentages (10% 0, 16% 1, 55% 2, 11% 3, 8% 4+).
H₁: The observed distribution does not match.
Data:
Number of TVs | National % | Expected (E) (out of 600) | Observed (O) | (O-E)²/E |
---|---|---|---|---|
0 | 10% | 60 | 66 | 0.600 |
1 | 16% | 96 | 119 | 5.510 |
2 | 55% | 330 | 340 | 0.303 |
3 | 11% | 66 | 60 | 0.545 |
4+ | 8% | 48 | 15 | 22.688 |
Total | 100% | 600 | 600 | χ² ≈ 29.646 |
Degrees of Freedom (df): k - 1 = 5 - 1 = 4
R Code (Example):
```r
observed_tv <- c(66, 119, 340, 60, 15)
expected_proportions_tv <- c(0.10, 0.16, 0.55, 0.11, 0.08)
chisq_result_tv <- chisq.test(observed_tv, p = expected_proportions_tv)
print(chisq_result_tv)
```
Result Interpretation: The calculated χ² ≈ 29.65. With df=4, the p-value reported by R is very small (p ≈ 5.7e-6). Since p < 0.05, we reject H₀. There is strong statistical evidence that the distribution of televisions in this sample significantly differs from the claimed national distribution, with the largest discrepancy in the '4+' category.
Class Activity 1: Customer Count (Goodness-of-Fit)
Scenario
A shop owner claims customer traffic is the same every weekday. A researcher recorded counts over a week to test this claim.
H₀: Customer counts are uniformly distributed across the 5 weekdays (Mon-Fri).
H₁: Customer counts are not uniformly distributed.
Data: Mon: 50, Tue: 60, Wed: 40, Thu: 47, Fri: 53. Total Customers = 250.
Expected Count per day (if H₀ is true): Total Customers / Number of days = 250 / 5 = 50
R command to test the hypothesis:
```r
customer_counts <- c(50, 60, 40, 47, 53)
# Test against equal proportions (default for chisq.test when p is omitted)
chisq_result_cust <- chisq.test(customer_counts)
print(chisq_result_cust)
# Explicitly with proportions:
chisq.test(customer_counts, p = rep(1/5, 5))
```
Example Output: X-squared = 4.36, df = 4, p-value = 0.3595
Conclusion: With a p-value of 0.36 (which is > 0.05), we fail to reject H₀. Based on this sample, there isn't enough statistical evidence to conclude that the customer flow significantly differs across weekdays from what would be expected if it were uniform.
Example: M&M Color Breakdown (Goodness-of-Fit)
Scenario
Test if a sample of 712 M&Ms collected in 2016-2017 matches the color distribution percentages published by Mars in 2008.
H₀: The 2016-2017 sample distribution matches the 2008 published percentages (Blue: 24%, Orange: 20%, Green: 16%, Yellow: 14%, Red: 13%, Brown: 13%).
H₁: The distributions differ.
Data:
- Published (Expected %): Blue: 24%, Orange: 20%, Green: 16%, Yellow: 14%, Red: 13%, Brown: 13%
- Observed Sample Counts (n=712): Blue: 133, Orange: 133, Green: 139, Yellow: 103, Red: 108, Brown: 96 (Total 712)
R code:
```r
# Observed counts
observed_mm <- c(133, 133, 139, 103, 108, 96)
# Expected proportions
expected_prop_mm <- c(0.24, 0.20, 0.16, 0.14, 0.13, 0.13)
# Perform the test
chisq_result_mm <- chisq.test(observed_mm, p = expected_prop_mm)
print(chisq_result_mm)
```
Example Output: X-squared = 17.353, df = 5, p-value ≈ 0.0039
Conclusion: Since the p-value (≈ 0.004) is less than 0.05, we reject H₀. The color distribution in the 2016-2017 sample is significantly different from the proportions published by Mars in 2008.
Class Activity 2: Eye & Hair Color (Test of Independence)
Scenario
Analyze data on eye color and hair color from a sample of individuals to determine if these two traits are associated (dependent) or independent.
H₀: Eye color and hair color are independent.
H₁: Eye color and hair color are associated (dependent).
Data (Contingency Table - Example from `datasets::HairEyeColor` in R, summed over sex):
```r
# Example using built-in R dataset
data(HairEyeColor)
hair_eye_table <- apply(HairEyeColor, c("Hair", "Eye"), sum)  # sum over Male/Female
print("Observed Contingency Table (Hair vs Eye Color):")
print(hair_eye_table)
```
R code for the chi-square test:
```r
# Perform the test directly on the table
test_result_he <- chisq.test(hair_eye_table)
print(test_result_he)
print("Expected Counts (under independence assumption):")
print(round(test_result_he$expected, 2))  # show expected counts rounded
```
Degrees of Freedom (df): (Number of Hair Colors - 1) * (Number of Eye Colors - 1) = (4 - 1) * (4 - 1) = 3 * 3 = 9
Example Output (using HairEyeColor data): X-squared = 138.29, df = 9, p-value < 2.2e-16
Conclusion: The p-value is extremely small (effectively zero). We strongly reject H₀. There is a highly significant statistical association between hair color and eye color in this dataset.
Example: Skittles Favorite Flavor (Test of Homogeneity)
Scenario
A survey asked people their favorite Skittle. One group was shown colors, another tasted blindfolded (flavors). We want to know if the distribution of favorite choices is the same for both polling methods (groups).
H₀: The distribution of favorite Skittle choices is the same for the Color Poll and Flavor Poll groups.
H₁: The distributions of favorite choices are different between the two groups.
Data (Contingency Table - Rows: Poll Type, Columns: Skittle Option):
Poll Type | Opt 1 | Opt 2 | Opt 3 | Opt 4 | Opt 5 | Row Totals |
---|---|---|---|---|---|---|
Color Poll | 18 | 9 | 15 | 13 | 11 | 66 |
Flavor Poll | 13 | 16 | 19 | 34 | 9 | 91 |
Col Totals | 31 | 25 | 34 | 47 | 20 | 157 |
R code:
```r
# Create the matrix
skittles_data <- matrix(c(18,  9, 15, 13, 11,   # Row 1: Color Poll
                          13, 16, 19, 34,  9),  # Row 2: Flavor Poll
                        nrow = 2, ncol = 5, byrow = TRUE)
colnames(skittles_data) <- c("Opt1", "Opt2", "Opt3", "Opt4", "Opt5")
rownames(skittles_data) <- c("ColorPoll", "FlavorPoll")
print("Observed Data:")
print(skittles_data)
# Perform the test (Test of Independence/Homogeneity)
chisq_result_skittles <- chisq.test(skittles_data)
print(chisq_result_skittles)
```
Degrees of Freedom (df): (Rows - 1) * (Columns - 1) = (2 - 1) * (5 - 1) = 1 * 4 = 4
Example Output: X-squared = 9.0691, df = 4, p-value = 0.0594
Conclusion: The p-value (0.0594) is slightly above the common significance level of 0.05. Therefore, we fail to reject H₀. There isn't statistically significant evidence (at α=0.05) to conclude that the polling method (color vs. flavor) affects the distribution of favorite Skittle choices, although the result is borderline and might warrant further investigation with a larger sample.
Example: Rock-Paper-Scissors (1) - Goodness-of-Fit (Equal Proportions)
Scenario
Test if players choose Rock, Paper, or Scissors with equal frequency in a series of games.
H₀: The choices Rock, Paper, Scissors are made with equal probability (1/3 each).
H₁: The probabilities are not equal.
Data: Total plays = 88 + 74 + 66 = 228
- Rock: 88
- Paper: 74
- Scissors: 66
Expected (if H₀ is true): 228 / 3 = 76 for each choice.
R code:
```r
rps_counts1 <- c(88, 74, 66)
# Test against equal probabilities (the default)
chisq_result_rps1 <- chisq.test(rps_counts1)
print(chisq_result_rps1)
# Explicitly: chisq.test(rps_counts1, p = c(1/3, 1/3, 1/3))
```
Degrees of Freedom (df): k - 1 = 3 - 1 = 2
Example Output: X-squared = 3.2632, df = 2, p-value = 0.1956
Conclusion: The p-value (0.196) is greater than 0.05. We fail to reject H₀. There's no significant evidence from this data that players deviated from choosing Rock, Paper, or Scissors with equal frequency.
Example: Rock-Paper-Scissors (2) - Goodness-of-Fit (Unequal Proportions)
Scenario
Test if the observed frequencies of choices match a specific, *unequal* hypothesized distribution (e.g., a player claims they throw Rock 50% of the time, Paper 30%, and Scissors 20%).
H₀: The observed choices follow the distribution P(Rock)=0.5, P(Paper)=0.3, P(Scissors)=0.2.
H₁: The observed choices do not follow this specific distribution.
Data: Total plays = 66 + 39 + 14 = 119
- Rock: 66
- Paper: 39
- Scissors: 14
Expected (if H₀ is true): E(Rock)=119*0.5=59.5, E(Paper)=119*0.3=35.7, E(Scissors)=119*0.2=23.8
R code:
```r
rps_observed2 <- c(66, 39, 14)
rps_expected_prop2 <- c(0.5, 0.3, 0.2)
chisq_result_rps2 <- chisq.test(rps_observed2, p = rps_expected_prop2)
print(chisq_result_rps2)
```
Degrees of Freedom (df): k - 1 = 3 - 1 = 2
Example Output: X-squared = 5.0504, df = 2, p-value = 0.08004
Conclusion: The p-value (0.08) is greater than 0.05. We fail to reject H₀. While the observed counts aren't a perfect match to the expected ones (especially for Scissors), the difference is not statistically significant at the α=0.05 level. There isn't enough evidence to disprove the player's claimed strategy based on this data.
Finding Chi-Square with a Contingency Table (Manual Steps & R)
Scenario
Illustrate the calculation steps for a Test of Independence using a simple 2x2 contingency table and verify with R.
H₀: The row variable and column variable are independent.
H₁: The variables are dependent (associated).
Observed Data (Example 2x2 Table):
```r
# Create the contingency table in R
observed_table_cont <- matrix(c(60, 40,   # Row 1
                                20, 10),  # Row 2
                              nrow = 2, byrow = TRUE)
colnames(observed_table_cont) <- c("Category A", "Category B")
rownames(observed_table_cont) <- c("Group 1", "Group 2")
print("Observed Table:")
print(observed_table_cont)
```
Manual Calculation Steps:
- Calculate Row and Column Totals: Row 1 = 100, Row 2 = 30, Col A = 80, Col B = 50, Grand Total = 130
- Calculate Expected Frequencies (E) for each cell: E = (Row Total * Column Total) / Grand Total
  - E(R1,C1) = (100 * 80) / 130 ≈ 61.54
  - E(R1,C2) = (100 * 50) / 130 ≈ 38.46
  - E(R2,C1) = (30 * 80) / 130 ≈ 18.46
  - E(R2,C2) = (30 * 50) / 130 ≈ 11.54
- Calculate the Chi-Square component for each cell: (O - E)² / E
  - Cell (1,1): (60 - 61.54)² / 61.54 ≈ 0.0385
  - Cell (1,2): (40 - 38.46)² / 38.46 ≈ 0.0615
  - Cell (2,1): (20 - 18.46)² / 18.46 ≈ 0.1282
  - Cell (2,2): (10 - 11.54)² / 11.54 ≈ 0.2051
- Sum the components to get the Chi-Square statistic (χ²): χ² ≈ 0.0385 + 0.0615 + 0.1282 + 0.2051 ≈ 0.4333
- Determine Degrees of Freedom (df): df = (rows - 1) * (cols - 1) = (2 - 1) * (2 - 1) = 1
R Code Verification:
```r
# Perform the chi-square test using R
# correct = FALSE removes Yates' continuity correction for 2x2 tables,
# matching the manual calculation more closely
test_result_cont <- chisq.test(observed_table_cont, correct = FALSE)
print(test_result_cont)
print("Expected values from R:")
print(round(test_result_cont$expected, 2))  # rounded expected values
```
Example R Output:
```
        Pearson's Chi-squared test

data:  observed_table_cont
X-squared = 0.43333, df = 1, p-value = 0.5104

Expected values from R:
        Category A Category B
Group 1      61.54      38.46
Group 2      18.46      11.54
```
Conclusion: The manually calculated χ² statistic (0.433) matches the R output (when `correct=FALSE`). With df=1, the p-value is 0.51. Since p > 0.05, we fail to reject H₀. There is no significant statistical evidence of an association between the row variable (Group) and the column variable (Category) in this data.
Chi-Square Tests: Formula Summary
Here's a quick reference table for the key formulas used in Chi-Square testing:
Concept | Formula | Notes / Applies To |
---|---|---|
Chi-Square Statistic (χ²) | χ² = Σ [ (O - E)² / E ] | Applies to all Chi-Square tests (Goodness-of-Fit, Independence, Homogeneity). Summation (Σ) is over all categories or cells. |
Expected Frequency (E) | E = n * p | Goodness-of-Fit Test. n = total sample size, p = expected proportion for the category under H₀. |
Expected Frequency (E) | E = (Row Total * Column Total) / Grand Total | Test of Independence / Test of Homogeneity. Calculated for each cell in the contingency table. |
Degrees of Freedom (df) | df = k - 1 | Goodness-of-Fit Test. k = number of categories. |
Degrees of Freedom (df) | df = (Rows - 1) * (Columns - 1) | Test of Independence / Test of Homogeneity. Based on the dimensions of the contingency table. |
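As a sketch tying the contingency-table rows of the summary together, the full matrix of expected counts can be built from the margins in one step (hypothetical 2x2 counts):

```r
# Hypothetical observed 2x2 contingency table
obs <- matrix(c(60, 40,
                20, 10), nrow = 2, byrow = TRUE)

# E = (Row Total * Column Total) / Grand Total, computed for every cell at once
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
print(round(expected, 2))  # 61.54 38.46 / 18.46 11.54

# Chi-square statistic from the summary formula
chi_sq <- sum((obs - expected)^2 / expected)
print(round(chi_sq, 4))  # 0.4333
```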