Lecture Notes: Chapter 16 – Analysis of Variance (ANOVA)
Learning Objectives
After completing this chapter, you should be able to:
Understand the F-distribution and its relationship to ANOVA
Perform and interpret one-way ANOVA
Check and validate ANOVA assumptions
Conduct post-hoc analyses when appropriate
Choose between parametric and non-parametric methods
Introduction to ANOVA
Analysis of Variance (ANOVA) is a statistical method used to compare the means of three or more groups simultaneously. While a t-test can compare two means, performing multiple t-tests to compare many groups increases the risk of making a Type I error (incorrectly rejecting the null hypothesis). ANOVA uses an F-test to determine if there is a significant difference between the means of the groups being compared by examining the ratio of variability between groups to variability within groups.
Section 1: The F-Distribution
The F-distribution is a probability distribution used in ANOVA and other statistical tests. Its shape is governed by two degrees-of-freedom parameters: the distribution is right-skewed, and as both degrees of freedom increase, it concentrates more tightly around 1.
Quick R Commands for F-distribution Analysis
# Critical F-value (right-tailed)
qf(p = 0.95, df1 = 2, df2 = 6) # α = 0.05
# P-value from F-statistic
1 - pf(1.69, df1 = 2, df2 = 6) # Right-tailed test
# F-distribution density at specific point
df(x = 1.69, df1 = 2, df2 = 6)
# Plotting F-distribution
curve(df(x, df1 = 2, df2 = 6), from = 0, to = 6,
main = "F-distribution (df1=2, df2=6)")
abline(v = qf(0.95, 2, 6), col = "red", lty = 2) # Add critical value
Key Functions:
qf() - Find critical F-value
pf() - Find cumulative probability
df() - F-distribution density
curve() - Plot the distribution
Section 2: One-Way ANOVA Framework
One-Way ANOVA is used when you have one categorical independent variable (factor) with three or more levels (groups) and one quantitative dependent variable. The test determines if there is a statistically significant difference between the means of these groups.
Assumptions for One-Way ANOVA:
For the results of a one-way ANOVA to be valid, the following assumptions should ideally be met:
Independence: The samples are simple random samples, and observations within and between groups are independent. This is crucial and often depends on the study design.
Normality: The population from which each group sample is drawn is approximately normally distributed. This assumption is less critical with larger sample sizes due to the Central Limit Theorem.
Homogeneity of Variances: The populations from which the samples are drawn have equal variances (or equal standard deviations). This is also known as homoscedasticity. This can be formally tested using tests like Levene's test. If this assumption is violated, alternative procedures like Welch's ANOVA or nonparametric tests may be more appropriate.
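As a minimal sketch of these checks in R (using hypothetical data; Levene's test requires the car package, so base R's Bartlett test is shown as the built-in alternative):
# Hypothetical data: three groups of three observations each
y <- c(4.1, 5.0, 4.6, 6.2, 5.8, 6.5, 5.1, 4.8, 5.4)
g <- factor(rep(c("G1", "G2", "G3"), each = 3))
# Normality: Shapiro-Wilk test on the residuals of the group-means model
fit <- aov(y ~ g)
shapiro.test(residuals(fit))
# Homogeneity of variances: Bartlett's test (base R)
bartlett.test(y ~ g)
# car::leveneTest(y ~ g)  # Levene's test, if the car package is installed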
ANOVA Identity:
The total variation in the data (SST) can be partitioned into the variation between groups (SSTR) and the variation within groups (SSE). The fundamental equation of ANOVA is:
\[SST = SSTR + SSE\]
\(SST\): Total Sum of Squares - measures the total variability in the data.
\(SSTR\): Sum of Squares due to Treatment (or Between-Groups) - measures the variability between the means of the different groups.
\(SSE\): Sum of Squares due to Error (or Within-Groups) - measures the variability within each group. This represents the random error.
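A quick numerical check of this identity, a minimal sketch in R using the teaching-methods scores that appear in Section 6:
# Test scores for three groups of three students (from Example 1 below)
y <- c(75, 82, 68, 90, 78, 85, 88, 92, 80)
g <- factor(rep(c("T", "O", "P"), each = 3))
SST  <- sum((y - mean(y))^2)                              # total sum of squares
SSTR <- sum(table(g) * (tapply(y, g, mean) - mean(y))^2)  # between groups
SSE  <- sum((y - ave(y, g))^2)                            # within groups
c(SST = SST, check = SSTR + SSE)                          # both ≈ 474: the identity holds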
Section 3: ANOVA Procedure Step-by-Step
The procedure involves calculating the sum of squares, degrees of freedom, mean squares, and finally the F-statistic, which are typically summarized in an ANOVA table.
ANOVA Table Format:
Source of Variation          df      SS      MS = SS/df           F
Between Groups (Treatment)   k - 1   SSTR    MSTR = SSTR/(k-1)    MSTR/MSE
Within Groups (Error)        N - k   SSE     MSE = SSE/(N-k)      —
Total                        N - 1   SST     —                    —
Where:
k = number of groups
N = total sample size
SSTR = between-groups sum of squares
SSE = within-groups sum of squares
SST = total sum of squares
MSTR = between-groups mean square
MSE = within-groups mean square (error)
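R's aov() builds this table automatically; a minimal sketch continuing with the same hypothetical scores, plus a hand check of the F-statistic:
# Same scores as in the identity check above
y <- c(75, 82, 68, 90, 78, 85, 88, 92, 80)
g <- factor(rep(c("T", "O", "P"), each = 3))
anova(aov(y ~ g))          # prints df, SS, MS, F, and the p-value
# Hand check: k = 3 groups, N = 9 observations
k <- nlevels(g); N <- length(y)
MSTR <- 228.67 / (k - 1)   # SSTR / (k - 1)
MSE  <- 245.33 / (N - k)   # SSE / (N - k)
MSTR / MSE                 # F ≈ 2.80, matching the aov() output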
Section 4: Multiple Comparisons
If the One-Way ANOVA test results in rejecting the null hypothesis \(H_0\) (i.e., you conclude that at least one group mean is different), you typically want to know *which* specific pairs of group means are different. Performing simple t-tests between all pairs of groups increases the family-wise error rate (the probability of making at least one Type I error among all comparisons). Multiple comparison procedures are designed to control this error rate.
Tukey's Honestly Significant Difference (HSD) test is a common method for pairwise comparisons when you have equal (or nearly equal) sample sizes per group and the equal variance assumption holds. It is based on the studentized range distribution.
Tukey’s Procedure Steps:
Choose a family confidence level \(1 - \alpha\). This is the probability that all confidence intervals constructed for all pairwise comparisons contain the true difference. The typical \(\alpha\) from the ANOVA is often used here.
Find the studentized range critical value \(q_{\alpha,\,k,\,N-k}\) from the studentized range table (or using software) with \(\alpha\), number of groups \(k\), and error degrees of freedom \(N-k\).
For each pair of groups \(i\) and \(j\) (where \(i \ne j\)), compute the confidence interval for the difference between their population means (\(\mu_i - \mu_j\)). The confidence interval is given by:
\[
(\bar{x}_i - \bar{x}_j) \pm \dfrac{q_{\alpha,\,k,\,N-k}}{\sqrt{2}}
\sqrt{MSE \left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}
\]
(Note: An equivalent and common form when sample sizes are equal, \(n_i = n_j = n'\), is \((\bar{x}_i - \bar{x}_j) \pm q_{\alpha,\,k,\,N-k} \sqrt{\dfrac{MSE}{n'}}\). The first formula is more general for unequal \(n_i\).)
If the confidence interval for a pair of means does not contain 0, you declare that the population means \(\mu_i\) and \(\mu_j\) are significantly different at the chosen family confidence level. If the interval contains 0, you do not have sufficient evidence to conclude they are different.
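In R, qtukey() gives the studentized range critical value and TukeyHSD() carries out the whole procedure on a fitted aov model; a minimal sketch with \(k = 3\) and \(N - k = 6\):
qtukey(0.95, nmeans = 3, df = 6)   # critical value q for alpha = 0.05
# Applying the full procedure to a fitted model (hypothetical scores)
y <- c(75, 82, 68, 90, 78, 85, 88, 92, 80)
g <- factor(rep(c("T", "O", "P"), each = 3))
TukeyHSD(aov(y ~ g), conf.level = 0.95)  # pairwise CIs at family level 0.95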
Section 5: The Kruskal–Wallis Test
The Kruskal–Wallis H test is a nonparametric alternative to the One-Way ANOVA. It is used when the assumptions for ANOVA, particularly normality or homogeneity of variances, are not met. It tests whether samples originate from the same distribution, which is often interpreted as testing for differences in medians or locations when the shape of the distributions is similar across groups.
Assumptions for the Kruskal–Wallis Test:
The samples are independent random samples.
The distributions of the variable in the populations have the same shape (though they may differ in location/median).
Sample sizes for each group are generally recommended to be 5 or more for the chi-square approximation to be reliable.
Test statistic:
The test involves ranking all observations from all groups together. The Kruskal-Wallis test statistic, \(H\) (often denoted \(K\)), is calculated based on the sum of the ranks for each group:
\[
H = \frac{12}{n(n+1)}
\sum_{i=1}^k \frac{R_i^2}{n_i}
- 3(n+1)
\]
Where \(k\) is the number of groups, \(n_i\) is the sample size of the \(i\)-th group, \(R_i\) is the sum of the ranks for the \(i\)-th group, and \(n = \sum n_i\) is the total sample size.
For reasonably large sample sizes (\(n_i \ge 5\)), the distribution of the \(H\) statistic can be approximated by a chi-squared distribution (\(\chi^2\)) with \(k-1\) degrees of freedom.
Procedure using the Kruskal–Wallis Test:
State hypotheses:
\(H_0:\) The distributions of the variable are the same for all groups.
\(H_a:\) At least one group's distribution is different from the others (often interpreted as different medians/locations).
Select significance level \(\alpha\).
Combine all data and rank them from smallest (1) to largest (n). If there are ties, assign the average rank to the tied observations.
Calculate the sum of ranks (\(R_i\)) for each group.
Compute the \(H\) test statistic using the formula above.
Find the critical value \(\chi^2_{\alpha,\,k-1}\) from the chi-squared distribution table (or using software) with \(k-1\) degrees of freedom. Or, compute the p-value associated with the calculated \(H\) statistic.
Compare \(H\) to the critical value (or compare p-value to \(\alpha\)) and conclude. If \(H > \chi^2_{\alpha,\,k-1}\) (or p-value \(< \alpha\)), reject \(H_0\).
Example: Kruskal-Wallis Test
Question: Analyze whether there are differences in patient recovery times (days) across three treatment methods when the data appears non-normal:
Treatment A: 5, 7, 6, 8, 4
Treatment B: 12, 9, 11, 8, 10
Treatment C: 6, 8, 7, 9, 5
# Basic Kruskal-Wallis test in R
treat_A <- c(5, 7, 6, 8, 4)
treat_B <- c(12, 9, 11, 8, 10)
treat_C <- c(6, 8, 7, 9, 5)
recovery <- c(treat_A, treat_B, treat_C)
groups <- factor(rep(c("A", "B", "C"), each=5))
# Perform test
kruskal.test(recovery ~ groups)
# Basic boxplot
boxplot(recovery ~ groups)
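To connect the output to the formula above, here is a minimal sketch that recomputes \(H\) by hand from the rank sums, continuing with the recovery and groups objects defined above (kruskal.test additionally applies a tie correction, so its statistic is slightly larger when ties are present):
ranks <- rank(recovery)              # rank all 15 observations together; ties get average ranks
R_i <- tapply(ranks, groups, sum)    # rank sum for each group
n_i <- tapply(ranks, groups, length) # sample size of each group
n <- length(recovery)                # total sample size
H <- 12 / (n * (n + 1)) * sum(R_i^2 / n_i) - 3 * (n + 1)
H                                    # ≈ 8.15 before the tie correction
pchisq(H, df = nlevels(groups) - 1, lower.tail = FALSE)  # approximate p-value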
Section 6: Real-World Examples with R
Example 1: Teaching Methods Comparison
Comparing test scores across three teaching methods: Traditional Lecture, Online Modules, and Peer-to-Peer Learning.
# Create the dataset
x1 <- c(75, 82, 68) # Traditional Lecture
x2 <- c(90, 78, 85) # Online Modules
x3 <- c(88, 92, 80) # Peer-to-Peer
X <- c(x1, x2, x3)
Method <- rep(c("Traditional", "Online", "PeertoPeer"), each=3)
scores_data <- data.frame(Score=X, Method=Method)
# Perform One-Way ANOVA
model <- aov(Score ~ Method, data=scores_data)
summary(model)
# Visualize the data using base R
boxplot(Score ~ Method, data=scores_data,
col=c("#E8EAF6", "#C5CAE9", "#9FA8DA"),
main="Test Scores by Teaching Method",
ylab="Test Score", xlab="Teaching Method")
# Add individual points
stripchart(Score ~ Method, data=scores_data,
vertical=TRUE, method="jitter",
pch=19, col="#1A237E", add=TRUE)
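Before reading the ANOVA table, a quick residual check on the fitted model is worthwhile; a minimal sketch:
# Residual diagnostics for the fitted model
par(mfrow = c(1, 2))
plot(model, which = 1)  # residuals vs fitted: look for roughly equal spread
plot(model, which = 2)  # normal Q-Q plot of residuals: look for a straight line
par(mfrow = c(1, 1))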
Decision Guide: Choosing a Test
Start → Are there outliers?
  Yes → Consider robust methods or nonparametric tests
  No → Check normality
    Normal → Check variance equality
      Equal → Use One-Way ANOVA
      Unequal → Use Welch's ANOVA
    Not Normal → Is the sample size > 30?
      Yes → Use One-Way ANOVA (Central Limit Theorem)
      No → Use Kruskal–Wallis
Worked Example: One-Way ANOVA
Let's analyze test scores from three different teaching methods (Group A, B, C) using One-Way ANOVA. We have \(k=3\) groups and \(N=9\) observations in total, so \(df_1 = k-1 = 2\) and \(df_2 = N-k = 6\). Suppose the ANOVA calculations yield an F-statistic of 1.69.
Determine Critical Value or p-value:
Using \(\alpha = 0.05\), the critical value \(F_{0.05,\,2,\,6}\) is looked up from an F-table or found using software.
Critical Value: \(F_{0.05,\,2,\,6} \approx 5.14\).
(Using software, the p-value for F=1.69 with df1=2 and df2=6 is approximately 0.26.)
Compare and Conclude:
Comparing the calculated F-statistic (1.69) to the critical value (5.14):
Since \(1.69 < 5.14\), we fail to reject \(H_0\).
(Or, comparing the p-value (0.26) to \(\alpha=0.05\): Since \(0.26 > 0.05\), we fail to reject \(H_0\).)
Interpretation:
At the \(\alpha=0.05\) significance level, there is not sufficient evidence to conclude that the mean test scores differ among the three teaching methods.
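Both numbers are easy to verify in R:
qf(0.95, df1 = 2, df2 = 6)                      # critical value, ≈ 5.14
pf(1.69, df1 = 2, df2 = 6, lower.tail = FALSE)  # p-value, ≈ 0.26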
Practice Problems with R
Problem 1: Plant Growth by Fertilizer Type
Researchers tested the effectiveness of three fertilizers (A, B, and C) on plant growth. The plant heights (in cm) after 30 days are:
Fertilizer A: 22, 25, 24, 27
Fertilizer B: 31, 30, 29, 32
Fertilizer C: 20, 21, 19, 22
# Create the dataset
A <- c(22, 25, 24, 27)
B <- c(31, 30, 29, 32)
C <- c(20, 21, 19, 22)
group <- factor(rep(c("A", "B", "C"), each=4))
height <- c(A, B, C)
data <- data.frame(group, height)
# Perform One-Way ANOVA
result <- aov(height ~ group, data=data)
summary(result)
# Visualize with boxplot
boxplot(height ~ group, data=data,
col=c("#E8EAF6", "#C5CAE9", "#9FA8DA"),
main="Plant Height by Fertilizer Type",
ylab="Height (cm)", xlab="Fertilizer Type")
stripchart(height ~ group, data=data,
vertical=TRUE, method="jitter",
pch=19, col="#1A237E", add=TRUE)
# Post-hoc analysis
TukeyHSD(result)
Solution Analysis:
H₀: All fertilizer types have the same mean effect on plant height
H₁: At least one fertilizer type has a different mean effect
α = 0.05
Check assumptions:
Independence: Satisfied by experimental design
Normality: Can be checked with a Q-Q plot of the residuals
Equal variance: Can be checked with Levene's test
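A minimal sketch of these two checks, reusing the result and data objects from the code above (Levene's test assumes the car package is installed; Bartlett's test is the base-R alternative):
qqnorm(residuals(result)); qqline(residuals(result))  # Q-Q plot of residuals
bartlett.test(height ~ group, data = data)            # equal variances (base R)
# car::leveneTest(height ~ group, data = data)        # Levene's test via car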
Problem 2: Effect of Study Methods on Exam Scores
Students were randomly assigned to three study methods, and their exam scores were compared using the same one-way ANOVA workflow as in Problem 1.
Review Questions
Q1: Degrees of Freedom of an F-distribution
Question: How many degrees of freedom does an F-distribution have? What are those degrees of freedom called?
Options:
A. An F-distribution has three degrees of freedom: numerator, denominator, and F-statistic
B. An F-distribution has two degrees of freedom: numerator and denominator
C. An F-distribution has two degrees of freedom: denominator and numerator
D. An F-distribution has one degree of freedom: F-statistic
Correct Answer: B
Explanation: The F-distribution has exactly two degrees of freedom:
Numerator df (df₁): Comes from the between-groups variation
Denominator df (df₂): Comes from the within-groups variation
In ANOVA, df₁ = k-1 (where k is number of groups) and df₂ = N-k (where N is total sample size).
Q2: Identifying df from Notation
Question: An F-curve has df = (1,14)
a) Numerator df = 1
b) Denominator df = 14
Correct Answer: Both statements are true
Explanation: In F-distribution notation:
The first number always represents the numerator df (df₁)
The second number always represents the denominator df (df₂)
This notation is consistent across statistical software and tables
Q3: Using F-distribution Tables
Question: An F-curve has degrees of freedom (10,15). Use an F-distribution table to find the F-value that has an area of 0.05 to its right.
# Using R to find the critical value
qf(0.95, df1=10, df2=15) # α = 0.05 for right-tailed test
[1] 2.54
# Verify using cumulative probability
1 - pf(2.54, df1=10, df2=15)
[1] 0.05
Explanation:
For α = 0.05 (right-tailed), we need the 95th percentile (1 - α)
The critical value F₀.₀₅,₁₀,₁₅ ≈ 2.54
Any F-statistic larger than 2.54 would fall in the rejection region
Q4: ANOVA Relationship
Question: One-way ANOVA is a procedure for comparing the means of several populations. It generalizes which of the following procedures?
Options:
A. Mann-Whitney test
B. Pooled t-procedure
C. Paired t-test
D. Two-means z-test
Correct Answer: B
Explanation:
ANOVA is an extension of the pooled t-procedure to more than two groups
When ANOVA is applied to exactly two groups, the F-statistic equals the square of the pooled t-statistic: \(F = t^2\)
Both procedures assume independent samples and equal population variances
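A minimal sketch (hypothetical data) demonstrating the \(F = t^2\) relationship with two groups:
# Two hypothetical groups of four observations
g1 <- c(5.1, 4.9, 5.6, 5.2)
g2 <- c(6.0, 5.8, 6.3, 5.7)
y <- c(g1, g2)
grp <- factor(rep(c("g1", "g2"), each = 4))
t_stat <- t.test(y ~ grp, var.equal = TRUE)$statistic   # pooled two-sample t
F_stat <- summary(aov(y ~ grp))[[1]][["F value"]][1]    # one-way ANOVA F
c(t_squared = unname(t_stat)^2, F = F_stat)             # the two agree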
Q5: Degrees of Freedom in ANOVA
Question: Suppose a one-way ANOVA is performed to compare the means of 5 populations, with sample sizes: 15, 17, 14, 18, and 12.
# Calculate degrees of freedom
k <- 5 # number of groups
N <- 15 + 17 + 14 + 18 + 12 # total sample size
df1 <- k - 1 # between-groups df
df2 <- N - k # within-groups df
c(df1, df2)
[1] 4 71