Lecture Notes: Chapter 16 – Analysis of Variance (ANOVA)
Learning Objectives
After completing this chapter, you should be able to:
Understand the F-distribution and its relationship to ANOVA
Perform and interpret one-way ANOVA
Check and validate ANOVA assumptions
Conduct post-hoc analyses when appropriate
Choose between parametric and non-parametric methods
Introduction to ANOVA
Analysis of Variance (ANOVA) is a statistical method used to compare the means of three or more groups simultaneously. While a t-test can compare two means, performing multiple t-tests to compare many groups increases the risk of making a Type I error (incorrectly rejecting the null hypothesis). ANOVA uses an F-test to determine if there is a significant difference between the means of the groups being compared by examining the ratio of variability between groups to variability within groups.
Section 1: The F-Distribution
The F-distribution is a probability distribution used in ANOVA and other statistical tests. Its shape is governed by two degrees-of-freedom parameters: the distribution is right-skewed, and as both degrees of freedom increase, it concentrates more tightly around 1.
Quick R Commands for F-distribution Analysis
# Critical F-value (right-tailed)
qf(p = 0.95, df1 = 2, df2 = 6) # α = 0.05
# P-value from F-statistic
1 - pf(1.69, df1 = 2, df2 = 6) # Right-tailed test
# F-distribution density at specific point
df(x = 1.69, df1 = 2, df2 = 6)
# Plotting F-distribution
curve(df(x, df1 = 2, df2 = 6), from = 0, to = 6,
main = "F-distribution (df1=2, df2=6)")
abline(v = qf(0.95, 2, 6), col = "red", lty = 2) # Add critical value
Key Functions:
qf() - Find critical F-value
pf() - Find cumulative probability
df() - F-distribution density
curve() - Plot the distribution
Section 2: One-Way ANOVA Framework
One-Way ANOVA is used when you have one categorical independent variable (factor) with three or more levels (groups) and one quantitative dependent variable. The test determines if there is a statistically significant difference between the means of these groups.
Assumptions for One-Way ANOVA:
For the results of a one-way ANOVA to be valid, the following assumptions should ideally be met:
Independence: The samples are simple random samples, and observations within and between groups are independent. This is crucial and often depends on the study design.
Normality: The population from which each group sample is drawn is approximately normally distributed. This assumption is less critical with larger sample sizes due to the Central Limit Theorem.
Homogeneity of Variances: The populations from which the samples are drawn have equal variances (or equal standard deviations). This is also known as homoscedasticity. This can be formally tested using tests like Levene's test. If this assumption is violated, alternative procedures like Welch's ANOVA or nonparametric tests may be more appropriate.
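As a minimal sketch of these checks in R (using hypothetical data; Levene's test requires the car package, so base R's Bartlett test is shown as the built-in alternative):
# Hypothetical data: three groups of three observations each
y <- c(4.1, 5.0, 4.6, 6.2, 5.8, 6.5, 5.1, 4.8, 5.4)
g <- factor(rep(c("G1", "G2", "G3"), each = 3))
# Normality: Shapiro-Wilk test on the residuals of the group-means model
fit <- aov(y ~ g)
shapiro.test(residuals(fit))
# Homogeneity of variances: Bartlett's test (base R)
bartlett.test(y ~ g)
# car::leveneTest(y ~ g)  # Levene's test, if the car package is installed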
ANOVA Identity:
The total variation in the data (SST) can be partitioned into the variation between groups (SSTR) and the variation within groups (SSE). The fundamental equation of ANOVA is:
\[SST = SSTR + SSE\]
\(SST\): Total Sum of Squares - measures the total variability in the data.
\(SSTR\): Sum of Squares due to Treatment (or Between-Groups) - measures the variability between the means of the different groups.
\(SSE\): Sum of Squares due to Error (or Within-Groups) - measures the variability within each group. This represents the random error.
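A quick numerical check of this identity, a minimal sketch in R using the teaching-methods scores that appear in Section 6:
# Test scores for three groups of three students (from Example 1 below)
y <- c(75, 82, 68, 90, 78, 85, 88, 92, 80)
g <- factor(rep(c("T", "O", "P"), each = 3))
SST  <- sum((y - mean(y))^2)                              # total sum of squares
SSTR <- sum(table(g) * (tapply(y, g, mean) - mean(y))^2)  # between groups
SSE  <- sum((y - ave(y, g))^2)                            # within groups
c(SST = SST, check = SSTR + SSE)                          # both ≈ 474: the identity holds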
Section 3: ANOVA Procedure Step-by-Step
The procedure involves calculating the sum of squares, degrees of freedom, mean squares, and finally the F-statistic, which are typically summarized in an ANOVA table.
ANOVA Table Format:
Source of Variation          df      SS      MS = SS/df           F
Between Groups (Treatment)   k - 1   SSTR    MSTR = SSTR/(k-1)    MSTR/MSE
Within Groups (Error)        N - k   SSE     MSE = SSE/(N-k)      —
Total                        N - 1   SST     —                    —
Where:
k = number of groups
N = total sample size
SSTR = between-groups sum of squares
SSE = within-groups sum of squares
SST = total sum of squares
MSTR = between-groups mean square
MSE = within-groups mean square (error)
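R's aov() builds this table automatically; a minimal sketch continuing with the same hypothetical scores, plus a hand check of the F-statistic:
# Same scores as in the identity check above
y <- c(75, 82, 68, 90, 78, 85, 88, 92, 80)
g <- factor(rep(c("T", "O", "P"), each = 3))
anova(aov(y ~ g))          # prints df, SS, MS, F, and the p-value
# Hand check: k = 3 groups, N = 9 observations
k <- nlevels(g); N <- length(y)
MSTR <- 228.67 / (k - 1)   # SSTR / (k - 1)
MSE  <- 245.33 / (N - k)   # SSE / (N - k)
MSTR / MSE                 # F ≈ 2.80, matching the aov() output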
Section 4: Multiple Comparisons
If the One-Way ANOVA test results in rejecting the null hypothesis \(H_0\) (i.e., you conclude that at least one group mean is different), you typically want to know *which* specific pairs of group means are different. Performing simple t-tests between all pairs of groups increases the family-wise error rate (the probability of making at least one Type I error among all comparisons). Multiple comparison procedures are designed to control this error rate.
Tukey's Honestly Significant Difference (HSD) test is a common method for pairwise comparisons when you have equal (or nearly equal) sample sizes per group and the equal variance assumption holds. It is based on the studentized range distribution.
Tukey’s Procedure Steps:
Choose a family confidence level \(1 - \alpha\). This is the probability that all confidence intervals constructed for all pairwise comparisons contain the true difference. The typical \(\alpha\) from the ANOVA is often used here.
Find the studentized range critical value \(q_{\alpha,\,k,\,N-k}\) from the studentized range table (or using software) with \(\alpha\), number of groups \(k\), and error degrees of freedom \(N-k\).
For each pair of groups \(i\) and \(j\) (where \(i \ne j\)), compute the confidence interval for the difference between their population means (\(\mu_i - \mu_j\)). The confidence interval is given by:
\[
(\bar{x}_i - \bar{x}_j) \pm \dfrac{q_{\alpha,\,k,\,N-k}}{\sqrt{2}}
\sqrt{MSE \left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}
\]
(Note: An equivalent and common form when sample sizes are equal, \(n_i = n_j = n'\), is \((\bar{x}_i - \bar{x}_j) \pm q_{\alpha,\,k,\,N-k} \sqrt{\dfrac{MSE}{n'}}\). The first formula is more general for unequal \(n_i\).)
If the confidence interval for a pair of means does not contain 0, you declare that the population means \(\mu_i\) and \(\mu_j\) are significantly different at the chosen family confidence level. If the interval contains 0, you do not have sufficient evidence to conclude they are different.
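In R, qtukey() gives the studentized range critical value and TukeyHSD() carries out the whole procedure on a fitted aov model; a minimal sketch with \(k = 3\) and \(N - k = 6\):
qtukey(0.95, nmeans = 3, df = 6)   # critical value q for alpha = 0.05
# Applying the full procedure to a fitted model (hypothetical scores)
y <- c(75, 82, 68, 90, 78, 85, 88, 92, 80)
g <- factor(rep(c("T", "O", "P"), each = 3))
TukeyHSD(aov(y ~ g), conf.level = 0.95)  # pairwise CIs at family level 0.95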
Section 5: The Kruskal–Wallis Test
The Kruskal–Wallis H test is a nonparametric alternative to the One-Way ANOVA. It is used when the assumptions for ANOVA, particularly normality or homogeneity of variances, are not met. It tests whether samples originate from the same distribution, which is often interpreted as testing for differences in medians or locations when the shape of the distributions is similar across groups.
Assumptions for the Kruskal–Wallis Test:
The samples are independent random samples.
The distributions of the variable in the populations have the same shape (though they may differ in location/median).
Sample sizes for each group are generally recommended to be 5 or more for the chi-square approximation to be reliable.
Test statistic:
The test involves ranking all observations from all groups together. The Kruskal-Wallis test statistic, \(H\) (often denoted \(K\)), is calculated based on the sum of the ranks for each group:
\[
H = \frac{12}{n(n+1)}
\sum_{i=1}^k \frac{R_i^2}{n_i}
- 3(n+1)
\]
Where \(k\) is the number of groups, \(n_i\) is the sample size of the \(i\)-th group, \(R_i\) is the sum of the ranks for the \(i\)-th group, and \(n = \sum n_i\) is the total sample size.
For reasonably large sample sizes (\(n_i \ge 5\)), the distribution of the \(H\) statistic can be approximated by a chi-squared distribution (\(\chi^2\)) with \(k-1\) degrees of freedom.
Procedure using the Kruskal–Wallis Test:
State hypotheses:
\(H_0:\) The distributions of the variable are the same for all groups.
\(H_a:\) At least one group's distribution is different from the others (often interpreted as different medians/locations).
Select significance level \(\alpha\).
Combine all data and rank them from smallest (1) to largest (n). If there are ties, assign the average rank to the tied observations.
Calculate the sum of ranks (\(R_i\)) for each group.
Compute the \(H\) test statistic using the formula above.
Find the critical value \(\chi^2_{\alpha,\,k-1}\) from the chi-squared distribution table (or using software) with \(k-1\) degrees of freedom. Or, compute the p-value associated with the calculated \(H\) statistic.
Compare \(H\) to the critical value (or compare p-value to \(\alpha\)) and conclude. If \(H > \chi^2_{\alpha,\,k-1}\) (or p-value \(< \alpha\)), reject \(H_0\).
Example: Kruskal-Wallis Test
Question: Analyze whether there are differences in patient recovery times (days) across three treatment methods when the data appears non-normal:
Treatment A: 5, 7, 6, 8, 4
Treatment B: 12, 9, 11, 8, 10
Treatment C: 6, 8, 7, 9, 5
# Basic Kruskal-Wallis test in R
treat_A <- c(5, 7, 6, 8, 4)
treat_B <- c(12, 9, 11, 8, 10)
treat_C <- c(6, 8, 7, 9, 5)
recovery <- c(treat_A, treat_B, treat_C)
groups <- factor(rep(c("A", "B", "C"), each=5))
# Perform test
kruskal.test(recovery ~ groups)
# Basic boxplot
boxplot(recovery ~ groups)
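To connect the output to the formula above, here is a minimal sketch that recomputes \(H\) by hand from the rank sums, continuing with the recovery and groups objects defined above (kruskal.test additionally applies a tie correction, so its statistic is slightly larger when ties are present):
ranks <- rank(recovery)              # rank all 15 observations together; ties get average ranks
R_i <- tapply(ranks, groups, sum)    # rank sum for each group
n_i <- tapply(ranks, groups, length) # sample size of each group
n <- length(recovery)                # total sample size
H <- 12 / (n * (n + 1)) * sum(R_i^2 / n_i) - 3 * (n + 1)
H                                    # ≈ 8.15 before the tie correction
pchisq(H, df = nlevels(groups) - 1, lower.tail = FALSE)  # approximate p-value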
Section 6: Real-World Examples with R
Example 1: Teaching Methods Comparison
Comparing test scores across three teaching methods: Traditional Lecture, Online Modules, and Peer-to-Peer Learning.
# Create the dataset
x1 <- c(75, 82, 68) # Traditional Lecture
x2 <- c(90, 78, 85) # Online Modules
x3 <- c(88, 92, 80) # Peer-to-Peer
X <- c(x1, x2, x3)
Method <- rep(c("Traditional", "Online", "PeertoPeer"), each=3)
scores_data <- data.frame(Score=X, Method=Method)
# Perform One-Way ANOVA
model <- aov(Score ~ Method, data=scores_data)
summary(model)
# Visualize the data using base R
boxplot(Score ~ Method, data=scores_data,
col=c("#E8EAF6", "#C5CAE9", "#9FA8DA"),
main="Test Scores by Teaching Method",
ylab="Test Score", xlab="Teaching Method")
# Add individual points
stripchart(Score ~ Method, data=scores_data,
vertical=TRUE, method="jitter",
pch=19, col="#1A237E", add=TRUE)
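Before reading the ANOVA table, a quick residual check on the fitted model is worthwhile; a minimal sketch:
# Residual diagnostics for the fitted model
par(mfrow = c(1, 2))
plot(model, which = 1)  # residuals vs fitted: look for roughly equal spread
plot(model, which = 2)  # normal Q-Q plot of residuals: look for a straight line
par(mfrow = c(1, 1))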
Decision Guide: Choosing a Test
Start → Are there outliers?
  Yes → Consider robust methods or nonparametric tests
  No → Check normality
    Normal → Check variance equality
      Equal → Use One-Way ANOVA
      Unequal → Use Welch's ANOVA
    Not Normal → Is the sample size > 30?
      Yes → Use One-Way ANOVA (Central Limit Theorem)
      No → Use Kruskal–Wallis
Worked Example: One-Way ANOVA
Let's analyze test scores from three different teaching methods (Group A, B, C) using One-Way ANOVA. We have \(k=3\) groups and \(N=9\) observations in total, so \(df_1 = k-1 = 2\) and \(df_2 = N-k = 6\). Suppose the ANOVA calculations yield an F-statistic of 1.69.
Determine Critical Value or p-value:
Using \(\alpha = 0.05\), the critical value \(F_{0.05,\,2,\,6}\) is looked up from an F-table or found using software.
Critical Value: \(F_{0.05,\,2,\,6} \approx 5.14\).
(Using software, the p-value for F=1.69 with df1=2 and df2=6 is approximately 0.26.)
Compare and Conclude:
Comparing the calculated F-statistic (1.69) to the critical value (5.14):
Since \(1.69 < 5.14\), we fail to reject \(H_0\).
(Or, comparing the p-value (0.26) to \(\alpha=0.05\): Since \(0.26 > 0.05\), we fail to reject \(H_0\).)
Interpretation:
At the \(\alpha=0.05\) significance level, there is not sufficient evidence to conclude that the mean test scores differ among the three teaching methods.
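Both numbers are easy to verify in R:
qf(0.95, df1 = 2, df2 = 6)                      # critical value, ≈ 5.14
pf(1.69, df1 = 2, df2 = 6, lower.tail = FALSE)  # p-value, ≈ 0.26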
Practice Problems with R
Problem 1: Plant Growth by Fertilizer Type
Researchers tested the effectiveness of three fertilizers (A, B, and C) on plant growth. The plant heights (in cm) after 30 days are:
Fertilizer A: 22, 25, 24, 27
Fertilizer B: 31, 30, 29, 32
Fertilizer C: 20, 21, 19, 22
# Create the dataset
A <- c(22, 25, 24, 27)
B <- c(31, 30, 29, 32)
C <- c(20, 21, 19, 22)
group <- factor(rep(c("A", "B", "C"), each=4))
height <- c(A, B, C)
data <- data.frame(group, height)
# Perform One-Way ANOVA
result <- aov(height ~ group, data=data)
summary(result)
# Visualize with boxplot
boxplot(height ~ group, data=data,
col=c("#E8EAF6", "#C5CAE9", "#9FA8DA"),
main="Plant Height by Fertilizer Type",
ylab="Height (cm)", xlab="Fertilizer Type")
stripchart(height ~ group, data=data,
vertical=TRUE, method="jitter",
pch=19, col="#1A237E", add=TRUE)
# Post-hoc analysis
TukeyHSD(result)
Solution Analysis:
H₀: All fertilizer types have the same mean effect on plant height
H₁: At least one fertilizer type has a different mean effect
α = 0.05
Check assumptions:
Independence: Satisfied by experimental design
Normality: Can be checked with a Q-Q plot of the residuals
Equal variance: Can be checked with Levene's test
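A minimal sketch of these two checks, reusing the result and data objects from the code above (Levene's test assumes the car package is installed; Bartlett's test is the base-R alternative):
qqnorm(residuals(result)); qqline(residuals(result))  # Q-Q plot of residuals
bartlett.test(height ~ group, data = data)            # equal variances (base R)
# car::leveneTest(height ~ group, data = data)        # Levene's test via car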
Problem 2: Effect of Study Methods on Exam Scores
Students were randomly assigned to three study methods, and their exam scores were compared using the same one-way ANOVA workflow as in Problem 1.
Review Questions
Q1: Degrees of Freedom of an F-distribution
Question: How many degrees of freedom does an F-distribution have? What are those degrees of freedom called?
Options:
A. An F-distribution has three degrees of freedom: numerator, denominator, and F-statistic
B. An F-distribution has two degrees of freedom: numerator and denominator
C. An F-distribution has two degrees of freedom: denominator and numerator
D. An F-distribution has one degree of freedom: F-statistic
Correct Answer: B
Explanation: The F-distribution has exactly two degrees of freedom:
Numerator df (df₁): Comes from the between-groups variation
Denominator df (df₂): Comes from the within-groups variation
In ANOVA, df₁ = k-1 (where k is number of groups) and df₂ = N-k (where N is total sample size).
Q2: Identifying df from Notation
Question: An F-curve has df = (1,14)
a) Numerator df = 1
b) Denominator df = 14
Correct Answer: Both statements are true
Explanation: In F-distribution notation:
The first number always represents the numerator df (df₁)
The second number always represents the denominator df (df₂)
This notation is consistent across statistical software and tables
Q3: Using F-distribution Tables
Question: An F-curve has degrees of freedom (10,15). Use an F-distribution table to find the F-value that has an area of 0.05 to its right.
# Using R to find the critical value
qf(0.95, df1=10, df2=15) # α = 0.05 for right-tailed test
[1] 2.54
# Verify using cumulative probability
1 - pf(2.54, df1=10, df2=15)
[1] 0.05
Explanation:
For α = 0.05 (right-tailed), we need the 95th percentile (1 - α)
The critical value F₀.₀₅,₁₀,₁₅ ≈ 2.54
Any F-statistic larger than 2.54 would fall in the rejection region
Q4: ANOVA Relationship
Question: One-way ANOVA is a procedure for comparing the means of several populations. It generalizes which of the following procedures?
Options:
A. Mann-Whitney test
B. Pooled t-procedure
C. Paired t-test
D. Two-means z-test
Correct Answer: B
Explanation:
ANOVA is an extension of the pooled t-procedure to more than two groups
When ANOVA is applied to exactly two groups, the F-statistic equals the square of the pooled t-statistic: \(F = t^2\)
Both procedures assume independent samples and equal population variances
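A minimal sketch (hypothetical data) demonstrating the \(F = t^2\) relationship with two groups:
# Two hypothetical groups of four observations
g1 <- c(5.1, 4.9, 5.6, 5.2)
g2 <- c(6.0, 5.8, 6.3, 5.7)
y <- c(g1, g2)
grp <- factor(rep(c("g1", "g2"), each = 4))
t_stat <- t.test(y ~ grp, var.equal = TRUE)$statistic   # pooled two-sample t
F_stat <- summary(aov(y ~ grp))[[1]][["F value"]][1]    # one-way ANOVA F
c(t_squared = unname(t_stat)^2, F = F_stat)             # the two agree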
Q5: Degrees of Freedom in ANOVA
Question: Suppose a one-way ANOVA is performed to compare the means of 5 populations, with sample sizes: 15, 17, 14, 18, and 12.
# Calculate degrees of freedom
k <- 5 # number of groups
N <- 15 + 17 + 14 + 18 + 12 # total sample size
df1 <- k - 1 # between-groups df
df2 <- N - k # within-groups df
c(df1, df2)
[1] 4 71