Lecture Notes: Chapter 16 – Analysis of Variance (ANOVA)

Learning Objectives

After completing this chapter, you should be able to:

  - Describe the F-distribution and identify its numerator and denominator degrees of freedom.
  - State the assumptions of one-way ANOVA and carry out the test by hand and in R.
  - Partition total variation into between-groups and within-groups components (SST = SSTR + SSE).
  - Apply Tukey's HSD procedure to determine which pairs of group means differ.
  - Use the Kruskal–Wallis test as a nonparametric alternative when ANOVA assumptions are not met.

Introduction to ANOVA

Analysis of Variance (ANOVA) is a statistical method used to compare the means of three or more groups simultaneously. While a t-test can compare two means, performing multiple t-tests to compare many groups increases the risk of making a Type I error (incorrectly rejecting the null hypothesis). ANOVA uses an F-test to determine if there is a significant difference between the means of the groups being compared by examining the ratio of variability between groups to variability within groups.
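The inflation of the Type I error rate is easy to quantify: with \(m\) independent comparisons each run at level \(\alpha\), the probability of at least one false rejection is \(1 - (1-\alpha)^m\). A quick sketch in R (five groups is an assumed example):

```r
# Family-wise error rate when running all pairwise t-tests
alpha <- 0.05
k <- 5                        # number of groups (illustrative)
m <- choose(k, 2)             # number of pairwise comparisons: 10
fwer <- 1 - (1 - alpha)^m     # P(at least one Type I error)
round(fwer, 3)
# about 0.401 -- far above the nominal 0.05
```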

Section 1: The F-Distribution

The F-distribution is a probability distribution used in ANOVA and other statistical tests. Its shape is controlled by two parameters: the numerator degrees of freedom (df1) and the denominator degrees of freedom (df2).

Quick R Commands for F-distribution Analysis

# Critical F-value (right-tailed), α = 0.05
qf(p = 0.95, df1 = 2, df2 = 6)

# P-value from an F-statistic (right-tailed test)
1 - pf(1.69, df1 = 2, df2 = 6)

# F-distribution density at a specific point
df(x = 1.69, df1 = 2, df2 = 6)

# Plotting the F-distribution
curve(df(x, df1 = 2, df2 = 6), from = 0, to = 6,
      main = "F-distribution (df1=2, df2=6)")
abline(v = qf(0.95, 2, 6), col = "red", lty = 2)  # Add critical value

Key Functions:

  - df(x, df1, df2) — density of the F-distribution at x
  - pf(q, df1, df2) — cumulative probability \(P(F \le q)\)
  - qf(p, df1, df2) — quantile function (inverse of pf); qf(0.95, df1, df2) gives the critical value for a right-tailed test at \(\alpha = 0.05\)

Section 2: One-Way ANOVA Framework

One-Way ANOVA is used when you have one categorical independent variable (factor) with three or more levels (groups) and one quantitative dependent variable. The test determines if there is a statistically significant difference between the means of these groups.

Assumptions for One-Way ANOVA:

For the results of a one-way ANOVA to be valid, the following assumptions should ideally be met:

  1. Independence: the observations are independent random samples (or come from a completely randomized design).
  2. Normality: the response variable is approximately normally distributed within each group.
  3. Equal variances (homoscedasticity): the populations share a common variance.

ANOVA Identity:

The total variation in the data (SST) can be partitioned into the variation between groups (SSTR) and the variation within groups (SSE). The fundamental equation of ANOVA is:

\[SST = SSTR + SSE\]
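The identity can be confirmed numerically in R; a small sketch with sample scores:

```r
# Verify SST = SSTR + SSE on a small dataset
x <- c(85, 90, 80, 88, 92, 84, 78, 85, 82)
g <- factor(rep(c("A", "B", "C"), each = 3))

grand <- mean(x)
means <- tapply(x, g, mean)
n_i   <- tapply(x, g, length)

SSTR <- sum(n_i * (means - grand)^2)   # between-groups variation
SSE  <- sum((x - means[g])^2)          # within-groups variation
SST  <- sum((x - grand)^2)             # total variation

all.equal(SST, SSTR + SSE)             # TRUE
```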

Section 3: ANOVA Procedure Step-by-Step

The procedure involves calculating the sum of squares, degrees of freedom, mean squares, and finally the F-statistic, which are typically summarized in an ANOVA table.

ANOVA Table Format:

Source of Variation           df      SS     MS = SS/df            F
Between Groups (Treatment)    k - 1   SSTR   MSTR = SSTR/(k - 1)   MSTR/MSE
Within Groups (Error)         N - k   SSE    MSE = SSE/(N - k)
Total                         N - 1   SST

Where:

  - \(k\) = number of groups (treatments)
  - \(N\) = total number of observations across all groups
  - SSTR = treatment (between-groups) sum of squares; SSE = error (within-groups) sum of squares; SST = total sum of squares
  - MSTR and MSE are the corresponding mean squares

Section 4: Multiple Comparisons

If the One-Way ANOVA test results in rejecting the null hypothesis \(H_0\) (i.e., you conclude that at least one group mean is different), you typically want to know *which* specific pairs of group means are different. Performing simple t-tests between all pairs of groups increases the family-wise error rate (the probability of making at least one Type I error among all comparisons). Multiple comparison procedures are designed to control this error rate.

Tukey's Honestly Significant Difference (HSD) test is a common method for pairwise comparisons when you have equal (or nearly equal) sample sizes per group and the equal variance assumption holds. It is based on the studentized range distribution.

Tukey’s Procedure Steps:

  1. Choose a family confidence level \(1 - \alpha\). This is the probability that all confidence intervals constructed for all pairwise comparisons contain the true difference. The typical \(\alpha\) from the ANOVA is often used here.
  2. Find the studentized range critical value \(q_{\alpha,\,k,\,n-k}\) from the studentized range table (or using software) with \(\alpha\), number of groups \(k\), and error degrees of freedom \(n-k\).
  3. For each pair of groups \(i\) and \(j\) (where \(i \ne j\)), compute the confidence interval for the difference between their population means (\(\mu_i - \mu_j\)). The confidence interval is given by:
    \[ (\bar{x}_i - \bar{x}_j) \pm \dfrac{q_{\alpha,\,k,\,n-k}}{\sqrt{2}} \sqrt{MSE \left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)} \] (Note: An equivalent and common form when sample sizes are equal, \(n_i = n_j = n'\), is \((\bar{x}_i - \bar{x}_j) \pm q_{\alpha,\,k,\,n-k} \sqrt{\dfrac{MSE}{n'}}\). The first formula is more general for unequal \(n_i\).)
  4. If the confidence interval for a pair of means does not contain 0, you declare that the population means \(\mu_i\) and \(\mu_j\) are significantly different at the chosen family confidence level. If the interval contains 0, you do not have sufficient evidence to conclude they are different.
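Steps 2–3 can be carried out in R with qtukey(), which returns the studentized range critical value; a minimal sketch, with the MSE, group means, and sample sizes taken as illustrative inputs:

```r
# Tukey interval for one pair of means (illustrative values assumed)
k <- 3; err_df <- 6             # number of groups, error df (n - k)
MSE <- 17.78                    # mean squared error from the ANOVA
ni <- nj <- 3                   # sample sizes of the two groups
xbar_i <- 88; xbar_j <- 85      # sample means being compared

q_crit <- qtukey(0.95, nmeans = k, df = err_df)  # studentized range value
margin <- (q_crit / sqrt(2)) * sqrt(MSE * (1/ni + 1/nj))
c(lower = (xbar_i - xbar_j) - margin,
  upper = (xbar_i - xbar_j) + margin)
# If the interval contains 0, this pair is not significantly different
```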

Section 5: The Kruskal–Wallis Test

The Kruskal–Wallis H test is a nonparametric alternative to the One-Way ANOVA. It is used when the assumptions for ANOVA, particularly normality or homogeneity of variances, are not met. It tests whether samples originate from the same distribution, which is often interpreted as testing for differences in medians or locations when the shape of the distributions is similar across groups.

Assumptions for the Kruskal–Wallis Test:

  - The samples are independent, random samples from their respective populations.
  - The response variable is at least ordinal, so all observations can be ranked.
  - To interpret the test as comparing medians, the group distributions should have similar shapes and spreads.

Test statistic:

The test involves ranking all observations from all groups together. The Kruskal-Wallis test statistic, \(H\) (often denoted \(K\)), is calculated based on the sum of the ranks for each group:

\[ H = \frac{12}{n(n+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(n+1) \]

Where \(k\) is the number of groups, \(n_i\) is the sample size of the \(i\)-th group, \(R_i\) is the sum of the ranks for the \(i\)-th group, and \(n = \sum n_i\) is the total sample size.

For reasonably large sample sizes (\(n_i \ge 5\)), the distribution of the \(H\) statistic can be approximated by a chi-squared distribution (\(\chi^2\)) with \(k-1\) degrees of freedom.

Procedure using the Kruskal–Wallis Test:

  1. State hypotheses:
    \(H_0:\) The distributions of the variable are the same for all groups.
    \(H_a:\) At least one group's distribution is different from the others (often interpreted as different medians/locations).
  2. Select significance level \(\alpha\).
  3. Combine all data and rank them from smallest (1) to largest (n). If there are ties, assign the average rank to the tied observations.
  4. Calculate the sum of ranks (\(R_i\)) for each group.
  5. Compute the \(H\) test statistic using the formula above.
  6. Find the critical value \(\chi^2_{\alpha,\,k-1}\) from the chi-squared distribution table (or using software) with \(k-1\) degrees of freedom. Or, compute the p-value associated with the calculated \(H\) statistic.
  7. Compare \(H\) to the critical value (or compare p-value to \(\alpha\)) and conclude. If \(H > \chi^2_{\alpha,\,k-1}\) (or p-value \(< \alpha\)), reject \(H_0\).
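The steps above can be checked against R's built-in kruskal.test(); a sketch with made-up, tie-free data, so the hand formula matches the built-in statistic exactly:

```r
# Kruskal-Wallis H via the formula, checked against kruskal.test()
x <- c(1.2, 3.4, 2.2,  5.1, 6.3, 4.8,  2.9, 3.7, 5.5)   # made-up data
g <- rep(c("A", "B", "C"), each = 3)

n  <- length(x)
r  <- rank(x)              # step 3: rank the pooled observations
Ri <- tapply(r, g, sum)    # step 4: rank sums per group
ni <- tapply(r, g, length)

# Step 5: H statistic
H <- 12 / (n * (n + 1)) * sum(Ri^2 / ni) - 3 * (n + 1)

# Steps 6-7: chi-squared approximation with k - 1 = 2 df
p_value <- pchisq(H, df = 2, lower.tail = FALSE)

# Matches the built-in test (no ties in these data)
all.equal(H, unname(kruskal.test(x, factor(g))$statistic))
```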

Example: Kruskal-Wallis Test

Question: Analyze whether there are differences in patient recovery times (days) across three treatment methods when the data appears non-normal:

# Basic Kruskal-Wallis test in R
treat_A <- c(5, 7, 6, 8, 4)
treat_B <- c(12, 9, 11, 8, 10)
treat_C <- c(6, 8, 7, 9, 5)
recovery <- c(treat_A, treat_B, treat_C)
groups <- factor(rep(c("A", "B", "C"), each = 5))

# Perform test
kruskal.test(recovery ~ groups)

# Basic boxplot
boxplot(recovery ~ groups)

Section 6: Real-World Examples with R

Example 1: Teaching Methods Comparison

Comparing test scores across three teaching methods: Traditional Lecture, Online Modules, and Peer-to-Peer Learning.

# Create the dataset
x1 <- c(75, 82, 68)   # Traditional Lecture
x2 <- c(90, 78, 85)   # Online Modules
x3 <- c(88, 92, 80)   # Peer-to-Peer
X <- c(x1, x2, x3)
Method <- rep(c("Traditional", "Online", "PeertoPeer"), each = 3)
scores_data <- data.frame(Score = X, Method = Method)

# Perform One-Way ANOVA
model <- aov(Score ~ Method, data = scores_data)
summary(model)

# Visualize the data using base R
boxplot(Score ~ Method, data = scores_data,
        col = c("#E8EAF6", "#C5CAE9", "#9FA8DA"),
        main = "Test Scores by Teaching Method",
        ylab = "Test Score", xlab = "Teaching Method")

# Add individual points
stripchart(Score ~ Method, data = scores_data, vertical = TRUE,
           method = "jitter", pch = 19, col = "#1A237E", add = TRUE)

Example 2: Effect Size Visualization

# Calculate effect size (η²) from the ANOVA table
# (anova() gives clean row names; summary(model)[[1]] pads them with spaces)
anova_tab <- anova(model)
eta_squared <- anova_tab["Method", "Sum Sq"] / sum(anova_tab[, "Sum Sq"])

# Create bar plot of means with error bars
library(ggplot2)
ggplot(scores_data, aes(x = Method, y = Score, fill = Method)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  scale_fill_brewer(palette = "Blues") +
  labs(title = paste("Mean Scores by Method\nη² =", round(eta_squared, 3)),
       y = "Test Score") +
  theme_minimal()

Effect Size Calculations

Eta-squared (η²):

\[\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}} = \frac{SSTR}{SST}\]

Cohen's f:

\[f = \sqrt{\frac{\eta^2}{1-\eta^2}}\]
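Both effect sizes can be computed directly from the ANOVA sums of squares; a sketch using the SSTR and SST values from the worked example in this chapter as inputs:

```r
# Effect sizes from ANOVA sums of squares
SSTR <- 60.16    # between-groups SS (worked example value)
SST  <- 166.83   # total SS (worked example value)

eta_sq  <- SSTR / SST                    # eta-squared
cohen_f <- sqrt(eta_sq / (1 - eta_sq))   # Cohen's f
round(c(eta_sq = eta_sq, cohen_f = cohen_f), 3)
# eta_sq about 0.361, cohen_f about 0.751
```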

Choosing the Right Analysis Method

Start → Are there outliers? → Yes → Consider robust methods or nonparametric tests
                           → No  → Check normality
                                     → Normal → Check variance equality
                                                 → Equal → Use One-Way ANOVA
                                                 → Unequal → Use Welch's ANOVA
                                     → Not Normal → Sample size > 30?
                                                    → Yes → Use One-Way ANOVA
                                                    → No → Use Kruskal-Wallis
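Each branch of this flow has a standard R counterpart; a sketch of the checks on made-up data (the data and any cutoffs are illustrative):

```r
# Mapping the decision flow to R functions (illustrative data)
x <- c(22, 25, 24, 27,  31, 30, 29, 32,  20, 21, 19, 22)
g <- factor(rep(c("A", "B", "C"), each = 4))

# Check normality of residuals
shapiro.test(residuals(aov(x ~ g)))

# Check equality of variances (Bartlett's test itself assumes normality)
bartlett.test(x ~ g)

# Normal, equal variances -> classic one-way ANOVA
summary(aov(x ~ g))

# Normal, unequal variances -> Welch's ANOVA
oneway.test(x ~ g, var.equal = FALSE)

# Non-normal, small samples -> Kruskal-Wallis
kruskal.test(x ~ g)
```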
    

Worked Example: One-Way ANOVA

Let's analyze test scores from three different teaching methods (Group A, B, C) using One-Way ANOVA. We have \(k=3\) groups.

Data (Test Scores):

  Group A: 85, 90, 80
  Group B: 88, 92, 84
  Group C: 78, 85, 82

Total sample size \(n = n_A + n_B + n_C = 3 + 3 + 3 = 9\).

  1. Hypotheses: \[H_0: \mu_A=\mu_B=\mu_C,\quad H_a:\text{at least one mean differs}\]
  2. Calculate Sample Means and Grand Mean:
    \(\bar{x}_A = (85+90+80)/3 = 85\)
    \(\bar{x}_B = (88+92+84)/3 = 88\)
    \(\bar{x}_C = (78+85+82)/3 \approx 81.67\)
    Grand Mean: \(\bar{x} = (85+90+80+88+92+84+78+85+82)/9 = 764/9 \approx 84.89\)
  3. Calculate Sum of Squares:
    Between-Groups SS (SSTR): \[ \text{SSTR} = \sum n_i(\bar{x}_i-\bar{x})^2 \] \[ \text{SSTR} = 3(85 - 84.89)^2 + 3(88 - 84.89)^2 + 3(81.67 - 84.89)^2 \] \[ \text{SSTR} = 3(0.11)^2 + 3(3.11)^2 + 3(-3.22)^2 \] \[ \text{SSTR} = 3(0.0121) + 3(9.6721) + 3(10.3684) = 0.0363 + 29.0163 + 31.1052 \approx 60.16 \]
    Within-Groups SS (SSE): \[ \text{SSE} = \sum_{i=1}^k \sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2 \] \[ \text{SSE}_{\text{Group A}} = (85-85)^2 + (90-85)^2 + (80-85)^2 = 0^2 + 5^2 + (-5)^2 = 0 + 25 + 25 = 50 \] \[ \text{SSE}_{\text{Group B}} = (88-88)^2 + (92-88)^2 + (84-88)^2 = 0^2 + 4^2 + (-4)^2 = 0 + 16 + 16 = 32 \] \[ \text{SSE}_{\text{Group C}} = (78-81.67)^2 + (85-81.67)^2 + (82-81.67)^2 \approx (-3.67)^2 + (3.33)^2 + (0.33)^2 \approx 13.47 + 11.09 + 0.11 = 24.67 \] \[ \text{SSE} = 50 + 32 + 24.67 \approx 106.67 \]
    Total SS (SST): \[ \text{SST} = \text{SSTR} + \text{SSE} = 60.16 + 106.67 = 166.83 \] (Or calculate it directly as \(\text{SST} = \sum_{i,j} (x_{ij}-\bar{x})^2\).)
  4. Degrees of Freedom: dfTreatment = \(k − 1 = 3 − 1 = 2\)
    dfError = \(n − k = 9 − 3 = 6\)
    dfTotal = \(n − 1 = 9 − 1 = 8\)
    (Check: dfTreatment + dfError = 2 + 6 = 8 = dfTotal)
  5. Calculate Mean Squares: \(MSTR = \dfrac{SSTR}{\text{df}_{\text{Treatment}}} = \dfrac{60.16}{2} \approx 30.08\)
    \(MSE = \dfrac{SSE}{\text{df}_{\text{Error}}} = \dfrac{106.67}{6} \approx 17.78\)
  6. Calculate F-Statistic: \[F = \frac{MSTR}{MSE} = \frac{30.08}{17.78} \approx 1.69\]
  7. ANOVA Table Summary:
    Source      df   SS       MS      F
    Treatment   2    60.16    30.08   1.69
    Error       6    106.67   17.78
    Total       8    166.83
  8. Determine Critical Value or p-value: Using \(\alpha = 0.05\), the critical value \(F_{0.05,\,2,\,6}\) is looked up from an F-table or found using software.
    Critical Value: \(F_{0.05,\,2,\,6} \approx 5.14\).
    (Using software, the p-value for F=1.69 with df1=2 and df2=6 is approximately 0.26.)
  9. Compare and Conclude: Comparing the calculated F-statistic (1.69) to the critical value (5.14):
    Since \(1.69 < 5.14\), we fail to reject \(H_0\).
    (Or, comparing the p-value (0.26) to \(\alpha=0.05\): Since \(0.26 > 0.05\), we fail to reject \(H_0\).)
  10. Interpretation: At the \(\alpha=0.05\) significance level, there is not enough statistically significant evidence to conclude that the means of the test scores for the three teaching methods are different.
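As a check, aov() reproduces the hand-calculated table (small differences are rounding):

```r
# Verify the worked example with aov()
score <- c(85, 90, 80,  88, 92, 84,  78, 85, 82)
group <- factor(rep(c("A", "B", "C"), each = 3))

model <- aov(score ~ group)
summary(model)
# F is approximately 1.69 with a p-value around 0.26, so H0 is not rejected
```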

Practice Problems with R

Problem 1: Plant Growth by Fertilizer Type

Researchers tested the effectiveness of three fertilizers (A, B, and C) on plant growth. The plant heights (in cm) after 30 days are:

# Create the dataset
A <- c(22, 25, 24, 27)
B <- c(31, 30, 29, 32)
C <- c(20, 21, 19, 22)
group <- factor(rep(c("A", "B", "C"), each = 4))
height <- c(A, B, C)
data <- data.frame(group, height)

# Perform One-Way ANOVA
result <- aov(height ~ group, data = data)
summary(result)

# Visualize with boxplot
boxplot(height ~ group, data = data,
        col = c("#E8EAF6", "#C5CAE9", "#9FA8DA"),
        main = "Plant Height by Fertilizer Type",
        ylab = "Height (cm)", xlab = "Fertilizer Type")
stripchart(height ~ group, data = data, vertical = TRUE,
           method = "jitter", pch = 19, col = "#1A237E", add = TRUE)

# Post-hoc analysis
TukeyHSD(result)

Solution Analysis: The sample means are well separated (A: 24.5 cm, B: 30.5 cm, C: 20.5 cm) relative to the small within-group spread, so the ANOVA F-test is highly significant (p < 0.001) and \(H_0\) is rejected. TukeyHSD shows that all three pairwise differences are significant, with fertilizer B producing the tallest plants.

Problem 2: Effect of Study Methods on Exam Scores

Students were randomly assigned to three study methods. Their exam scores are:

# Create the dataset
flashcards <- c(88, 85, 91, 87)
reading <- c(82, 79, 77, 81)
videos <- c(90, 92, 89, 91)
method <- factor(rep(c("Flashcards", "Reading", "Videos"), each = 4))
score <- c(flashcards, reading, videos)
data <- data.frame(method, score)

# Fit the model once, then check assumptions
result <- aov(score ~ method, data = data)

# 1. Normality of residuals
par(mfrow = c(1, 2))
qqnorm(residuals(result))
qqline(residuals(result))

# 2. Equal variance
library(car)
leveneTest(score ~ method, data = data)

# Perform ANOVA
summary(result)

# Post-hoc analysis if significant
TukeyHSD(result)

# Effect size (anova() gives clean row names for indexing)
anova_tab <- anova(result)
eta_squared <- anova_tab["method", "Sum Sq"] / sum(anova_tab[, "Sum Sq"])
print(paste("Eta-squared =", round(eta_squared, 3)))

Problem 3: Caffeine's Effect on Reaction Time

Study of caffeine dose effect on reaction time (in ms):

# Create the dataset
none <- c(320, 310, 305, 315)
moderate <- c(280, 275, 285, 278)
high <- c(300, 295, 298, 302)
caffeine <- factor(rep(c("None", "Moderate", "High"), each = 4))
reaction <- c(none, moderate, high)
data <- data.frame(caffeine, reaction)

# Descriptive statistics
tapply(reaction, caffeine, summary)
tapply(reaction, caffeine, sd)

# ANOVA
result <- aov(reaction ~ caffeine, data = data)
summary(result)

# Visualization
library(ggplot2)
ggplot(data, aes(x = caffeine, y = reaction, fill = caffeine)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(width = 0.2, color = "#1A237E") +
  theme_minimal() +
  labs(title = "Reaction Time by Caffeine Dose",
       y = "Reaction Time (ms)", x = "Caffeine Level") +
  scale_fill_brewer(palette = "Blues")

Conceptual Understanding Check

Q1: Understanding the F-distribution

Question: How many degrees of freedom does an F-distribution have? What are those degrees of freedom called?

Options:

  A. An F-distribution has three degrees of freedom: numerator, denominator, and F-statistic
  B. An F-distribution has two degrees of freedom: numerator and denominator
  C. An F-distribution has two degrees of freedom: denominator and numerator
  D. An F-distribution has one degree of freedom: F-statistic

Correct Answer: B

Explanation: The F-distribution has exactly two degrees of freedom:

  - the numerator degrees of freedom (df₁)
  - the denominator degrees of freedom (df₂)

In ANOVA, df₁ = k-1 (where k is number of groups) and df₂ = N-k (where N is total sample size).

Q2: Identifying df from Notation

Question: An F-curve has df = (1,14)

  a) Numerator df = 1
  b) Denominator df = 14

Correct Answer: Both statements are true

Explanation: In F-distribution notation, the first number is always the numerator degrees of freedom and the second is the denominator degrees of freedom, so df = (1, 14) means df₁ = 1 and df₂ = 14.

Q3: Using F-distribution Tables

Question: An F-curve has degrees of freedom (10,15). Use an F-distribution table to find the F-value that has an area of 0.05 to its right.

# Using R to find the critical value
qf(0.95, df1 = 10, df2 = 15)   # α = 0.05 for a right-tailed test
# [1] 2.54

# Verify using cumulative probability
1 - pf(2.54, df1 = 10, df2 = 15)
# [1] 0.05

Explanation: qf(0.95, df1=10, df2=15) returns the F-value with cumulative probability 0.95, which is exactly the value with area 0.05 to its right. For df = (10, 15), this critical value is approximately 2.54.

Q4: ANOVA Relationship

Question: One-way ANOVA is a procedure for comparing the means of several populations. It generalizes which of the following procedures?

Options:

  A. Mann-Whitney test
  B. Pooled t-procedure
  C. Paired t-test
  D. Two-means z-test

Correct Answer: B

Explanation: One-way ANOVA, like the pooled t-procedure, assumes independent samples from normal populations with equal variances. With exactly \(k = 2\) groups, the ANOVA F-test is equivalent to the two-sided pooled t-test (\(F = t^2\)), so ANOVA generalizes the pooled t-procedure to more than two groups.

Q5: Degrees of Freedom in ANOVA

Question: Suppose a one-way ANOVA is performed to compare the means of 5 populations, with sample sizes: 15, 17, 14, 18, and 12.

# Calculate degrees of freedom
k <- 5                        # number of groups
N <- 15 + 17 + 14 + 18 + 12   # total sample size
df1 <- k - 1                  # between-groups df
df2 <- N - k                  # within-groups df
c(df1, df2)
# [1]  4 71

Explanation: With \(k = 5\) groups and total sample size \(N = 15 + 17 + 14 + 18 + 12 = 76\), the numerator degrees of freedom are df₁ = k − 1 = 4 and the denominator degrees of freedom are df₂ = N − k = 71.
