© Charles T. Diebold, Ph.D., 9/15/2013. All Rights Reserved. Page 1 of 16
Repeated Measures ANOVA Tutorial:
RSCH-8250 Advanced Quantitative Reasoning
Charles T. Diebold, Ph.D.
September 15, 2013

How to cite this document: Diebold, C. T. (2013, September 15). Repeated measures ANOVA tutorial: RSCH-8250 advanced quantitative reasoning. Available from tom.diebold@waldenu.edu.
Table of Contents
Assignment and Tutorial Introduction
Section 1: SPSS Specification of the Assignment
    Descriptive Statistics
    Repeated Measures ANOVA
Section 2: Annotated Example SPSS Output, Write Up Guide, and Sample APA Table
    Descriptive Statistics
    Sphericity Assumption
    Tests of Within-Subjects Effects
    Post Hoc: Profile Plot and Statistical Pairwise Comparison via Estimated Marginal Means
    Tests of Within-Subjects Contrasts
    Test of Between-Subjects Effects
    Multivariate Tests for Repeated Measures
    Results Write Up Guide
    Sample APA Table
    For the Inquisitive: Sphericity and Within-Subjects Effects Redux
Assignment and Tutorial Introduction
This tutorial is intended to assist RSCH-8250 students in completing the Week 4 application assignment. I recommend that you use this tutorial as your first line of instruction; then, if you have time, study the textbook chapter, Dr. Morrow’s videos, or other resources noted in the classroom.

3rd edition of Field textbook: Review Chapter 13 in the Field textbook and complete the repeated measures analysis of variance in Smart Alex's Task #2 on p. 504, using the Tutormarks.sav data set from the Field text.

4th edition of Field textbook: Review Chapter 14 in the Field textbook and complete the repeated measures analysis of variance in Smart Alex's Task #2 on p. 589, using the Tutormarks.sav data set from the Field text.

The objective of the exercise is to conduct and interpret a repeated measures ANOVA using the following four variables: tutor1, tutor2, tutor3, tutor4. These represent the scores that four different lecturers gave to the same assignment from each of eight students.

The tutorial contains two sections. Section 1 provides step-by-step graphical user interface (GUI) screenshots for specifying the assignment in SPSS. If you follow the steps you will produce correct SPSS output. Section 2 presents and interprets output for a different set of variables, and includes a results write up guide and a sample APA style table (the variables and data in Section 2 are “made up” and do not reflect real research).
Section 1: SPSS Specification of the Assignment
Open the dataset; the Variable View screenshot is shown below. There are four variables in the dataset as described above.
Descriptive Statistics
Go to Analyze > Descriptive Statistics > Descriptives (below left). A Descriptives dialogue appears; select and move all four variables into the Variable(s) box (below right). Then click OK, which will produce descriptive statistics output for each lecturer's scores.
Repeated Measures ANOVA
Go to Analyze > General Linear Model > Repeated Measures (below left). A Repeated Measures Define Factor(s) dialogue appears (below right).
In the Within-Subjects Factor Name box, change “factor1” to “lecturer” and type in 4 for the number of levels (below left). This is the number of repeated measures; in this case there were 4 lecturers. Then click the Add button. Lecturer(4) now appears as in below right. Click the Define button.
After clicking the Define button, the dialogue below left will appear. Highlight and move all four variables into the Within-subjects Variables box as shown below right. Click the Plots button.
After clicking the Plots button, the dialogue below left appears. Move lecturer to the Horizontal Axis box; click the Add button, which will add lecturer to the Plots box as shown below right. Click the Continue button.
After clicking the Continue button you are returned to the Repeated Measures dialogue shown earlier. Click the Options button. In the Options dialogue, move (OVERALL) and lecturer to the Display Means for box and check the box next to Compare main effects. Under Confidence interval adjustment, click the down arrow and select Bonferroni. Under the Display options, check Descriptive statistics and Estimates of effect size.
Click Continue to return to the Repeated Measures dialogue, then click OK to produce the output needed for the assignment.
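For readers curious about what the GUI steps compute, the following is a minimal Python sketch of a one-way repeated measures ANOVA (sphericity assumed) for a layout like the assignment's: 8 students by 4 lecturers. It is an illustration only and not part of the SPSS procedure. The scores are randomly generated placeholders, not the Tutormarks.sav data, and the variable names are my own assumptions.

```python
import numpy as np
from scipy import stats

# Placeholder scores: 8 students (rows) x 4 lecturers (columns).
# Randomly generated for illustration; NOT the Tutormarks.sav data.
rng = np.random.default_rng(0)
scores = rng.normal(loc=60, scale=8, size=(8, 4))

n, k = scores.shape
grand_mean = scores.mean()
subject_means = scores.mean(axis=1, keepdims=True)   # one mean per student
lecturer_means = scores.mean(axis=0, keepdims=True)  # one mean per lecturer

# Partition the total variability into lecturer, subject, and error parts.
ss_lecturer = n * np.sum((lecturer_means - grand_mean) ** 2)
ss_subjects = k * np.sum((subject_means - grand_mean) ** 2)
residuals = scores - subject_means - lecturer_means + grand_mean
ss_error = np.sum(residuals ** 2)
ss_total = np.sum((scores - grand_mean) ** 2)

df_lecturer = k - 1
df_error = (n - 1) * (k - 1)
f_value = (ss_lecturer / df_lecturer) / (ss_error / df_error)
p_value = stats.f.sf(f_value, df_lecturer, df_error)
partial_eta_sq = ss_lecturer / (ss_lecturer + ss_error)

print(f"F({df_lecturer}, {df_error}) = {f_value:.3f}, p = {p_value:.3f}, "
      f"partial eta squared = {partial_eta_sq:.3f}")
```

With 4 lecturers and 8 students the sphericity-assumed df are 3 and 21, which is what you should expect to see in the Sphericity Assumed rows of your own output.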
Section 2: Annotated Example SPSS Output, Write Up Guide, and Sample APA Table
The example output shown below uses variables different from the Week 4 assignment. The purpose is to explain key elements of the output, point out what to focus on, and demonstrate how to interpret and report the results in APA statistical style. The output presented below is not in order of appearance in SPSS output, but rearranged to address specific teaching objectives and interpretation tasks.

The variables in the example output are five quiz scores. For context, you can think of these as a parallel-forms design with each quiz equivalent in terms of content and difficulty but given under varying conditions. For example, the conditions might be different types of noise, each presented at the same decibel level (e.g., 100 dB, which is about the noise level alongside a lawnmower):

Quiz   Noise Condition
1      Nearby crowd of conversing people
2      Orchestra music
3      Pop music
4      Radio shock news
5      Jack hammers
Descriptive Statistics
The first part of the output, Within-Subjects Factors, confirms that we set up the repeated measures successfully. Here we expect and have the five quizzes listed as dependent variables. In your assignment you should see each of the four tutors listed (if not, do not pass go; go back and restart the SPSS specification as detailed in Section 1 of this tutorial). The Descriptive Statistics portion of the output provides the mean and standard deviation for each of the five quizzes. This information is needed for the APA table (see Sample APA Table section of this tutorial).
Within-Subjects Factors
Measure: MEASURE_1

quizzes   Dependent Variable
1         quiz1
2         quiz2
3         quiz3
4         quiz4
5         quiz5
Descriptive Statistics

        Mean   Std. Deviation   N
quiz1   7.47   2.481            105
quiz2   7.98   1.623            105
quiz3   7.98   2.308            105
quiz4   7.80   2.280            105
quiz5   7.87   1.765            105
Sphericity Assumption
Mauchly’s test of sphericity needs to be reported, and a decision needs to be made whether to use the sphericity assumed, Greenhouse-Geisser, or Huynh-Feldt results in the Tests of Within-Subjects Effects. Sphericity for repeated measures is similar to homogeneity of variance for between-groups ANOVA. If the variances of the repeated measures are equal, and if the covariances (and, thus, correlations) of each pair of repeated measures are equal, then there is compound symmetry and, as a result, sphericity is satisfied. When sphericity is violated (p < .05), the F test is too liberal (increased Type I error), which can lead to incorrectly concluding statistical significance.

Here, the sphericity assumption was violated, Mauchly’s W = .400, χ²(9, N = 105) = 93.85, p < .001. So, in the Tests of Within-Subjects Effects, we cannot use the sphericity assumed results. Instead, for purposes of this assignment, we need to choose between the Greenhouse-Geisser adjusted results and the Huynh-Feldt adjusted results. In this case, we would not draw different conclusions using either, but as a general rule, use Greenhouse-Geisser if its epsilon value is less than .75; otherwise use Huynh-Feldt.[1]

Mauchly's Test of Sphericity (a)
Measure: MEASURE_1

Within Subjects   Mauchly's   Approx.                      Epsilon (b)
Effect            W           Chi-Square   df   Sig.   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
quizzes           .400        93.851       9    .000   .640                 .657          .250

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. Design: Intercept. Within Subjects Design: quizzes
b. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.

Tests of Within-Subjects Effects
The Greenhouse-Geisser adjusted test of mean differences across the five repeated quiz measures was statistically significant, F(2.56, 266.10) = 3.049, p = .037, ηp² = η² = .028,[2] ω² = .015[3] (a small effect).
Tests of Within-Subjects Effects
Measure: MEASURE_1

Source                               Type III Sum   df        Mean     F       Sig.   Partial Eta
                                     of Squares               Square                  Squared
quizzes          Sphericity Assumed  18.819         4         4.705    3.049   .017   .028
                 Greenhouse-Geisser  18.819         2.559     7.355    3.049   .037   .028
                 Huynh-Feldt         18.819         2.629     7.159    3.049   .035   .028
                 Lower-bound         18.819         1.000     18.819   3.049   .084   .028
Error(quizzes)   Sphericity Assumed  641.981        416       1.543
                 Greenhouse-Geisser  641.981        266.100   2.413
                 Huynh-Feldt         641.981        273.385   2.348
                 Lower-bound         641.981        104.000   6.173
[1] There are other approaches to dealing with violation of sphericity, but they are beyond the scope of this course.
[2] In a one-way ANOVA, whether between-subjects or within-subjects, partial eta squared and eta squared are the same.
[3] ω² calculated using formula 14.1 in Field’s 4th edition (2013).
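As a quick check on how the numbers in the Tests of Within-Subjects Effects table relate to one another, the sketch below recomputes F and partial eta squared from the sums of squares and df in the table. The variable names are my own; the input values are copied from the output above. It also shows that scaling both df by the same epsilon leaves F unchanged.

```python
# Values copied from the Tests of Within-Subjects Effects table.
ss_effect, ss_error = 18.819, 641.981
df_effect, df_error = 4, 416

ms_effect = ss_effect / df_effect            # mean square for quizzes
ms_error = ss_error / df_error               # mean square for error
f_value = ms_effect / ms_error
partial_eta_sq = ss_effect / (ss_effect + ss_error)
print(f"F = {f_value:.3f}, partial eta squared = {partial_eta_sq:.3f}")

# Multiplying both df by the same epsilon changes the mean squares but not F.
f_by_epsilon = {}
for label, eps in [("Greenhouse-Geisser", 0.640),
                   ("Huynh-Feldt", 0.657),
                   ("Lower-bound", 0.250)]:
    ms_eff = ss_effect / (df_effect * eps)
    ms_err = ss_error / (df_error * eps)
    f_by_epsilon[label] = ms_eff / ms_err
    print(f"{label}: MS effect = {ms_eff:.3f}, MS error = {ms_err:.3f}, "
          f"F = {f_by_epsilon[label]:.3f}")
```

The mean squares printed for each row differ slightly from the SPSS table only because the epsilon values used here are rounded to three decimals.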
Post Hoc: Profile Plot and Statistical Pairwise Comparison via Estimated Marginal Means
The test of within-subjects effect was statistically significant, indicating there was a difference in means “somewhere” among the five quizzes. To figure out which ones were different we have to dig deeper with post hoc analysis. Though statistically significant differences cannot be inferred from a plot of the quizzes, it is a good place to start to get a sense of what’s going on. In the plot below we see that the crowd of conversing people (quiz1) was associated with the lowest of the five quiz scores. Orchestra music (quiz2) and pop music (quiz3) were associated with the highest scores. Radio shock news (quiz4) and jackhammers (quiz5) were associated with scores higher than those for crowd noise and somewhat lower than the two music conditions. Had this been real research, one would have selected the noise conditions with some a priori theoretical explanation for expected differences and interpreted the actual result in light of the theoretical expectations (I leave such to your scientific imagination).
Visual depictions can be misleading, which is why we rely on statistical tests to determine which quiz means were different from the others. Tests of pairwise comparisons are part of the Estimated Marginal Means (EMM) portion of the output. The first two parts of the EMM output provide mean, standard error, and 95% confidence intervals for the mean. These are useful for reference, but the meat (or tofu) is in the output labeled Pairwise Comparisons (see next page).
1. Grand Mean
Measure: MEASURE_1

                      95% Confidence Interval
Mean    Std. Error    Lower Bound   Upper Bound
7.819   .176          7.470         8.168

Estimates
Measure: MEASURE_1

                                95% Confidence Interval
quizzes   Mean    Std. Error    Lower Bound   Upper Bound
1         7.467   .242          6.987         7.947
2         7.981   .158          7.667         8.295
3         7.981   .225          7.534         8.428
4         7.800   .223          7.359         8.241
5         7.867   .172          7.525         8.208
There is redundancy in the Pairwise Comparisons output, so be careful not to repeat yourself in the results write up. For example, the pairwise comparison of quiz 1 with quiz 2 is the same as the pairwise comparison of quiz 2 with quiz 1. You will avoid redundant results if you consider only the quiz numbers in the 2nd column that are numbered higher than the quiz number in the 1st column. For example, for the four rows of information associated with 1st column quiz 3, only consider the rows for quiz 4 and quiz 5.

In this example, only two pairs of quizzes statistically significantly differed using Bonferroni adjusted p values. The crowd conversing condition (quiz1) had a lower mean (MD = -0.514, p = .049) than the orchestra music condition (quiz2), and a lower mean (MD = -0.514, p = .001) than the pop music condition (quiz3).

Notice in the table that there are significance values of 1.000. Just as p cannot equal .000, it cannot equal 1.000. In such cases, report as p > .999. Also notice that quiz2 and quiz3 had the same mean and the same mean difference from quiz1, but one had a p value of .049 and the other .001. If curious, see the “For the Inquisitive…” section of this tutorial.
Pairwise Comparisons
Measure: MEASURE_1

                                                                   95% Confidence Interval
                                                                   for Difference (b)
(I) quizzes   (J) quizzes   Mean Difference   Std. Error   Sig. (b)   Lower Bound   Upper Bound
                            (I-J)
1             2             -.514*            .179         .049       -1.028        -.001
              3             -.514*            .126         .001       -.874         -.154
              4             -.333             .137         .168       -.727         .060
              5             -.400             .215         .658       -1.017        .217
2             1             .514*             .179         .049       .001          1.028
              3             .000              .164         1.000      -.469         .469
              4             .181              .173         1.000      -.316         .678
              5             .114              .129         1.000      -.255         .483
3             1             .514*             .126         .001       .154          .874
              2             .000              .164         1.000      -.469         .469
              4             .181              .143         1.000      -.229         .591
              5             .114              .205         1.000      -.475         .703
4             1             .333              .137         .168       -.060         .727
              2             -.181             .173         1.000      -.678         .316
              3             -.181             .143         1.000      -.591         .229
              5             -.067             .212         1.000      -.676         .542
5             1             .400              .215         .658       -.217         1.017
              2             -.114             .129         1.000      -.483         .255
              3             -.114             .205         1.000      -.703         .475
              4             .067              .212         1.000      -.542         .676

Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Bonferroni.
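The two ideas above, keeping only the non-redundant pairs and applying the Bonferroni adjustment, can be sketched in Python as follows. This is an illustration under stated assumptions: each pairwise comparison is treated as a paired t test with df = N - 1 = 104, and the Bonferroni adjustment is taken to be the unadjusted two-tailed p multiplied by the number of comparisons (capped at 1). The function name is hypothetical; the mean differences and standard errors are copied from the table.

```python
from itertools import combinations
from scipy import stats

n = 105          # sample size
n_quizzes = 5

# Non-redundant pairs: only (I, J) with J numbered higher than I.
pairs = list(combinations(range(1, n_quizzes + 1), 2))
n_comparisons = len(pairs)
print(n_comparisons, "unique comparisons:", pairs)

def bonferroni_p(mean_diff, std_error, df, m):
    """Two-tailed p for t = mean_diff / std_error, multiplied by m, capped at 1."""
    t_value = mean_diff / std_error
    p_unadjusted = 2 * stats.t.sf(abs(t_value), df)
    return min(1.0, p_unadjusted * m)

# (mean difference, standard error) copied from the Pairwise Comparisons table.
reported = {(1, 2): (-0.514, 0.179), (1, 3): (-0.514, 0.126)}
adjusted = {pair: bonferroni_p(md, se, n - 1, n_comparisons)
            for pair, (md, se) in reported.items()}
for pair, p in adjusted.items():
    print(f"quiz{pair[0]} vs quiz{pair[1]}: Bonferroni adjusted p = {p:.3f}")
```

Because the inputs are rounded to three decimals, the adjusted p for quiz1 versus quiz2 may differ from the SPSS value in the third decimal place.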
Tests of Within-Subjects Contrasts
The linear, quadratic, cubic, and fourth-order contrasts are appropriate when the independent variable is interval, such as repeated measures one month apart in time or equal interval dosage increases. In this example (and for the assignment) the independent variable is not interval level, but nominal, reflecting different environmental conditions under which the quiz was taken, so these contrasts do not apply.

However, suppose quiz1 was taken after drinking a glass of water (the control condition), quiz2 after 1 cup of coffee, quiz3 after 2 cups of coffee, quiz4 after 3 cups of coffee, and quiz5 after 4 cups of coffee. The nonsignificant result for the linear contrast, p = .091, would indicate that as coffee increases linearly there is not a corresponding linear improvement in quiz score. The significant quadratic result, p = .006, would indicate that as coffee increases linearly, quiz results increase to a plateau then decrease, a curvilinear effect that can be seen in the profile plot. So, in this scenario of the quiz conditions, coffee helps up to a point, then hurts quiz performance.
Tests of Within-Subjects Contrasts
Measure: MEASURE_1

Source           quizzes     Type III Sum   df    Mean     F       Sig.   Partial Eta
                             of Squares           Square                  Squared
quizzes          Linear      4.024          1     4.024    2.917   .091   .027
                 Quadratic   8.686          1     8.686    7.858   .006   .070
                 Cubic       6.095          1     6.095    2.323   .131   .022
                 Order 4     .014           1     .014     .013    .910   .000
Error(quizzes)   Linear      143.476        104   1.380
                 Quadratic   114.956        104   1.105
                 Cubic       272.905        104   2.624
                 Order 4     110.644        104   1.064
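To make the contrasts less of a black box, the sketch below applies the standard orthogonal polynomial coefficients for five equally spaced levels to the five quiz means from the Estimates table. It assumes that SPSS normalizes each set of coefficients to unit length before computing the contrast sum of squares; that assumption is mine, although the results it produces agree with the table to within rounding of the means.

```python
import numpy as np

n = 105
# Estimated marginal means for quiz1..quiz5, copied from the Estimates table.
means = np.array([7.467, 7.981, 7.981, 7.800, 7.867])

# Standard orthogonal polynomial coefficients for 5 equally spaced levels.
contrasts = {
    "Linear":    [-2, -1,  0,  1, 2],
    "Quadratic": [ 2, -1, -2, -1, 2],
    "Cubic":     [-1,  2,  0, -2, 1],
    "Order 4":   [ 1, -4,  6, -4, 1],
}

ss = {}
for name, coefs in contrasts.items():
    c = np.array(coefs, dtype=float)
    c /= np.linalg.norm(c)                 # normalize to unit length (assumed)
    ss[name] = n * float(c @ means) ** 2   # contrast sum of squares
    print(f"{name:<9} SS = {ss[name]:.3f}")

print(f"Sum of contrast SS = {sum(ss.values()):.3f}")
```

The four contrast sums of squares add up to approximately 18.8, the Type III sum of squares for quizzes in the Tests of Within-Subjects Effects table, which is one way to see that the contrasts partition the overall effect.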
Test of Between-Subjects Effects
For a one-way within-subjects repeated measures ANOVA, there is no between-subjects effect because there is no grouping factor. Nonetheless, between-subjects test output is produced; it is simply a test of the intercept and of no importance or value. Ignore it.
Tests of Between-Subjects Effects
Measure: MEASURE_1
Transformed Variable: Average

Source      Type III Sum   df    Mean Square   F          Sig.   Partial Eta
            of Squares                                           Squared
Intercept   32097.190      1     32097.190     1974.033   .000   .950
Error       1691.010       104   16.260
Multivariate Tests for Repeated Measures
Explication of the multivariate test of repeated measures is beyond the scope of this course, and using it when sphericity is violated is not as simple a matter as some suggest. Ignore it.
Multivariate Tests

Test                 Value   F           Hypothesis df   Error df   Sig.   Partial Eta Squared
Pillai's trace       .152    4.539 (a)   4.000           101.000    .002   .152
Wilks' lambda        .848    4.539 (a)   4.000           101.000    .002   .152
Hotelling's trace    .180    4.539 (a)   4.000           101.000    .002   .152
Roy's largest root   .180    4.539 (a)   4.000           101.000    .002   .152

Each F tests the multivariate effect of quizzes. These tests are based on the linearly independent pairwise comparisons among the estimated marginal means.
a. Exact statistic

Results Write Up Guide
Begin the write up by describing the context of the research and the variables. If known, state how each variable was operationalized, for example: “Overall GPA was measured on the traditional 4-point scale from 0 (F) to 4 (A)”, or “Satisfaction was measured on a 5-point Likert-type scale from 1 (not at all satisfied) to 5 (extremely satisfied).” Please pay attention to APA style for reporting scale anchors (see p. 91 and p. 105 in the 6th edition of the APA Manual).

Report descriptive statistics such as minimum, maximum, mean, and standard deviation for each metric variable. For nominal variables, report percentage for each level of the variable, for example: “Of the total sample (N = 150) there were 40 (26.7%) males and 110 (73.3%) females.” Keep in mind that a sentence that includes information in parentheticals must still be a sentence (and make sense) if the parentheticals are removed. For example: “Of the total sample there were 40 males and 110 females.”

State the purpose of the analysis or provide the guiding research question(s). If you use research questions, do not craft them such that they can be answered with a yes or no. Instead, craft them so that they will have a quantitative answer. For example: “What is the strength and direction of relationship between X and Y?” or “What is the difference in group means on X between males and females?”

Present null and alternative hypothesis sets applicable to the analysis. For repeated measures ANOVA there would be a hypothesis set for the main effect of the within-group factor (i.e., mean differences among the repeated measures).

State assumptions or other considerations for the analysis, and report the actual statistical result for relevant tests. For this course, the only repeated measures ANOVA consideration that needs to be presented and discussed is the sphericity assumption. Even if it is violated, you must still report and interpret the remaining results.
Report and interpret the within-subjects effect, as well as any post hoc analysis as needed. Be sure to include the actual statistical results in text—examples were provided within the annotated output section of this tutorial.
Don’t forget to interpret (i.e., make sense of) the results. Draw conclusions about rejecting or failing to reject each null. If needed, summarize the results, without statistics, in a concluding sentence or paragraph.

Provide APA style tables appropriate to the analysis. Do not use SPSS table output; it is not in APA style. Example APA tables for a repeated measures ANOVA are shown below using the results from the example output in this tutorial. Although one would typically not duplicate information in text and tables, it is important to demonstrate competence in both ways of reporting the results; so, you cannot just provide tables, you must also report the relevant statistical results within the textual write up.

The complete SPSS output should be submitted as a separate file or pasted into the write up document (but in an appendix). Do not intermingle SPSS output and write up. The only exception to this is if you want to include an SPSS graph; they are not easily converted to APA style, so I will permit them within the body of the write up.

Sample APA Table

Table 1
Repeated Measures Quiz Condition, Mean, Standard Deviation, and Pairwise Comparisons (N = 105)

                                           Pairwise Comparisons
                                           Mean difference in upper diagonal
                                           p in lower diagonal (a)
Quiz #   Condition          M      SD      1       2        3        4        5
1        Crowd conversing   7.47   2.48            -0.51    -0.51    -0.33    -0.40
2        Orchestra music    7.98   1.62    .049             0.00     0.18     0.11
3        Pop music          7.98   2.31    .001    > .999            0.18     0.11
4        Radio shock news   7.80   2.28    .168    > .999   > .999            -0.07
5        Jack hammers       7.87   1.77    .658    > .999   > .999   > .999

a. Bonferroni adjusted for multiple comparisons.
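The p value reporting conventions used in this tutorial (no leading zero, p < .001 instead of .000, and p > .999 instead of 1.000) can be captured in a small helper. The sketch below is one possible way to do it in Python; the function name format_p is hypothetical.

```python
def format_p(p):
    """Format a p value using the reporting conventions described in this tutorial."""
    if p < 0.001:
        return "p < .001"          # SPSS displays these as .000
    if p > 0.999:
        return "p > .999"          # SPSS displays these as 1.000
    # Three decimals, with the leading zero removed.
    return "p = " + f"{p:.3f}".lstrip("0")

for value in (0.049, 0.001, 0.168, 1.0, 0.0000004):
    print(value, "->", format_p(value))
```

A helper like this is only a convenience for building tables such as Table 1; the values themselves still come from the SPSS output.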
For the Inquisitive: Sphericity and Within-Subjects Effects Redux
If the variances of the repeated measures are equal, and if the covariances (and, thus, correlations) of each pair of repeated measures are equal, then there is compound symmetry and, as a result, sphericity is satisfied. When sphericity is violated (p < .05), the F test is too liberal (increased Type I error), which can lead to incorrectly concluding statistical significance. In the example output the sphericity assumption was violated, Mauchly’s W = .400, χ²(9, N = 105) = 93.85, p < .001.
To get a sense of why there was not sphericity, we can examine the variances (or standard deviations) and pairwise correlations of the repeated measures. From the Descriptive Statistics output we see that the standard deviations ranged from 1.623 to 2.481. Thus, the variances, being the square of the standard deviations, ranged from 2.634 to 6.155, which, on their face, seem far from being relatively equal.
From the correlation matrix we see that the pairwise correlations ranged from .445 (quiz4 with quiz5) to .858 (quiz1 with quiz3). The standard deviations of quiz1, quiz3, and quiz4 appear relatively equal, and so do their correlations: .858, .829, and .796. I would hypothesize that if just these three were analyzed in repeated measures ANOVA, sphericity would be satisfied. I tested the hypothesis and sphericity was satisfied (see output below), Mauchly’s W = .978, χ²(2, N = 105) = 2.343, p = .310. Notice that the Greenhouse-Geisser and Huynh-Feldt epsilon values were .978 and .997, respectively. The maximum possible value is 1.0, indicating perfect sphericity.
Mauchly's Test of Sphericity (a)
Measure: MEASURE_1

Within Subjects   Mauchly's   Approx.                      Epsilon (b)
Effect            W           Chi-Square   df   Sig.   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
threequizzes      .978        2.343        2    .310   .978                 .997          .500
Descriptive Statistics

        Mean   Std. Deviation   N
quiz1   7.47   2.481            105
quiz2   7.98   1.623            105
quiz3   7.98   2.308            105
quiz4   7.80   2.280            105
quiz5   7.87   1.765            105
Correlations
Pearson correlations; Sig. (2-tailed) = .000 and N = 105 for every pair.

        quiz1    quiz2    quiz3    quiz4    quiz5
quiz1   1        .673**   .858**   .829**   .504**
quiz2   .673**   1        .688**   .633**   .700**
quiz3   .858**   .688**   1        .796**   .493**
quiz4   .829**   .633**   .796**   1        .445**
quiz5   .504**   .700**   .493**   .445**   1

**. Correlation is significant at the 0.01 level (2-tailed).
By comparison, with all five quizzes the Greenhouse-Geisser and Huynh-Feldt epsilon values were .640 and .657, respectively. While 1.0 is the maximum epsilon value, the minimum—shown as the lower-bound epsilon—depends on the number of repeated measures. Specifically, the minimum epsilon is calculated as 1 ÷ (# of repeated measures – 1). In the case of the five quizzes, this is 1 ÷ (5 – 1) = 1 ÷ 4 = .250, the lower-bound value shown in the output.
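The epsilon values discussed above can be approximated from the summary statistics alone. The sketch below rebuilds the covariance matrix from the standard deviations and correlations in the output, then applies the commonly cited formulas for the Greenhouse-Geisser and Huynh-Feldt epsilons. The formulas and function names are my additions, not something taken from the SPSS output, so treat this as an illustration; the results should agree with the output to within rounding.

```python
import numpy as np

n = 105
sd = np.array([2.481, 1.623, 2.308, 2.280, 1.765])   # quiz1..quiz5
r = np.array([
    [1.000, 0.673, 0.858, 0.829, 0.504],
    [0.673, 1.000, 0.688, 0.633, 0.700],
    [0.858, 0.688, 1.000, 0.796, 0.493],
    [0.829, 0.633, 0.796, 1.000, 0.445],
    [0.504, 0.700, 0.493, 0.445, 1.000],
])
cov = r * np.outer(sd, sd)    # covariance = correlation x SD_i x SD_j

def gg_epsilon(cov):
    """Greenhouse-Geisser epsilon from a covariance matrix (textbook formula)."""
    k = cov.shape[0]
    centering = np.eye(k) - np.full((k, k), 1.0 / k)
    centered = centering @ cov @ centering            # double-centered matrix
    return np.trace(centered) ** 2 / ((k - 1) * np.sum(centered ** 2))

def hf_epsilon(gg, n, k):
    """Huynh-Feldt epsilon from the Greenhouse-Geisser value, capped at 1."""
    value = (n * (k - 1) * gg - 2) / ((k - 1) * (n - 1 - (k - 1) * gg))
    return min(1.0, value)

def lower_bound(k):
    """Minimum possible epsilon: 1 / (number of repeated measures - 1)."""
    return 1.0 / (k - 1)

gg_all = gg_epsilon(cov)
hf_all = hf_epsilon(gg_all, n, 5)
three = [0, 2, 3]                                     # quiz1, quiz3, quiz4
gg_three = gg_epsilon(cov[np.ix_(three, three)])
hf_three = hf_epsilon(gg_three, n, 3)

print(f"Five quizzes:  GG = {gg_all:.3f}, HF = {hf_all:.3f}, LB = {lower_bound(5):.3f}")
print(f"Three quizzes: GG = {gg_three:.3f}, HF = {hf_three:.3f}, LB = {lower_bound(3):.3f}")
```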
Mauchly's Test of Sphericity (a)
Measure: MEASURE_1

Within Subjects   Mauchly's   Approx.                      Epsilon (b)
Effect            W           Chi-Square   df   Sig.   Greenhouse-Geisser   Huynh-Feldt   Lower-bound
quizzes           .400        93.851       9    .000   .640                 .657          .250

Before further examination of Greenhouse-Geisser and Huynh-Feldt, I want to return to a technical point about sphericity itself. Previously, I stated that if there was compound symmetry (equal variances and equal covariances), sphericity would be satisfied. Compound symmetry is a sufficient but not necessary condition for sphericity. Even if there is not compound symmetry, sphericity is technically satisfied if the pairwise differences between the repeated measures have equal variances. With five repeated measures, there are 10 such pairwise differences (quiz1 minus quiz2, quiz1 minus quiz3, etc.). Using syntax compute commands I calculated the 10 pairwise difference variables. The variances of these 10 variables, shown in the Descriptive Statistics output for the difference variables, ranged from 1.656 to 4.858, indicating unequal variances, as expected.

Back to epsilon and tests of within-subjects effects. Greenhouse-Geisser and Huynh-Feldt are adjustments to the df value for the repeated measures effect (here, the effect of the various quiz environmental conditions) and the df error value.
Tests of Within-Subjects Effects
Measure: MEASURE_1

Source                               Type III Sum   df        Mean     F       Sig.   Partial Eta   Observed
                                     of Squares               Square                  Squared       Power (a)
quizzes          Sphericity Assumed  18.819         4         4.705    3.049   .017   .028          .805
                 Greenhouse-Geisser  18.819         2.559     7.355    3.049   .037   .028          .662
                 Huynh-Feldt         18.819         2.629     7.159    3.049   .035   .028          .670
                 Lower-bound         18.819         1.000     18.819   3.049   .084   .028          .409
Error(quizzes)   Sphericity Assumed  641.981        416       1.543
                 Greenhouse-Geisser  641.981        266.100   2.413
                 Huynh-Feldt         641.981        273.385   2.348
                 Lower-bound         641.981        104.000   6.173

a. Computed using alpha = .05
Descriptive Statistics

                     N     Variance
q1minusq2            105   3.368
q1minusq3            105   1.656
q1minusq4            105   1.974
q1minusq5            105   4.858
q2minusq3            105   2.808
q2minusq4            105   3.150
q2minusq5            105   1.737
q3minusq4            105   2.150
q3minusq5            105   4.429
q4minusq5            105   4.736
Valid N (listwise)   105
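The variances of the 10 difference variables were computed in SPSS from the raw data. They can also be approximated from the summary statistics, because the variance of a difference equals the sum of the two variances minus twice the covariance. The following Python sketch uses the standard deviations and correlations from the earlier output; it is an illustration using rounded inputs, so small discrepancies in the third decimal are expected.

```python
from itertools import combinations
import numpy as np

sd = np.array([2.481, 1.623, 2.308, 2.280, 1.765])   # quiz1..quiz5
r = np.array([
    [1.000, 0.673, 0.858, 0.829, 0.504],
    [0.673, 1.000, 0.688, 0.633, 0.700],
    [0.858, 0.688, 1.000, 0.796, 0.493],
    [0.829, 0.633, 0.796, 1.000, 0.445],
    [0.504, 0.700, 0.493, 0.445, 1.000],
])

# var(X - Y) = var(X) + var(Y) - 2 * r * SD(X) * SD(Y)
diff_var = {}
for i, j in combinations(range(5), 2):
    diff_var[(i + 1, j + 1)] = sd[i] ** 2 + sd[j] ** 2 - 2 * r[i, j] * sd[i] * sd[j]

for (i, j), v in diff_var.items():
    print(f"q{i}minusq{j}: variance = {v:.3f}")

print(f"Smallest = {min(diff_var.values()):.3f}, largest = {max(diff_var.values()):.3f}")
```

If sphericity held, these 10 variances would be roughly equal; here the largest is nearly three times the smallest.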
With sphericity assumed, df quizzes = 4 because there were 5 quizzes (i.e., df = number of repeated measures - 1), and df error = 416, calculated as (N - 1) × (number of repeated measures - 1) = (105 - 1) × (5 - 1) = 104 × 4 = 416. The Greenhouse-Geisser correction value (i.e., epsilon value) was .640. The sphericity assumed effect df times .640 is the corrected effect df for Greenhouse-Geisser (4 × .640 = 2.56). Similarly, the sphericity assumed error df times .640 is the corrected error df for Greenhouse-Geisser (416 × .640 = 266.24, within rounding error of the output value of 266.100). The Huynh-Feldt df effect and error adjustments are calculated the same way, but using the epsilon value of .657.

Because the same adjustment (multiplication by a constant value) is made to both the effect df and the error df, the F value and partial eta squared are unchanged. What differs is that significance of the F value is tested using different df values, so p will not be the same. In this example, the F value is 3.049. When evaluated at 4 and 416 df (sphericity assumed), p = .017; but when evaluated at 2.559 and 266.100 (Greenhouse-Geisser), p = .037; and for 2.629 and 273.385 (Huynh-Feldt), p = .035. Notice that the p values get larger as the epsilon adjustment value gets smaller. This helps to avoid the increased risk of Type I error when sphericity is violated. It also, however, decreases power.[4] In the Tests of Within-Subjects Effects table above, I included an observed power column. Power is highest with sphericity assumed (in this example .805) and decreased to .670 for Huynh-Feldt and to .662 for Greenhouse-Geisser (which had the lowest epsilon value).
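The df arithmetic above, and the resulting p values, can be sketched in Python. This example assumes SciPy is available; its F distribution accepts non-integer df. The epsilon values are the rounded ones from the output, so the corrected df differ slightly from the SPSS values in the third decimal.

```python
from scipy import stats

f_value = 3.049
df_effect, df_error = 4, 416      # sphericity assumed df

epsilons = {
    "Sphericity Assumed": 1.000,
    "Greenhouse-Geisser": 0.640,
    "Huynh-Feldt": 0.657,
    "Lower-bound": 0.250,
}

results = {}
for label, eps in epsilons.items():
    df1 = df_effect * eps         # corrected effect df
    df2 = df_error * eps          # corrected error df
    p = stats.f.sf(f_value, df1, df2)   # upper-tail probability of F
    results[label] = (df1, df2, p)
    print(f"{label:<19} df = ({df1:.3f}, {df2:.3f}), p = {p:.3f}")
```

The same F value yields a larger p as epsilon shrinks, which is the pattern described in the paragraph above.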
It should be clear that there can be statistical significance (p < .05) with sphericity, but the effect may not be statistically significant when sphericity is violated and F test adjustments are made. Greenhouse-Geisser may underestimate epsilon, resulting in too much correction, and Huynh-Feldt may overestimate epsilon, resulting in not enough correction. As a general rule, use Greenhouse-Geisser if its epsilon value is less than .75; otherwise use Huynh-Feldt.

You can also average the two adjustments by taking the average of the two p values, even though this is not technically correct. Technically, you average the two epsilon values and compute new adjusted values for the effect df and error df. In this example, the average of the .640 and .657 epsilon values is .6485. Adjusted effect df would be 4 × .6485 = 2.594, and adjusted df error would be 416 × .6485 = 269.776. Unfortunately, you cannot correctly compute the p value using Excel’s fdist function because it truncates the df values. Also, I am not aware of any online calculator that works with decimal df values.

Finally, recall that quiz2 and quiz3 had equal means and equal mean differences from quiz1, but the quiz1:quiz2 pairwise p = .049 and the quiz1:quiz3 pairwise p = .001. Why the difference? In a nutshell, the quiz1:quiz3 difference had a smaller standard error and, all else equal, p decreases as standard error decreases. More precisely, each pairwise comparison constitutes a t test, where t = mean difference ÷ standard error.
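Returning briefly to the averaged-epsilon adjustment described above: statistical libraries that accept non-integer df can evaluate the F value at the averaged df directly. The sketch below assumes SciPy is available and uses its F distribution; the p value it prints is my own calculation, not a value taken from the SPSS output.

```python
from scipy import stats

f_value = 3.049
eps_avg = (0.640 + 0.657) / 2     # average of the GG and HF epsilons

df_effect_adj = 4 * eps_avg       # adjusted effect df
df_error_adj = 416 * eps_avg      # adjusted error df
p_avg = stats.f.sf(f_value, df_effect_adj, df_error_adj)

print(f"epsilon = {eps_avg:.4f}, df = ({df_effect_adj:.3f}, {df_error_adj:.3f}), "
      f"p = {p_avg:.3f}")
```

As expected, the resulting p value falls between the Huynh-Feldt and Greenhouse-Geisser p values.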
Pairwise Comparisons
Measure: MEASURE_1

                                                                   95% Confidence Interval
                                                                   for Difference (b)
(I) quizzes   (J) quizzes   Mean Difference   Std. Error   Sig. (b)   Lower Bound   Upper Bound
                            (I-J)
1             2             -.514*            .179         .049       -1.028        -.001
              3             -.514*            .126         .001       -.874         -.154
[4] Field (2013) stated that “sphericity creates loss of power” (p. 547). I suspect this is a typo and he meant to state that “lack of sphericity creates loss of power.”
For the quiz1:quiz2 comparison, t(104) = -.514 ÷ .179 = -2.872, whereas for the quiz1:quiz3 comparison, t(104) = -.514 ÷ .126 = -4.079. The t value is larger in absolute value for the quiz1:quiz3 comparison because you are dividing the same mean difference (-.514) by a smaller standard error. And, for the same N, a larger absolute t value has a smaller p value.

What does it mean that the quiz1:quiz3 difference had a smaller standard error than the quiz1:quiz2 difference? If you create a new variable, such as q1minusq2, by subtracting the quiz2 scores from the quiz1 scores, and do the same to create q1minusq3, you can look at the descriptive statistics for each of the two newly created variables. The standard error (SE) is a function of the standard deviation (SD) and the sample size (N), such that $SE = SD / \sqrt{N}$. Because N = 105 for both variables, the issue boils down to differences in the standard deviation. For the same mean difference and N, the variable with the smaller standard deviation, in this case q1minusq3, will have a larger absolute t value and a smaller p value.
Descriptive Statistics

                     N           Mean                     Std. Deviation   Variance
                     Statistic   Statistic   Std. Error   Statistic        Statistic
q1minusq2            105         -.5143      .17909       1.83510          3.368
q1minusq3            105         -.5143      .12559       1.28687          1.656
Valid N (listwise)   105

Conceptually, the differences between quiz1 scores and quiz3 scores were more homogeneous (less spread out), and the quiz1 and quiz3 scores themselves were more highly correlated (r[103] = .858) than the quiz1 and quiz2 scores (r[103] = .673). This is visually apparent in the scatterplots below. The quiz1:quiz2 plot on the left is more scattered than the quiz1:quiz3 plot on the right. So, the mystery of why quiz2 and quiz3 had equal means and equal mean differences from quiz1, but the quiz1:quiz2 pairwise p = .049 and the quiz1:quiz3 pairwise p = .001, is solved!
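The chain of reasoning in this last section (higher correlation, smaller standard deviation of the differences, smaller standard error, larger absolute t) can be traced numerically. The Python sketch below uses the values from the Descriptive Statistics output for the two difference variables, plus the quiz standard deviations and correlations reported earlier; the variable names are my own. Because it uses less heavily rounded inputs, the t values differ slightly from the hand calculations above.

```python
import math

n = 105

# (mean, standard deviation) of each difference variable, from the output.
diffs = {"q1minusq2": (-0.5143, 1.83510), "q1minusq3": (-0.5143, 1.28687)}

results = {}
for name, (mean, sd) in diffs.items():
    se = sd / math.sqrt(n)        # SE = SD / sqrt(N)
    t_value = mean / se           # t = mean difference / standard error
    results[name] = (se, t_value)
    print(f"{name}: SE = {se:.5f}, t({n - 1}) = {t_value:.3f}")

def sd_of_difference(sd_x, sd_y, r_xy):
    """SD of (X - Y) from the two SDs and their correlation."""
    return math.sqrt(sd_x ** 2 + sd_y ** 2 - 2 * r_xy * sd_x * sd_y)

# quiz1 SD = 2.481; quiz2 SD = 1.623 (r = .673); quiz3 SD = 2.308 (r = .858)
sd_12 = sd_of_difference(2.481, 1.623, 0.673)
sd_13 = sd_of_difference(2.481, 2.308, 0.858)
print(f"SD of quiz1 - quiz2 = {sd_12:.3f}, SD of quiz1 - quiz3 = {sd_13:.3f}")
```

The higher correlation between quiz1 and quiz3 is what shrinks the standard deviation of their differences, and with it the standard error.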