CHAPTER 11
Correlation
Chapter Outline
✪ Graphing Correlations: The Scatter Diagram
✪ Patterns of Correlation
✪ The Correlation Coefficient
✪ Significance of a Correlation Coefficient
✪ Correlation and Causality
✪ Issues in Interpreting the Correlation Coefficient
✪ Effect Size and Power for the Correlation Coefficient
✪ Controversy: What Is a Large Correlation?
✪ Correlation in Research Articles
✪ Summary
✪ Key Terms
✪ Example Worked-Out Problems
✪ Practice Problems
✪ Using SPSS
✪ Chapter Notes
Statistics for Psychology, Fifth Edition, by Arthur Aron, Elaine N. Aron, and Elliot J. Coups. Published by Prentice Hall. Copyright © 2009 by Pearson Education, Inc. ISBN 0-558-46761-X.
This chapter is about a statistical procedure that allows you to look at the relationship between two groups of scores. To give you an idea of what we mean, let’s consider some common real-world examples. Among students, there is a relationship between high school grades and college grades. It isn’t a perfect relationship, but generally speaking students with better high school grades tend to get better grades in college. Similarly, there is a relationship between parents’ heights and the adult height of their children. Taller parents tend to give birth to children who grow up to be taller than the children of shorter parents. Again, the relationship isn’t perfect, but the general pattern is clear. Now we’ll look at an example in detail.
One hundred thirteen married people in the small college town of Santa Cruz, California, responded to a questionnaire in the local newspaper about their marriage. [This was part of a larger study reported by Aron and colleagues (2000).] As part of the questionnaire, they answered the question, “How exciting are the things you do together with your partner?” using a scale from 1, not exciting at all, to 5, extremely exciting. The questionnaire also included a standard measure of marital satisfaction (that included items such as, “In general, how often do you think that things between you and your partner are going well?”).
TIP FOR SUCCESS You can learn most of the material in this chapter if you have mastered Chapters 1 and 2; but if you are reading this before having studied Chapters 3 through 7, you should not try to read the material near the end of this chapter on the significance of a correlation coefficient or on effect size and power.
The researchers were interested in finding out the relationship between doing exciting things with a marital partner and the level of marital satisfaction people reported. In other words, they wanted to look at the relationship between two groups of scores: the group of scores for doing exciting things and the group of scores for marital satisfaction. As shown in Figure 11–1, the relationship between these two groups of scores can be shown very clearly using a graph. The horizontal axis is for people’s answers to the question, “How exciting are the things you do together with your partner?” The vertical axis is for the marital satisfaction scores. Each person’s score on the two variables is shown as a dot.
The overall pattern is that the dots go from the lower left to the upper right. That is, lower scores on the variable “doing exciting activities with your partner” more often go with lower scores on the variable “marital satisfaction,” and higher with higher. So, in general, this graph shows that the more that people did exciting activities with their partner, the more satisfied they were in their marriage. Even though the pattern is far from one to one, you can see a general trend. This general pattern is of high scores on one variable going with high scores on the other variable, low scores going with low scores, and mediums with mediums. This is an example of a correlation.
A correlation describes the relationship between two variables. More precisely, the usual measure of a correlation describes the relationship between two equal-interval numeric variables. As you learned in Chapter 1, the differences between values for equal-interval numeric variables correspond to differences in the underlying thing being measured. (Most psychologists consider scales like a 1-to-10 rating scale as approximately equal-interval scales.) There are countless examples of correlations: in children, there is a correlation between age and coordination skills; among students, there is a correlation between amount of time studying and amount learned; in the marketplace, we often assume that a correlation exists between price and quality, that is, that high prices go with high quality and low with low.
Figure 11–1 Scatter diagram showing the correlation for 113 married individuals between doing exciting activities with their partner and their marital satisfaction. Horizontal axis: Exciting Activities with Partner (0 to 5); vertical axis: Marital Satisfaction (0 to 60). (Data from Aron et al., 2000.)
correlation association between scores on two variables.
This chapter explores correlation, including how to describe it graphically, different types of correlations, how to figure the correlation coefficient (which gives a number for the degree of correlation), the statistical significance of a correlation coefficient, issues about how to interpret a correlation coefficient, and effect size and power for a correlation coefficient.
Graphing Correlations: The Scatter Diagram
Figure 11–1 shows the correlation between exciting activities and marital satisfaction and is an example of a scatter diagram (also called a scatterplot). A scatter diagram shows you at a glance the pattern of the relationship between the two variables.
How to Make a Scatter Diagram
There are three steps to making a scatter diagram:
❶ Draw the axes and decide which variable goes on which axis. Often, it doesn’t matter which variable goes on which axis. However, sometimes the researchers are thinking of one of the variables as predicting or causing the other. In that case, the variable that is doing the predicting or causing goes on the horizontal axis and the variable that is being predicted or caused goes on the vertical axis. In Figure 11–1, we put exciting activities on the horizontal axis and marital satisfaction on the vertical axis. This was because the study was based on a theory that the more exciting the activities a couple does together, the more satisfied the couple is with their marriage. (We will have more to say about this later in the chapter when we discuss causality and also in Chapter 12 when we discuss prediction.)
❷ Determine the range of values to use for each variable and mark them on the axes. Your numbers should go from low to high on each axis, starting from where the axes meet. Your low value on each axis should be 0.
Each axis should continue to the highest value your measure can possibly have. When there is no obvious highest possible value, make the axis go to a value that is as high as people ordinarily score in the group of people of interest for your study. Note that scatter diagrams are usually made roughly square, with the horizontal and vertical axes being about the same length (a 1:1 ratio).
❸ Mark a dot for each pair of scores. Find the place on the horizontal axis for the first pair’s score on the horizontal-axis variable. Next, move up to the height of that pair’s score on the vertical-axis variable. Then mark a clear dot. Continue this process for the remaining pairs of scores. Sometimes the same pair of scores occurs twice (or more times). This means that the dots for these pairs would go in the same place. When this happens, you can put a second dot as near as possible to the first (touching, if possible), but making it clear that there are in fact two dots in the one place. Alternatively, you can put the number 2 in that place.
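Steps ❷ and ❸ can also be worked out as a small bookkeeping exercise before you draw anything. The sketch below is only an illustration: the function names `axis_range` and `dot_counts` and the four pairs of scores are made up for this example, and using the highest observed score when no maximum is known is a simplifying assumption standing in for "as high as people ordinarily score."

```python
from collections import Counter

def axis_range(scores, highest_possible=None):
    """Step 2: run the axis from 0 up to the highest possible score.

    If no highest possible score is known, fall back on the highest score
    observed (an assumption standing in for "as high as people ordinarily score").
    """
    top = highest_possible if highest_possible is not None else max(scores)
    return (0, top)

def dot_counts(pairs):
    """Step 3: count how many dots belong at each (x, y) position."""
    return Counter(pairs)

# Hypothetical pairs of scores; the pair (2, 3) occurs twice.
pairs = [(1, 1), (2, 3), (2, 3), (4, 4)]
x_axis = axis_range([x for x, _ in pairs], highest_possible=5)
y_axis = axis_range([y for _, y in pairs])
# Positions that need two touching dots (or the number 2 written in place).
repeated = {pair: n for pair, n in dot_counts(pairs).items() if n > 1}
print(x_axis, y_axis, repeated)
```

Any position that shows up in `repeated` is one where you would draw touching dots or write the count instead of a single dot.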
scatter diagram graph showing the relationship between two variables: the values of one variable are along the horizontal axis and the values of the other variable are along the vertical axis; each score is shown as a dot in this two-dimensional space.
TIP FOR SUCCESS If you’re in any way unsure about what a numeric equal-interval variable is, be sure to review the Chapter 1 material on kinds of variables.
TIP FOR SUCCESS When making a scatter diagram, it is easiest if you use graph paper.
An Example
Suppose a researcher is studying the relationship of sleep to mood. As an initial test, the researcher asks six students in her morning seminar two questions:
1. How many hours did you sleep last night?
2. How happy do you feel right now on a scale from 0, not at all happy, to 8, extremely happy?
The (fictional) results are shown in Table 11–1. (In practice, a much larger group would be used in this kind of research. We are using an example with just six to keep things simple for learning. In fact, we have done a real version of this study. Results of the real study are similar to what we show here, except not as strong as the ones we made up to make the pattern clear for learning.)
❶ Draw the axes and decide which variable goes on which axis. Because sleep comes before mood in this study, it makes most sense to think of sleep as the predictor. Thus, as shown in Figure 11–2a, we put hours slept on the horizontal axis and happy mood on the vertical axis.
Figure 11–2 Steps for making a scatter diagram. (a) ❶ Draw the axes and decide which variable goes on which axis: the predictor variable (Hours Slept Last Night) on the horizontal axis, the other (Happy Mood) on the vertical axis. (b) ❷ Determine the range of values to use for each variable and mark them on the axes (Hours Slept Last Night from 0 to 12; Happy Mood from 0 to 8). (c) ❸ Mark a dot for the pair of scores for the first student. (d) ❸ continued: Mark dots for the remaining pairs of scores.
Table 11–1 Hours Slept Last Night and Happy Mood Example (Fictional Data)
Hours Slept    Happy Mood
7              4
5              2
8              7
6              2
6              3
10             6
❷ Determine the range of values to use for each variable and mark them on the axes. For the horizontal axis, we start at 0 as usual. We do not know the maximum possible, but let us assume that students rarely sleep more than 12 hours. The vertical axis goes from 0 to 8, the lowest and highest scores possible on the happiness question. See Figure 11–2b.
❸ Mark a dot for each pair of scores. For the first student, the number of hours slept last night was 7. Move across to 7 on the horizontal axis. The happy mood rating for the first student was 4, so move up to the point across from the 4 on the vertical axis. Place a dot at this point, as shown in Figure 11–2c. Do the same for each of the other five students. The result should look like Figure 11–2d.
How are you doing?
1. What does a scatter diagram show, and what does it consist of?
2. (a) When it is the kind of study in which one variable can be thought of as predicting another variable, which variable goes on the horizontal axis? (b) Which goes on the vertical axis?
3. Make a scatter diagram for the following scores for four people who were each tested on two variables, X and Y. X is the variable we are predicting from; it can have scores ranging from 0 to 6. Y is the variable being predicted; it can have scores from 0 to 7.
Person    X    Y
A         3    4
B         6    7
C         1    2
D         4    6
Figure 11–3 Scatter diagram for scores in “How are you doing?” question 3. (Horizontal axis: X, 0 to 6; vertical axis: Y, 0 to 7.)
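If you do not have graph paper handy, the sleep and mood example from Table 11–1 can be roughed out as a text diagram. The following is a minimal sketch, not a replacement for a properly drawn figure: the function name `text_scatter` and its layout are assumptions made for this illustration, and it does not mark repeated pairs specially (Table 11–1 has none).

```python
# Table 11-1 (fictional data): (hours slept, happy mood) for six students.
sleep_mood = [(7, 4), (5, 2), (8, 7), (6, 2), (6, 3), (10, 6)]

def text_scatter(pairs, x_max, y_max):
    """Draw a rough scatter diagram as text, one '*' per pair of scores."""
    marked = set(pairs)
    lines = []
    for y in range(y_max, -1, -1):  # top row is the highest vertical-axis score
        cells = ["*" if (x, y) in marked else "." for x in range(x_max + 1)]
        lines.append(f"{y:>2} |" + "".join(f"{c:>3}" for c in cells))
    lines.append("   +" + "-" * (3 * (x_max + 1)))
    lines.append("    " + "".join(f"{x:>3}" for x in range(x_max + 1)))
    return "\n".join(lines)

# Horizontal axis: hours slept, 0 to 12. Vertical axis: happy mood, 0 to 8.
diagram = text_scatter(sleep_mood, x_max=12, y_max=8)
print(diagram)
```

The stars run roughly from the lower left to the upper right, which is the same general pattern described for Figure 11–2d.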
Patterns of Correlation
Linear and Curvilinear Correlations
In each example so far, the pattern in the scatter diagram very roughly approximates a straight line. Thus, each is an example of a linear correlation. In the scatter diagram for the study of happy mood and sleep (Figure 11–2d), you could draw a line showing the general trend of the dots, as we have done in Figure 11–4. Notice that the scores do not all fall right on the line. Notice, however, that the line does describe the general tendency of the scores. (In Chapter 12 you learn the precise rules for drawing such a line.)
Sometimes, however, the general relationship between two variables does not follow a straight line at all, but instead follows the more complex pattern of a curvilinear correlation. Consider, for example, the relationship between a person’s level of kindness and the degree to which that person is desired by others as a potential romantic partner. There is evidence suggesting that, up to a point, a greater level of kindness increases a person’s desirability as a romantic partner. However, beyond that point, additional kindness does little to increase desirability (Li et al., 2002). This particular curvilinear pattern is shown in Figure 11–5. Notice that you could not draw a straight line to describe this pattern. Some other examples of curvilinear relationships are shown in Figure 11–6.
linear correlation relation between two variables that shows up on a scatter diagram as the dots roughly following a straight line.
curvilinear correlation relation between two variables that shows up on a scatter diagram as dots following a systematic pattern that is not a straight line.
Answers
1. A scatter diagram is a graph that shows the relation between two variables. One axis is for one variable; the other axis, for the other variable. The graph has a dot for each individual’s pair of scores. The dot for each pair is placed above the score for that pair on the horizontal-axis variable and directly across from the score for that pair on the vertical-axis variable.
2. (a) The variable that is doing the predicting goes on the horizontal axis. (b) The variable that is being predicted goes on the vertical axis.
3. See Figure 11–3.
Figure 11–4 Scatter diagram from Figure 11–2d with a line drawn to show the general trend. (Horizontal axis: Hours Slept Last Night, 0 to 12; vertical axis: Happy Mood, 0 to 8.)
Figure 11–5 Example of a curvilinear relationship: desirability and kindness. (Horizontal axis: Kindness; vertical axis: Desirability.)
Figure 11–6 Examples of curvilinear relationships: (a) the way we feel and the complexity of a stimulus (horizontal axis: Stimulus Complexity, from simple and familiar to complex and novel; vertical axis: Feeling, from − to +); (b) the number of people who remember an item and its position on a list (horizontal axis: Position of Item in the List, from beginning through middle to end; vertical axis: Percent Who Remember Each Item); and (c) children’s rate of and motivation for substituting digits for symbols (horizontal axis: Motivation, 0 to 5; vertical axis: Rate of Substitution of Digits for Symbols).
no correlation no systematic relationship between two variables.
Figure 11–7 Two variables with no association with each other: income and shoe size (fictional data). (Horizontal axis: Shoe Size; vertical axis: Income.)
The usual way of figuring the correlation (the one you learn shortly in this chapter) gives the degree of linear correlation. If the true pattern of association is curvilinear, figuring the correlation in the usual way could show little or no correlation. Thus, it is important to look at scatter diagrams to identify these richer relationships rather than automatically figuring correlations in the usual way, assuming that the only relationship is a straight line.
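To see how a strong curvilinear pattern can produce no linear correlation at all, consider the sketch below. It is an illustration under stated assumptions: the five pairs of scores are made up to form an inverted U, and the function `linear_correlation` uses the standard Pearson formula (the sum of the products of deviation scores divided by the square root of the product of the two sums of squared deviations), which is brought in here only for the demonstration.

```python
import math

def linear_correlation(xs, ys):
    """Pearson correlation: the sum of products of deviation scores divided by
    the square root of (sum of squared X deviations * sum of squared Y deviations)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_sq_x = sum(dx * dx for dx in dev_x)
    sum_sq_y = sum(dy * dy for dy in dev_y)
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)

# Hypothetical inverted-U pattern: Y rises with X, peaks, and then falls.
x_scores = [1, 2, 3, 4, 5]
y_scores = [1, 4, 5, 4, 1]

r = linear_correlation(x_scores, y_scores)
print(r)
```

Y depends on X in a perfectly systematic way here, yet the linear figure comes out to zero, because the rising half of the pattern cancels the falling half. Only a look at the scatter diagram would reveal the relationship.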
No Correlation
It is also possible for two variables to be essentially unrelated to each other. For example, if you were to do a study of income and shoe size, your results might appear as shown in Figure 11–7. The dots are spread everywhere, and there is no line, straight or otherwise, that is any reasonable representation of a trend. There is simply no correlation.
Positive and Negative Linear Correlations
In the examples so far of linear correlations, such as exciting activities and marital satisfaction, high scores go with high scores, lows with lows, and mediums with mediums. This is called a positive correlation. (One reason for the term “positive” is that in geometry, the slope of a line is positive when it goes up and to the right on a graph like this. Notice that in Figure 11–4 the positive correlation between happy mood and sleep is shown by a line that goes up and to the right.)
Sometimes, however, high scores on one variable go with low scores on the other variable and lows with highs. This is called a negative correlation. For example, in the newspaper survey about marriage, the researchers also asked about boredom with the relationship and the partner. Not surprisingly, the more bored a person was, the lower was the person’s marital satisfaction. That is, low scores on one variable went with high scores on the other. Similarly, the less bored a person was, the higher the marital satisfaction. This is shown in Figure 11–8, where we also put in a line to emphasize the general trend. You can see that as it goes from left to right, the line slopes slightly downward.
Figure 11–8 Scatter diagram with the line drawn in to show the general trend for a negative correlation between two variables: greater boredom with the relationship goes with lower marital satisfaction. (Horizontal axis: Bored with Relationship; vertical axis: Marital Satisfaction. Data from Aron et al., 2000.)
Another example of a negative correlation is from organizational psychology. A well-established finding in that field is that absenteeism from work has a negative linear correlation with satisfaction with the job (e.g., Mirvis & Lawler, 1977): that is, the higher the level of job satisfaction, the lower the level of absenteeism. Put another way, the lower the level of job satisfaction is, the higher the absenteeism becomes. Research on this topic has continued to show this pattern all over the world (e.g., Punnett et al., 2007), and the same pattern is found for university classes: the more satisfied students are, the less they miss class (Yorges et al., 2007).
negative correlation relation between two variables in which high scores on one go with low scores on the other, mediums with mediums, and lows with highs; on a scatter diagram, the dots roughly follow a straight line sloping down and to the right.
positive correlation relation between two variables in which high scores on one go with high scores on the other, mediums with mediums, and lows with lows; on a scatter diagram, the dots roughly follow a straight line sloping up and to the right.
Strength of the Correlation
What we mean by the strength of the correlation is how much there is a clear pattern of some particular relationship between two variables. For example, we saw that a positive linear correlation is when high scores go with highs, mediums with mediums, lows with lows. The strength (or degree) of such a correlation, then, is how much highs go with highs, and so on. Similarly, the strength of a negative linear correlation is how much the highs on one variable go with the lows on the other, and so forth. In terms of a scatter diagram, there is a “large” (or “strong”) linear correlation if the dots fall close to a straight line (the line sloping up or down depending on whether the linear correlation is positive or negative). A perfect linear correlation means all the dots fall exactly on the straight line. There is a “small” (or “weak”) correlation when you can barely tell there is a correlation at all; the dots fall far from a straight line. The correlation is “moderate” (also called a “medium” correlation) if the pattern of dots is somewhere between a small and a large correlation.
Importance of Identifying the Pattern of Correlation
The procedure you learn in the next main section is for figuring the direction and strength of linear correlation. As we suggested earlier, the best approach to such a problem is first to make a scatter diagram and to identify the pattern of correlation. If the pattern is curvilinear, then you would not go on to figure the linear correlation. This is important because figuring the linear correlation when the true correlation is curvilinear would be misleading. (For example, you might conclude that there is little or no correlation when in fact there is a quite strong relationship; it is just not linear.) You should assume that the correlation is linear, unless the scatter diagram shows a curvilinear correlation. We say this because when the linear correlation is small, the dots will fall far from a straight line. In such situations, it can sometimes be hard to imagine a straight line that roughly shows the pattern of dots.
If the correlation appears to be linear, it is also important to “eyeball” the scatter diagram a bit more. The idea is to note the direction (positive or negative) of the linear correlation and also to make a rough guess as to the strength of the correlation. Scatter diagrams with varying directions and strengths of correlation are shown in Figure 11–9. For example, scatter diagram (a) in Figure 11–9 shows a large positive correlation, because the dots fall relatively close to a straight line, with low scores going with low scores and highs with highs. Scatter diagram (d), however, shows a negative correlation (there is a general tendency for lows to be with highs and highs with lows) that is of a moderate size (the dots fall too far from a straight line to be a large correlation, but are not so far apart that it is a small correlation). Using a scatter diagram to examine the direction and approximate strength of correlation is important because it lets you check to see whether you have made a major mistake when you then do the figuring you learn in the next section.
Figure 11–9 Examples of scatter diagrams with different degrees of correlation. (Six panels, (a) through (f).)
How are you doing?
1. What is the difference between a linear and curvilinear correlation in terms of how they appear in a scatter diagram?
2. What does it mean to say that two variables have no correlation?
3. What is the difference between a positive and negative linear correlation? Answer this question in terms of (a) the patterns in a scatter diagram and (b) what those patterns tell you about the relationship between the two variables.
4. For each of the scatter diagrams shown in Figure 11–10, say whether the pattern is roughly linear, curvilinear, or no correlation. If the pattern is roughly linear, also say if it is positive or negative, and whether it is large, moderate, or small.
5. Give two reasons why it is important to identify the pattern of correlation in a scatter diagram before proceeding to figure the precise correlation.
Figure 11–10 Scatter diagrams for “How are you doing?” question 4. (Four panels, (a) through (d).)
product of deviation scores the result of multiplying the deviation score on one variable by the deviation score on another variable.
The Correlation Coefficient
Looking at a scatter diagram gives you a rough idea of the relationship between two variables, but it is not a very precise approach. What you need is a number that gives the exact correlation (in terms of its direction and strength).
Logic of Figuring the Linear Correlation
A linear correlation (when it is positive) means that highs go with highs and lows with lows. Thus, the first thing you need in figuring the correlation is some consistent way to measure what is a high score and what is a low score. An efficient way to solve this problem is to use deviation scores, that is, the raw score minus the mean ($X - M_X$ for one variable and $Y - M_Y$ for the other variable). A raw score above the mean (that is, a high score) will always give a positive deviation score and a raw score below the mean (that is, a low score) will always give a negative deviation score.
There is an additional and very important reason why deviation scores are so useful when figuring the correlation. It has to do with what happens if you multiply a score on one variable by a score on the other variable and get the product. When using deviation scores, this is called a product of deviation scores (or product of deviations). If you multiply a positive deviation score on one variable by a positive deviation score on another variable (each positive deviation score represents a raw score above the mean), you will always get a positive product. Further (and here is where it gets interesting), if you multiply a negative deviation score by a negative deviation score (each negative deviation score represents a raw score below the mean), you also get a positive product.
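The idea of deviation scores and their products can be tried out on the fictional sleep and mood scores from Table 11–1. This is a minimal sketch; the variable and function names are chosen for this illustration only.

```python
hours = [7, 5, 8, 6, 6, 10]  # Table 11-1, hours slept (fictional data)
mood = [4, 2, 7, 2, 3, 6]    # Table 11-1, happy mood (fictional data)

def deviation_scores(scores):
    """Raw score minus the mean, for every score."""
    mean = sum(scores) / len(scores)
    return [score - mean for score in scores]

dev_hours = deviation_scores(hours)  # X - M_X for each student
dev_mood = deviation_scores(mood)    # Y - M_Y for each student
products = [dx * dy for dx, dy in zip(dev_hours, dev_mood)]

for x, y, dx, dy, p in zip(hours, mood, dev_hours, dev_mood, products):
    print(f"X={x:>2}  Y={y}  X-Mx={dx:+.0f}  Y-My={dy:+.0f}  product={p:+.0f}")
```

The second student, for instance, is below the mean on both variables (deviation scores of −2 and −2), and the product of those two negative numbers is +4. In this small data set no product comes out negative, which fits the pattern of highs going with highs and lows with lows.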
Answers
1. In a linear correlation, the pattern of dots roughly follows a straight line (although with a small correlation, the dots will be spread widely around a straight line); in a curvilinear correlation, there is a clear systematic pattern to the dots, but it is not a straight line.
2. Two variables have no correlation when there is no pattern of relationship between them.
3. (a) In a scatter diagram for a positive linear correlation, the line that roughly describes the pattern of dots goes up and to the right; in a negative linear correlation, the line goes down and to the right. (b) In a positive linear correlation, the basic pattern is that high scores on one variable go with high scores on the other, mediums go with mediums, and lows go with lows; in a negative linear correlation, high scores on one variable go with low scores on the other, mediums go with mediums, and lows go with highs.
4. In Figure 11–10: (a) linear, negative, large; (b) curvilinear; (c) linear, positive, large; (d) no correlation.
5. Identifying whether the pattern of correlation in a scatter diagram is linear tells you whether it is appropriate to use the standard procedures for figuring a linear correlation. If it is linear, identifying the direction and approximate strength of correlation before doing the figuring lets you check the results of your figuring when you are done.
So, if highs on one variable go with highs on the other, and lows on one go with lows on the other, the products of deviation scores always will be positive. Considering a whole distribution of scores, suppose you take each person’s deviation score on one variable and multiply it by that person’s deviation score on the other variable. The result of doing this when highs go with highs and lows with lows is that the products all come out positive. If you sum up these products of deviation scores for all the people in the study, which are all positive, you will end up with a big positive number.
On the other hand, with a negative correlation, highs go with lows and lows with highs. In terms of deviation scores, this would mean positives with negatives and negatives with positives. Multiplied out, that gives all negative products of deviation scores. If you add all these negative products together, you get a big negative number.
Finally, suppose there is no linear correlation. In this situation, for some people highs on one variable would go with highs on the other variable (and some lows would go with lows), making positive products of deviations. For other people, highs on one variable would go with lows on the other variable (and some lows would go with highs), making negative products. Adding up these products for all the people in the study would result in the positive products and the negative products canceling each other out, giving a result around 0.
In each situation, we changed all the scores to deviation scores, multiplied the two deviation scores for each person by each other, and added up these products of deviations. The result was a large positive number if there was a positive linear correlation, a large negative number if there was a negative linear correlation, and a number around 0 if there was no linear correlation.
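The three situations just described can be checked with a few lines of code. The sketch below uses made-up scores for five people, chosen so that the arithmetic is easy to follow; the function name and the data are assumptions for this illustration.

```python
def sum_of_products_of_deviations(xs, ys):
    """Change scores to deviation scores, multiply them pair by pair, and add up."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

x_scores = [1, 2, 3, 4, 5]
# Hypothetical Y scores for three different patterns of relationship.
y_patterns = {
    "positive": [1, 2, 3, 4, 5],  # highs with highs, lows with lows
    "negative": [5, 4, 3, 2, 1],  # highs with lows, lows with highs
    "none": [4, 1, 3, 5, 2],      # positive and negative products cancel out
}

results = {name: sum_of_products_of_deviations(x_scores, ys)
           for name, ys in y_patterns.items()}
print(results)
```

With these particular scores the sums are 10, −10, and 0. With real data showing no linear correlation, the sum would usually land near 0 rather than exactly on it.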
Table 11–2 summarizes the logic up to this point. The table shows the effect on the correlation of different patterns of raw scores and resulting deviation scores. For example, the first row shows a high score on X going with a high score on Y. In this situation, the deviation score for variable X is a positive number (since X is a high number, above the mean of X), and similarly the deviation score for variable Y is a positive number (since Y is a high number, above the mean of Y). Thus, the product of these two positive deviation scores must be a positive number (since a positive number multiplied by a positive number always gives a positive number). The overall effect is that when a high score on X goes with a high score on Y, the pair of scores contribute toward making a positive correlation. The table shows that positive products of deviation scores contribute toward making a positive correlation, negative products of deviation scores contribute toward making a negative correlation, and products of deviation scores that are zero (or close to zero) contribute toward making a correlation of zero.
Table 11–2 The Effect on the Correlation of Different Patterns of Raw Scores and Deviation Scores
Pair of Scores       Deviation Scores            Product of Deviation Scores   Effect on Correlation
X        Y           $X - M_X$     $Y - M_Y$     $(X - M_X)(Y - M_Y)$
High     High        +             +             +                             Contributes to positive correlation
Low      Low         −             −             +                             Contributes to positive correlation
High     Low         +             −             −                             Contributes to negative correlation
Low      High        −             +             −                             Contributes to negative correlation
Middle   Any         0             +, −, or 0    0                             Makes correlation near zero
Any      Middle      +, −, or 0    0             0                             Makes correlation near zero
Note: + indicates a positive number; − indicates a negative number.
TIP FOR SUCCESS Test your understanding of correlation by covering up portions of Table 11–2 and trying to recall the hidden information.
However, you are still left with the problem of figuring the precise strength of a positive or negative correlation. The larger the number is (that is, the farther from zero), the stronger the correlation will be. But how large is large, and how large is not very large? You can’t judge from the sum of the products of deviations alone, which gets bigger just by adding the products of more persons together. For example, a study with 100 people would have a larger sum of products of deviations than the same study with only 25 people. The sum of the products also gets larger if the scores are on a more spread-out scale. For example, a study in which the scores on the two variables have a lot of variation, so they range from, say, 0 to 50, will have much larger products of deviation scores (and thus a larger sum of the products) than a study in which the scores on the two variables have less variation and range from, say, 0 to 10. This is because you are multiplying larger deviation scores by each other.
The upshot of all this is that the sign (+ or −) of the sum of the products of deviation scores tells you the direction of the correlation. And the bigger it is (ignoring the sign), the more positive or negative it is. But it is hard to know from the sum of the products of deviation scores just how strong the correlation is because the number of people in the study and the amount of variation of the scores for each variable both affect the size of the sum of the products of deviation scores.
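A quick way to convince yourself of this is to take one set of scores and change only the number of people or only the scale. The sketch below does that with the fictional Table 11–1 scores; repeating the same six students four times and multiplying every score by 5 are artificial manipulations chosen for this illustration (mirroring the 25 versus 100 people and the 0 to 10 versus 0 to 50 scales mentioned above).

```python
def sum_of_products_of_deviations(xs, ys):
    """Sum, over all people, of (X - mean of X) times (Y - mean of Y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

hours = [7, 5, 8, 6, 6, 10]  # Table 11-1, hours slept (fictional data)
mood = [4, 2, 7, 2, 3, 6]    # Table 11-1, happy mood (fictional data)

base = sum_of_products_of_deviations(hours, mood)

# Same pattern, four times as many people (each student's scores repeated).
more_people = sum_of_products_of_deviations(hours * 4, mood * 4)

# Same pattern and same people, but every score on a scale five times as wide.
wider_scale = sum_of_products_of_deviations(
    [5 * h for h in hours], [5 * m for m in mood]
)

print(base, more_people, wider_scale)
```

The pattern of dots is identical in all three cases, yet the sum goes from 16 to 64 to 400. That is why the sum by itself cannot serve as the measure of how strong the correlation is.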
The solution to finding the precise degree of correlation is to divide this sum of the products of deviations by a number that corrects for both the number of people in the study and the variation of the scores for each variable. It turns out that this number is based on the sum of the squared deviations of each variable. This is because the more people there are in the study, the more squared deviations are being summed and because the more variation there is in the scores for each variable, the larger will be the squared deviations being summed. That is, to adjust our sum of products, we use a correction number that has two properties: