COMPUTING THE CORRELATION COEFFICIENT


4. COMPUTING THE CORRELATION COEFFICIENT

Here is the procedure for computing the correlation coefficient.

Convert each variable to standard units. The average of the products gives the correlation coefficient.

(Standard units were discussed on pp. 79–80.) This procedure can be given as a formula, where x stands for the first variable, y for the second variable, and r for the correlation coefficient:

$$r = \text{average of } (x \text{ in standard units}) \times (y \text{ in standard units}).$$
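The procedure translates directly into a few lines of code. The sketch below is in Python (3.8 or later) and is only an illustration: the function names standard_units and correlation are made up for this example, and the SD is computed with divisor n (the number of entries) via statistics.pstdev, the convention that gives SD = 2 for the x-values of table 1.

```python
from statistics import fmean, pstdev


def standard_units(values):
    """Convert each value to standard units: (value - average) / SD."""
    avg = fmean(values)
    sd = pstdev(values)  # SD with divisor n, not n - 1
    if sd == 0:
        raise ValueError("standard units are undefined when the SD is 0")
    return [(v - avg) / sd for v in values]


def correlation(xs, ys):
    """r = average of (x in standard units) * (y in standard units)."""
    if len(xs) != len(ys):
        # r needs both variables measured on every subject
        raise ValueError("need one x-value and one y-value for every subject")
    zx = standard_units(xs)
    zy = standard_units(ys)
    return fmean([a * b for a, b in zip(zx, zy)])


# Data from table 1 (example 1 below)
print(correlation([1, 3, 4, 5, 7], [5, 9, 7, 1, 13]))  # 0.4
```

The check on the lengths reflects the point made in the text: r is only defined when the two variables come in pairs, one pair per subject.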

Example 1. Compute r for the hypothetical data in table 1.

Table 1. Data.

x    y
1    5
3    9
4    7
5    1
7    13

Note. The first row of table 1 represents two measurements on one subject in the study; the two numbers are the x- and y-coordinates of the corresponding point on the scatter diagram. Similarly for the other rows. The pairing matters:

r is defined only when you have two variables, and both are measured for every subject in the study.

Solution. The work can be laid out as in table 2.

Step 1. Convert the x-values to standard units, as in chapter 5. This is quite a lot of work. First, you have to find the average and SD of the x-values:

average of x-values = 4, SD = 2.

Then, you have to subtract the average from each x-value, and divide by the SD:

$$\frac{1-4}{2} = -1.5 \qquad \frac{3-4}{2} = -0.5 \qquad \frac{4-4}{2} = 0 \qquad \frac{5-4}{2} = 0.5 \qquad \frac{7-4}{2} = 1.5$$


Table 2. Computing r.

x    y     x in standard units    y in standard units    Product
1    5     −1.5                   −0.5                    0.75
3    9     −0.5                    0.5                   −0.25
4    7      0.0                    0.0                    0.00
5    1      0.5                   −1.5                   −0.75
7    13     1.5                    1.5                    2.25

The results go into the third column of table 2. The numbers tell you how far above or below average the x-values are, in terms of the SD. For instance, the value 1 is 1.5 SDs below average.
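Step 1 can be checked with a short Python sketch. It follows the text literally: find the average, find the SD as the root-mean-square of the deviations from average (dividing by the number of entries, which reproduces SD = 2), then subtract the average and divide by the SD. The variable names are chosen for illustration only.

```python
from math import sqrt

xs = [1, 3, 4, 5, 7]  # x-values from table 1

average = sum(xs) / len(xs)
# SD = root-mean-square of the deviations from the average (divisor n)
sd = sqrt(sum((x - average) ** 2 for x in xs) / len(xs))

# Standard units: how many SDs each value lies above (+) or below (-) average
x_standard = [(x - average) / sd for x in xs]

print(average, sd)  # 4.0 2.0
print(x_standard)   # [-1.5, -0.5, 0.0, 0.5, 1.5]
```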

Step 2. Convert the y-values to standard units; the results go into the fourth column of the table. That finishes the worst of the arithmetic.
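The same arithmetic applied to the y-values gives the fourth column of table 2. In this Python sketch the work of step 1 is wrapped in a helper so it can be reused; the name to_standard_units is hypothetical, and the SD again uses divisor n. For the y-values the average comes out to 7 and the SD to 4.

```python
from math import sqrt


def to_standard_units(values):
    """Subtract the average from each value, then divide by the SD (divisor n)."""
    average = sum(values) / len(values)
    sd = sqrt(sum((v - average) ** 2 for v in values) / len(values))
    return [(v - average) / sd for v in values]


ys = [5, 9, 7, 1, 13]  # y-values from table 1: average 7, SD 4
y_standard = to_standard_units(ys)
print(y_standard)  # [-0.5, 0.5, 0.0, -1.5, 1.5]
```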

Step 3. For each row of the table, work out the product (x in standard units) × (y in standard units). The products go into the last column of the table.
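Step 3 is a row-by-row multiplication. In the Python sketch below, zip keeps each subject's two numbers together, which is the pairing that the note on table 1 insists on: shuffling one list but not the other would change the products. The lists are copied from the third and fourth columns of table 2.

```python
# Standard units from the third and fourth columns of table 2
x_standard = [-1.5, -0.5, 0.0, 0.5, 1.5]
y_standard = [-0.5, 0.5, 0.0, -1.5, 1.5]

# Multiply row by row; zip pairs the i-th x with the i-th y
products = [zx * zy for zx, zy in zip(x_standard, y_standard)]
print(products)  # [0.75, -0.25, 0.0, -0.75, 2.25]
```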

Step 4. Take the average of the products:

$$\begin{aligned} r &= \text{average of } (x \text{ in standard units}) \times (y \text{ in standard units}) \\ &= \frac{0.75 - 0.25 + 0.00 - 0.75 + 2.25}{5} = 0.40 \end{aligned}$$
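Step 4 is an ordinary average. A one-line Python check, using the last column of table 2, confirms the arithmetic; most of the total of 2.0 comes from the single point (7, 13), whose product is 2.25.

```python
products = [0.75, -0.25, 0.0, -0.75, 2.25]  # last column of table 2
r = sum(products) / len(products)           # total 2.0, divided by 5
print(r)  # 0.4
```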

This completes the solution. If you plot a scatter diagram for the data (figure 9a), the points slope up but are only loosely clustered.

Why does r work as a measure of association? In figure 9a, the products are marked at the corresponding dots. Horizontal and vertical lines are drawn through the point of averages, dividing the scatter diagram into four quadrants. If a point is in the lower left quadrant, both variables are below average and are negative in standard units; the product of two negatives is positive. In the upper right quadrant, the product of two positives is positive. In the remaining two quadrants, the product of a positive and a negative is negative. The average of all these products is the correlation coefficient. If r is positive, then points in the two positive quadrants will predominate, as in figure 9b. If r is negative, points in the two negative quadrants will predominate, as in figure 9c.

Figure 9. How the correlation coefficient works.
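The quadrant argument can be checked on the data of table 1. The Python sketch below (the function name classify is made up for illustration) places each point relative to the point of averages and reports the sign of its product. Since SDs are positive, the sign of (x in standard units) × (y in standard units) is the same as the sign of (x − average of x) × (y − average of y), so the sketch does not need the SDs at all.

```python
def classify(xs, ys):
    """For each (x, y), name its quadrant relative to the point of averages
    and give the sign of its product in standard units (+1, -1, or 0)."""
    x_avg = sum(xs) / len(xs)
    y_avg = sum(ys) / len(ys)
    rows = []
    for x, y in zip(xs, ys):
        dx, dy = x - x_avg, y - y_avg
        if dx == 0 or dy == 0:
            # On one of the dividing lines: the product is 0
            rows.append(((x, y), "on a dividing line", 0))
            continue
        vertical = "upper" if dy > 0 else "lower"
        horizontal = "right" if dx > 0 else "left"
        sign = 1 if dx * dy > 0 else -1
        rows.append(((x, y), f"{vertical} {horizontal}", sign))
    return rows


for row in classify([1, 3, 4, 5, 7], [5, 9, 7, 1, 13]):
    print(row)
```

The signs it reports agree with the last column of table 2: the two positive products come from the lower left and upper right quadrants, the two negative ones from the other two, and the point (4, 7) sits at the point of averages, so its product is 0.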

Exercise Set D

1. For each of the data sets shown below, calculate r.

(a)        (b)        (c)
x   y      x   y      x   y
1   6      1   2      1   7
2   7      2   1      2   6
3   5      3   4      3   5
4   4      4   3      4   4
5   3      5   7      5   3
6   1      6   5      6   2
7   2      7   6      7   1

2. Find the scatter diagram in figure 6 (p. 127) with a correlation of 0.95. In this diagram, the percentage of points where both variables are simultaneously above average is around

5% 25% 50% 75% 95%.

3. Repeat exercise 2, for a correlation of 0.00.

4. Using figure 7, repeat exercise 2 for a correlation of −0.95.

The answers to these exercises are on p. A57.

Technical note. There is another way to compute r, which is sometimes useful:⁸

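$$r = \frac{\operatorname{cov}(x, y)}{(\text{SD of } x) \times (\text{SD of } y)}$$

where

$$\operatorname{cov}(x, y) = (\text{average of products } xy) - (\text{ave of } x) \times (\text{ave of } y).$$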
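The technical note can be checked numerically. In this Python sketch the helper names mean, sd and r_from_cov are illustrative, and both SDs use divisor n so that they match the covariance formula. On the data of table 1 the average of the products xy is 31.2 and (ave of x) × (ave of y) is 4 × 7 = 28, so cov(x, y) = 3.2 and r = 3.2 / (2 × 4) = 0.40, the same answer as the standard-units method up to floating-point rounding.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """SD with divisor n: root-mean-square of deviations from the average."""
    avg = mean(values)
    return sqrt(mean([(v - avg) ** 2 for v in values]))


def r_from_cov(xs, ys):
    """r = cov(x, y) / ((SD of x) * (SD of y)), with
    cov(x, y) = (average of products xy) - (ave of x) * (ave of y)."""
    cov = mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)
    return cov / (sd(xs) * sd(ys))


r = r_from_cov([1, 3, 4, 5, 7], [5, 9, 7, 1, 13])
print(round(r, 2))  # 0.4
```

One caution from general numerical practice rather than from the text: when the averages are large compared with the SDs, subtracting (ave of x) × (ave of y) from the average of the products can lose precision in floating-point arithmetic, so the standard-units method is usually the safer one to program.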

5. REVIEW EXERCISES

Review exercises may cover material from previous chapters.

1. A study of the IQs of husbands and wives obtained the following results:

for husbands: average IQ = 100, SD = 15

for wives: average IQ = 100, SD = 15

r = 0.6


One of the following is a scatter diagram for the data. Which one? Say briefly why you reject the others.

2. (a) For a representative sample of cars, would the correlation between the age of the car and its gasoline economy (miles per gallon) be positive or negative?

(b) The correlation between gasoline economy and income of owner turns out to be positive.⁹ How do you account for this positive association?

3. Suppose men always married women who were exactly 8% shorter. What would the correlation between their heights be?

4. Is the correlation between the heights of husbands and wives in the U.S. around −0.9, −0.3, 0.3, or 0.9? Explain briefly.

5. Three data sets are collected, and the correlation coefficient is computed in each case. The variables are

(i) grade point average in freshman year and in sophomore year

(ii) grade point average in freshman year and in senior year

(iii) length and weight of two-by-four boards

Possible values for correlation coefficients are

−0.50 0.0 0.30 0.60 0.95

Match the correlations with the data sets; two will be left over. Explain your choices.

6. In one class, the correlation between scores on the final and the midterm was 0.50, while the correlation between the scores on the final and the homework was 0.25. True or false, and explain: the relationship between the final scores and the midterm scores is twice as linear as the relationship between the final scores and the homework scores.

7. The figure below has six scatter diagrams for hypothetical data. The correlation coefficients, in scrambled order, are:

−0.85   −0.38   −1.00   0.06   0.97   0.62

Match the scatter diagrams with the correlation coefficients.


8. A longitudinal study of human growth was started in 1929 at the Berkeley Institute of Human Development.¹⁰ The scatter diagram below shows the heights of 64 boys, measured at ages 4 and 18.

(a) The average height at age 4 is around

38 inches   42 inches   44 inches

(b) The SD of height at age 18 is around

0.5 inches   1.0 inches   2.5 inches

(c) The correlation coefficient is around

0.50   0.80   0.95

(d) Which is the SD line—solid or dashed?

Explain your answers.

9. Find the correlation coefficient for each of the three data sets shown below.

(a)        (b)        (c)
x   y      x   y      x   y
1   5      1   1      1   2
1   3      1   2      1   2
1   5      1   1      1   2
1   7      1   3      1   2
2   3      2   1      2   4
2   3      2   4      2   4
2   1      2   1      2   4
3   1      3   2      3   6
4   1      4   3      4   8

10. In a large psychology study, each subject took two IQ tests (form L and form M of the Stanford-Binet). A scatter diagram for the test scores is sketched at the top of the next page. You are trying to predict the score on form M from the score on form L. Each prediction is off by some amount.

On the whole, will these prediction errors be smaller when the score on form L is 75, or 125? Or is it about the same for both?

11. A teaching assistant gives a quiz with 10 questions and no part credit. After grading the papers, the TA writes down for each student the number of questions the student got right and the number wrong. The average number of right answers is 6.4 with an SD of 2.0; the average number of wrong answers is 3.6 with the same SD of 2.0. The correlation between the number of right answers and the number of wrong answers is

0   −0.50   +0.50   −1   +1   can’t tell without the data

Explain.

Figure for exercise 12


12. Fifteen students in an elementary statistics course at U.C. Berkeley were asked to count the dots in a figure like the one at the bottom of the previous page; there were 85 dots in the figure. The counts are shown in the table below. Make a scatter diagram for the counts. Represent each student by one point on your diagram, showing the first and second count. Label both your axes fully. Choose the scale so you can see the pattern in the points. Use your scatter diagram to answer the following questions:

(a) Did the students work independently?

(b) True or false: those students who counted high the first time also tended to be high the second time.

The two counts

1st   2nd
91    85
81    83
86    85
83    84
85    85
85    84
85    89
84    83
91    82
85    85
87    85
90    85
