
Interpretation of the Correlation Coefficient

First let us consider a simple example that illustrates the concept of a perfect positive linear relationship. The scatter diagram in Figure 3.11 depicts the relationship between x and y based on the following sample data.

xi   yi
 5   10
10   30
15   50

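[Scatter diagram: the points (5, 10), (10, 30), and (15, 50) plotted with x on the horizontal axis and y on the vertical axis, joined by a straight line]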

FIGURE 3.11 SCATTER DIAGRAM DEPICTING A PERFECT POSITIVE LINEAR RELATIONSHIP

3.5 Measures of Association Between Two Variables 121

The straight line drawn through each of the three points shows a perfect linear relationship between x and y. In order to apply equation (3.12) to compute the sample correlation we must first compute $s_{xy}$, $s_x$, and $s_y$. Some of the computations are shown in Table 3.8.

Using the results in this table, we find

Thus, we see that the value of the sample correlation coefficient is 1.
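The computation for the three sample points can be sketched in Python. This is a minimal illustration using only the standard library; the function name `sample_correlation` is a hypothetical helper introduced here, not something defined in the text.

```python
import math

def sample_correlation(x, y):
    """Return (s_xy, s_x, s_y, r_xy) for paired samples x and y.

    Hypothetical helper for illustration; it uses the n - 1 divisor
    for the sample covariance and sample standard deviations.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance: sum of cross-products of deviations over n - 1
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    # Sample standard deviations
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # Sample correlation coefficient
    return s_xy, s_x, s_y, s_xy / (s_x * s_y)

s_xy, s_x, s_y, r_xy = sample_correlation([5, 10, 15], [10, 30, 50])
print(s_xy, s_x, s_y, r_xy)  # prints 100.0 5.0 20.0 1.0
```

The printed values match the hand computation: a sample covariance of 100, standard deviations of 5 and 20, and a correlation coefficient of 1.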

In general, it can be shown that if all the points in a data set fall on a positively sloped straight line, the value of the sample correlation coefficient is +1; that is, a sample correlation coefficient of +1 corresponds to a perfect positive linear relationship between x and y. Moreover, if the points in the data set fall on a straight line having negative slope, the value of the sample correlation coefficient is −1; that is, a sample correlation coefficient of −1 corresponds to a perfect negative linear relationship between x and y.

Let us now suppose that a certain data set indicates a positive linear relationship between x and y but that the relationship is not perfect. The value of $r_{xy}$ will be less than 1, indicating that the points in the scatter diagram are not all on a straight line. As the points deviate more and more from a perfect positive linear relationship, the value of $r_{xy}$ becomes smaller and smaller. A value of $r_{xy}$ equal to zero indicates no linear relationship between x and y, and values of $r_{xy}$ near zero indicate a weak linear relationship.
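These cases can be illustrated with a short Python sketch. The three small data sets below are made up for illustration and do not come from the text; the helper `corr` is likewise a hypothetical name.

```python
import math

def corr(x, y):
    """Sample correlation coefficient r_xy (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

x = [5, 10, 15]

# Illustrative data (assumed): points on a line with negative slope
r_negative = corr(x, [50, 30, 10])

# Illustrative data (assumed): positive trend, but not on one line
r_imperfect = corr(x, [10, 40, 50])

# Illustrative data (assumed): y rises and then falls symmetrically,
# so the cross-products of the deviations cancel
r_zero = corr(x, [10, 30, 10])

print(r_negative, r_imperfect, r_zero)
```

Under these assumptions the first value is −1, the second lies strictly between 0 and 1, and the third is 0. Note that a correlation of zero only rules out a linear relationship; the third data set still follows a clear nonlinear pattern.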

For the data involving the stereo and sound equipment store, $r_{xy} = .93$. Therefore, we conclude that a strong positive linear relationship occurs between the number of commercials and sales. More specifically, an increase in the number of commercials is associated with an increase in sales.

In closing, we note that correlation provides a measure of linear association and not necessarily causation. A high correlation between two variables does not mean that changes in one variable will cause changes in the other variable. For example, we may find that the quality rating and the typical meal price of restaurants are positively correlated. However, simply increasing the meal price at a restaurant will not cause the quality rating to increase.

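TABLE 3.8 COMPUTATIONS USED IN CALCULATING THE SAMPLE CORRELATION COEFFICIENT

         xi    yi    xi − x̄   (xi − x̄)²   yi − ȳ   (yi − ȳ)²   (xi − x̄)(yi − ȳ)
          5    10      −5        25        −20       400            100
         10    30       0         0          0         0              0
         15    50       5        25         20       400            100
Totals   30    90       0        50          0       800            200

x̄ = 10, ȳ = 30

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{200}{2} = 100$$

$$s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{50}{2}} = 5$$

$$s_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}} = \sqrt{\frac{800}{2}} = 20$$

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{100}{5(20)} = 1$$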

The correlation coefficient ranges from −1 to +1.

Values close to −1 or +1 indicate a strong linear relationship. The closer the correlation is to zero, the weaker the relationship.

Exercises

Methods

45. Five observations taken for two variables follow.

xi    4    6   11    3   16
yi   50   50   40   60   30

a. Develop a scatter diagram with x on the horizontal axis.

b. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables?

c. Compute and interpret the sample covariance.

d. Compute and interpret the sample correlation coefficient.

46. Five observations taken for two variables follow.

xi    6   11   15   21   27
yi    6    9    6   17   12

a. Develop a scatter diagram for these data.

b. What does the scatter diagram indicate about a relationship between x and y?

c. Compute and interpret the sample covariance.

d. Compute and interpret the sample correlation coefficient.

Applications

47. Nielsen Media Research provides two measures of the television viewing audience: a television program rating, which is the percentage of households with televisions watching a program, and a television program share, which is the percentage of households watching a program among those with televisions in use. The following data show the Nielsen television ratings and share data for the Major League Baseball World Series over a nine-year period (Associated Press, October 27, 2003).

Rating   19   17   17   14   16   12   15   12   13
Share    32   28   29   24   26   20   24   20   22

a. Develop a scatter diagram with rating on the horizontal axis.

b. What is the relationship between rating and share? Explain.

c. Compute and interpret the sample covariance.

d. Compute the sample correlation coefficient. What does this value tell us about the relationship between rating and share?

48. A department of transportation’s study on driving speed and miles per gallon for midsize automobiles resulted in the following data:

Speed (Miles per Hour)   30   50   40   55   30   25   60   25   50   55
Miles per Gallon         28   25   25   23   30   32   21   35   26   25

Compute and interpret the sample correlation coefficient.

49. At the beginning of 2009, the economic downturn resulted in the loss of jobs and an increase in delinquent loans for housing. The national unemployment rate was 6.5% and the percentage of delinquent loans was 6.12% (The Wall Street Journal, January 27, 2009). In projecting where the real estate market was headed in the coming year, economists studied the relationship between the jobless rate and the percentage of delinquent loans. The expectation was that if the jobless rate continued to increase, there would also be an increase in the percentage of delinquent loans. The data below show the jobless rate and the delinquent loan percentage for 27 major real estate markets (WEB file: Housing).

               Jobless    Delinquent                    Jobless    Delinquent
Metro Area     Rate (%)   Loan (%)     Metro Area       Rate (%)   Loan (%)
Atlanta          7.1        7.02       New York           6.2        5.78
Boston           5.2        5.31       Orange County      6.3        6.08
Charlotte        7.8        5.38       Orlando            7.0       10.05
Chicago          7.8        5.40       Philadelphia       6.2        4.75
Dallas           5.8        5.00       Phoenix            5.5        7.22
Denver           5.8        4.07       Portland           6.5        3.79
Detroit          9.3        6.53       Raleigh            6.0        3.62
Houston          5.7        5.57       Sacramento         8.3        9.24
Jacksonville     7.3        6.99       St. Louis          7.5        4.40
Las Vegas        7.6       11.12       San Diego          7.1        6.91
Los Angeles      8.2        7.56       San Francisco      6.8        5.57
Miami            7.1       12.11       Seattle            5.5        3.87
Minneapolis      6.3        4.39       Tampa              7.5        8.42
Nashville        6.6        4.78

a. Compute the correlation coefficient. Is there a positive correlation between the jobless rate and the percentage of delinquent housing loans? What is your interpretation?

b. Show a scatter diagram of the relationship between jobless rate and the percentage of delinquent housing loans.

50. The Dow Jones Industrial Average (DJIA) and the Standard & Poor’s 500 Index (S&P 500) are both used to measure the performance of the stock market. The DJIA is based on the price of stocks for 30 large companies; the S&P 500 is based on the price of stocks for 500 companies. If both the DJIA and S&P 500 measure the performance of the stock market, how are they correlated? The following data show the daily percent increase or daily percent decrease in the DJIA and S&P 500 for a sample of nine days over a three-month period (The Wall Street Journal, January 15 to March 10, 2006) (WEB file: StockMarket).

DJIA      .20   .82   .99   .04   .24   1.01   .30   .55   .25
S&P 500   .24   .19   .91   .08   .33    .87   .36   .83   .16

a. Show a scatter diagram.

b. Compute the sample correlation coefficient for these data.

c. Discuss the association between the DJIA and S&P 500. Do you need to check both before having a general idea about the daily stock market performance?

51. The daily high and low temperatures for 14 cities around the world are shown (The Weather Channel, April 22, 2009) (WEB file: WorldTemp).

City         High   Low    City             High   Low
Athens        68     50    London            67     45
Beijing       70     49    Moscow            44     29
Berlin        65     44    Paris             69     44
Cairo         96     64    Rio de Janeiro    76     69
Dublin        57     46    Rome              69     51
Geneva        70     45    Tokyo             70     58
Hong Kong     80     73    Toronto           44     39

a. What is the sample mean high temperature?

b. What is the sample mean low temperature?

c. What is the correlation between the high and low temperatures? Discuss.

3.6 The Weighted Mean and Working with Grouped Data

In Section 3.1, we presented the mean as one of the most important measures of central location. The formula for the mean of a sample with n observations is restated as follows.

$$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \tag{3.14}$$
