2.4 MEASURES OF RELATIONSHIPS BETWEEN VARIABLES
84 Chapter 2 Using Numerical Measures to Describe Data
It can be shown that the correlation coefficient ranges from -1 to +1. The closer r is to +1, the closer the data points are to an increasing straight line, indicating a positive linear relationship. The closer r is to -1, the closer the data points are to a decreasing straight line, indicating a negative linear relationship. When r = 0, there is no linear relationship between x and y, but this does not necessarily mean that there is no relationship at all. In Chapter 1 we presented scatter plots as a graphical tool for examining relationships. Figure 2.4 presents some examples of scatter plots and their corresponding correlation coefficients. Figure 2.5 is a plot of quarterly sales for a major retail company.
Note that sales vary by quarter of the year, reflecting consumers’ purchasing patterns.
The correlation coefficient between the time variable and quarterly sales is zero. Nevertheless, the plot shows a very definite seasonal relationship; the relationship is simply not linear.
Figure 2.4 Scatter Plots and Correlation. Six panels plot y_i against x_i: (a) r = -0.8, (b) r = -0.4, (c) r = 0, (d) r = 0.4, (e) r = 0.8, (f) r = 1.0.
Figure 2.5 Retail Sales by Quarter. Total Sales (vertical axis, 0 to 25) plotted against Year and Quarter (horizontal axis, 2003 to 2011).
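The point that r = 0 rules out only a linear relationship can be checked numerically. The following is a minimal sketch, not taken from the text: the data are made up for illustration, and `pearson_r` is a hypothetical helper that applies the usual sample correlation formula. The curved data follow y = x² exactly, yet their correlation with x is zero.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations; the (n - 1)
    # divisors cancel in the ratio, so they are omitted here.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative (made-up) data: x values symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs]     # perfect, but nonlinear, relationship
ys_line = [2 * x + 1 for x in xs]   # perfect increasing straight line

r_curve = pearson_r(xs, ys_curve)
r_line = pearson_r(xs, ys_line)

print(f"r for y = x^2:    {r_curve:.4f}")
print(f"r for y = 2x + 1: {r_line:.4f}")
```

A scatter plot of the curved data would show the relationship immediately, which is why the correlation coefficient should be read alongside a plot rather than in place of one.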
Example 2.19 Facebook Posts and Interactions (Covariance and Correlation Coefficient)
RELEVANT Magazine (a culture magazine) keeps in touch with its readers and informs them by posting updates through various social networks. These updates take up a large part of both the marketing and editorial teams’ time. Because these updates take so much time, marketing is interested in knowing whether reducing posts (updates) on Facebook (a specific site) will also lessen fan interaction; if not, both departments may use their time in more productive ways. The weekly number of posts (updates) and fan interactions for Facebook during a 9-week period are recorded in Table 2.10. Compute the covariance and correlation between Facebook posts (site updates) and fan interactions. The data are stored in the data file RELEVANT Magazine.
Table 2.10 Facebook Posts (site updates) and Fan Interactions
Facebook posts (updates), x 16 31 27 23 15 17 17 18 14
Fan interactions, y 165 314 280 195 137 286 199 128 462
Solution The computations of the covariance and correlation between Facebook posts (site updates) and fan interactions are illustrated in Table 2.11. The mean and the variance in the number of Facebook posts are found to be approximately
$$\bar{x} = 19.8 \quad \text{and} \quad s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = 34.694$$
and the mean and the variance in the number of fan interactions are found to be approximately
$$\bar{y} = 240.7 \quad \text{and} \quad s_y^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} = 11{,}369.5$$
Table 2.11 Facebook Posts and Fan Interactions (Covariance and Correlation)
x     y     (xi - x̄)   (xi - x̄)²   (yi - ȳ)   (yi - ȳ)²    (xi - x̄)(yi - ȳ)
16    165   -3.8       14.44       -75.7      5,730.49     287.66
31    314   11.2       125.44      73.3       5,372.89     820.96
27    280   7.2        51.84       39.3       1,544.49     282.96
23    195   3.2        10.24       -45.7      2,088.49     -146.24
15    137   -4.8       23.04       -103.7     10,753.69    497.76
17    286   -2.8       7.84        45.3       2,052.09     -126.84
17    199   -2.8       7.84        -41.7      1,738.89     116.76
18    128   -1.8       3.24        -112.7     12,701.29    202.86
14    462   -5.8       33.64       221.3      48,973.69    -1,283.54
x̄ = 19.8    ȳ = 240.7                                      Σ = 652.34
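From Equation 2.24,
$$\text{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{652.34}{8} = 81.542$$
From Equation 2.26,
$$r = \frac{\text{Cov}(x, y)}{s_x s_y} = \frac{81.542}{\sqrt{34.694}\,\sqrt{11{,}369.5}} = 0.1298$$
From Equation 2.27,
$$|0.1298| < \frac{2}{\sqrt{9}} = 0.67$$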
We conclude that the data do not provide sufficient evidence of a strong linear relationship between Facebook posts and fan interactions.
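The hand computation in Table 2.11 can be cross-checked with a few lines of code. The sketch below is illustrative only: it uses the data from Table 2.10, the helper name `sample_cov` is our own, and the comparison of |r| with 2/√n follows the rule of thumb applied in the solution above.

```python
import math

# Data from Table 2.10.
posts = [16, 31, 27, 23, 15, 17, 17, 18, 14]
interactions = [165, 314, 280, 195, 137, 286, 199, 128, 462]

def sample_cov(xs, ys):
    """Sample covariance: sum of cross-products divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

cov_xy = sample_cov(posts, interactions)       # Cov(x, y)
var_x = sample_cov(posts, posts)               # s_x squared
var_y = sample_cov(interactions, interactions) # s_y squared
r = cov_xy / math.sqrt(var_x * var_y)          # correlation coefficient

# Rule of thumb used in the solution: compare |r| with 2 / sqrt(n).
threshold = 2 / math.sqrt(len(posts))

print(f"Cov(x, y) = {cov_xy:.3f}")
print(f"s_x^2 = {var_x:.3f}, s_y^2 = {var_y:.1f}")
print(f"r = {r:.4f}, 2/sqrt(n) = {threshold:.2f}")
print("|r| exceeds threshold:", abs(r) >= threshold)
```

Because the code works with unrounded means, its results can differ from the table in the last printed digit; the table rounds the means to 19.8 and 240.7 before forming deviations.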
Minitab, Excel, SPSS, SAS, and many other statistical packages can be used to compute descriptive measures such as the sample covariance and the sample correlation coefficient. Consider Example 2.19. Figure 2.6 shows the Minitab output for computing covariance and correlation, and Figure 2.7 shows the Excel output for the same data.
Special care must be taken if we use Excel to compute covariance. In Example 2.19 the covariance between Facebook posts and fan interactions was found to be 81.542 (the same value as in the Minitab output in Figure 2.6). But the covariance of 72.4815 given in the Excel output is the population covariance, not the sample covariance. That is, Excel automatically calculates the population covariance as well as the population variance for the X and Y variables. To obtain the sample covariance, we must multiply the population covariance by a factor of n/(n - 1).
Covariances: Facebook Posts, Fan Interactions
                    Facebook Posts    Fan Interactions
Facebook Posts      34.694
Fan Interactions    81.542            11,369.500
Correlations: Facebook Posts, Fan Interactions
Pearson Correlation of Facebook Posts and Fan Interactions = 0.130
Figure 2.6 Covariance and Correlation: Facebook Posts, Fan Interactions (Minitab)
Covariance
                    Facebook Posts    Fan Interactions
Facebook Posts      30.8395
Fan Interactions    72.4815           10106.2222
Correlation
                    Facebook Posts    Fan Interactions
Facebook Posts      1
Fan Interactions    0.1298            1
Figure 2.7 Covariance and Correlation: Facebook Posts, Fan Interactions (Excel)
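The difference between the two outputs comes down to the divisor: n for the population covariance and n - 1 for the sample covariance. The following sketch, using the data from Table 2.10 and variable names of our own choosing, reproduces both values and the n/(n - 1) conversion. It illustrates the arithmetic only and makes no claim about how either package computes its output internally.

```python
# Data from Table 2.10.
posts = [16, 31, 27, 23, 15, 17, 17, 18, 14]
interactions = [165, 314, 280, 195, 137, 286, 199, 128, 462]

n = len(posts)
mean_x = sum(posts) / n
mean_y = sum(interactions) / n

# Sum of cross-products of deviations from the means.
cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(posts, interactions))

pop_cov = cross / n           # population covariance (divisor n)
sample_cov = cross / (n - 1)  # sample covariance (divisor n - 1)

# Converting one into the other with the factor n / (n - 1).
converted = pop_cov * n / (n - 1)

print(f"population covariance: {pop_cov:.4f}")
print(f"sample covariance:     {sample_cov:.4f}")
print(f"population * n/(n-1):  {converted:.4f}")
```

With n = 9 the factor is 9/8 = 1.125, so the gap between the two versions is noticeable; for large n the two values are nearly identical.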
From the Excel output, the sample covariance between Facebook posts and fan interactions is found as follows:
$$\text{Cov}(x, y) = 72.4815\left(\frac{9}{8}\right) = 81.542$$
More formal procedures to determine if two variables are linearly related are discussed in Chapters 11 and 12. Also, we consider another measure of correlation in Chapter 14.
Example 2.20 Analysis of Stock Portfolios (Correlation Coefficient Analysis)
Christina Bishop, financial analyst for Integrated Securities, is considering a number of different stocks for a new mutual fund she is developing. One of her questions concerns the correlation coefficients between prices of different stocks. To determine the patterns of stock prices, she prepared a series of scatter plots and computed the sample correlation coefficient for each plot. What information does Figure 2.8 provide?
Figure 2.8 Relationships Between Various Stock Prices. Six scatter plots, each with both axes running from $50 to $150: Stock Prices X and Y (r = +0.56); Stock Prices Z and Y (r = +0.93); Stock Prices Z and D (r = -0.28); Stock Prices E and D (r = +0.26); Stock Prices B and Y (r = -0.91); Stock Prices A and Y (r = -0.55).
Solution Christina sees that it is possible to control the variation in the average mutual fund price by combining various stocks into a portfolio. The portfolio variation is increased if stocks with positive correlation coefficients are included because the prices tend to increase together. In contrast, the portfolio variation is decreased if stocks with negative correlation coefficients are included. When the price of one stock increases, the price of the other decreases, and the combined price is more stable. Experienced observers of stock prices might question the possibility of very large negative correlation coefficients. Our objective here is to illustrate graphically the correlation coefficients for certain patterns of observed data and not to accurately describe a particular market. After examining these correlation coefficients, Christina is ready to begin constructing her portfolio. Correlation coefficients between stock prices affect the variation of the entire portfolio.
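The effect of correlation on portfolio variation can be made concrete with the standard formula for the variance of a weighted sum of two variables, Var(aX + bY) = a²s_x² + b²s_y² + 2ab·r·s_x·s_y. The sketch below is an illustration under stated assumptions: the standard deviation of $20 for each stock and the equal weights are hypothetical values, not figures from the example, and only the two correlation coefficients are borrowed from Figure 2.8.

```python
def portfolio_variance(sd_a, sd_b, r, w_a=0.5, w_b=0.5):
    """Variance of w_a * A + w_b * B for two stocks with correlation r."""
    cov_ab = r * sd_a * sd_b  # covariance recovered from the correlation
    return w_a ** 2 * sd_a ** 2 + w_b ** 2 * sd_b ** 2 + 2 * w_a * w_b * cov_ab

# Hypothetical standard deviation of each stock price, in dollars.
sd = 20.0

# Correlations taken from two panels of Figure 2.8.
var_pos = portfolio_variance(sd, sd, 0.93)   # strongly positive pair
var_neg = portfolio_variance(sd, sd, -0.91)  # strongly negative pair

print(f"variance with r = +0.93: {var_pos:.1f}")
print(f"variance with r = -0.91: {var_neg:.1f}")
```

Under these assumptions each stock alone has a variance of 400. The positively correlated pair leaves the combined variance close to that level, while the negatively correlated pair reduces it sharply, which is the stabilizing effect described in the solution.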
It is important to understand that correlation does not imply causation. It is possible for two variables to be highly correlated, but that does not mean that one variable causes the other variable. We need to be careful about jumping to conclusions based on television news reports, newspaper articles, online Web sites, or even medical studies that claim that A causes B.
EXERCISES
Visit www.mymathlab.com/global or www.pearsonglobaleditions.com/newbold to access the data files.
Basic Exercises
2.35 Following is a random sample of seven (x, y) pairs of data points:
(1, 5) (3, 7) (4, 6) (5, 8) (7, 9) (3, 6) (5, 7)
a. Compute the covariance.
b. Compute the correlation coefficient.
2.36 Following is a random sample of five (x, y) pairs of data points:
(12, 200) (30, 600) (15, 270) (24, 500) (14, 210)
a. Compute the covariance.
b. Compute the correlation coefficient.
2.37 Following is a random sample of price per piece of plywood, X, and quantity sold, Y (in thousands):
Price per Piece (x) Thousands of Pieces Sold (y)
$6 80
7 60
8 70
9 40
10 0
a. Compute the covariance.
b. Compute the correlation coefficient.
Application Exercises
2.38 River Hills Hospital is interested in determining the effectiveness of a new drug for reducing the time required for complete recovery from knee surgery. Complete recovery is measured by a series of strength tests that compare the treated knee with the untreated knee. The drug was given in varying amounts to 18 patients over a 6-month period. For each patient the number of drug units, X, and the days for complete recovery, Y, are given by the following (x, y) data:
(5, 53) (21, 65) (14, 48) (11, 66) (9, 46) (4, 56) (7, 53) (21, 57) (17, 49) (14, 66) (9, 54) (7, 56) (9, 53) (21, 52) (13, 49) (14, 56) (9, 59) (4, 56)
a. Compute the covariance.
b. Compute the correlation coefficient.
c. Briefly discuss the relationship between the number of drug units and the recovery time. What dosage might we recommend based on this initial analysis?
2.39 A Hong Kong snack-food vendor offers 3 types of boxed “lunches to go,” priced at $3, $5, and $10, respectively. The vendor would like to establish whether there is a relationship between the price of the boxed lunch and the number of sales achieved per hour. Consequently, over a 15-day period the vendor records the number of sales made for each of the 3 types of boxed lunches. The following data show the boxed-lunch price (x) and the number sold (y) during each of the 15 lunch hours.
(3, 7), (5, 5), (10, 2), (3, 9), (5, 6), (10, 5), (3, 6), (5, 6), (10, 1), (3, 10), (5, 7), (10, 4), (3, 5), (5, 6), (10, 4)
a. Describe the data numerically with their covariance and correlation.
b. Discuss the relationship between the price and number of boxed lunches sold.
2.40 The following data give X, the price charged for a particular item, and Y, the quantity of that item sold (in thousands):
Price per Piece (X) Hundreds of Pieces Sold (Y)
$5 55
6 53
7 45
8 40
9 20
a. Compute the covariance.
b. Compute the correlation coefficient.
2.41 Snappy Lawn Care, a growing business in central Florida, keeps records of the temperature (in degrees Fahrenheit) and the time (in hours) required to complete a contract. A random sample of temperatures and time for n = 11 contracts is stored in the data file Snappy Lawn Care.
a. Compute the covariance.
b. Compute the correlation coefficient.
2.42 A consumer goods company has been studying the effect of advertising on total profits. As part of this study, data on advertising expenditures (in thousands of dollars) and total sales (in thousands of dollars) were collected for a 5-month period and are as follows:
(10, 100) (15, 200) (7, 80) (12, 120) (14, 150)
The first number is advertising expenditures and the second is total sales. Plot the data and compute the correlation coefficient.
2.43 The president of Floor Coverings Unlimited wants information concerning the relationship between retail experience (years) and weekly sales (in hundreds of dollars). He obtained the following random sample on experience and weekly sales:
(2, 5) (4, 10) (3, 8) (6, 18) (3, 6) (5, 15) (6, 20) (2, 4)
The first number for each observation is years of experience, and the second number is weekly sales. Compute the covariance and the correlation coefficient.
KEY WORDS
• arithmetic mean, 60
• box-and-whisker plot, 69
• coefficient of variation, CV, 75
• correlation coefficient, 84
• covariance (Cov), 84
• empirical rule, 76
• first quartile, 64
• five-number summary, 65
• geometric mean, 63
• geometric mean rate of return, 63
• interquartile range (IQR), 69
• median, 60
• mode, 60
• percentiles, 64
• Pth percentile, 64
• quartiles, 64
• range, 69
• second quartile, 64
• skewness, 91
• standard deviation, 72
• third quartile, 65
• variance, 71
• weighted mean, 80
• z-score, 77
DATA FILES
• Completion Times, 79, 83, 90
• Florin, 79
• Gilotti’s Pizzeria, 70, 90
• Grade Point Averages, 62, 68
• HEI Cost Data Variable Subset, 67
• Mendez Mortgage, 91
• Rates, 79
• RELEVANT Magazine, 85
• Shopping Times, 66, 69, 90
• Snappy Lawn Care, 89, 90
• Student GPA, 90
• Study, 68
• Sun, 68
• Water, 79
CHAPTER EXERCISES AND APPLICATIONS
Visit www.mymathlab.com/global or www.pearsonglobaleditions.com/newbold to access the data files.
2.44 A major airport recently hired consultant John Cadariu to study the problem of air traffic delays. He recorded the number of minutes planes were late for a sample of flights in the following table:
Minutes late         0 < 10   10 < 20   20 < 30   30 < 40   40 < 50   50 < 60
Number of flights    30       25        13        6         5         4
a. Estimate the mean number of minutes late.
b. Estimate the sample variance and standard deviation.
2.45 Snappy Lawn Care, a growing business in central Florida, keeps records of charges for its professional lawn care services. A random sample of n = 50 charges is stored in the data file Snappy Lawn Care. Describe the data numerically.
a. Compute the mean charge.
b. Compute the standard deviation.
c. Compute the five-number summary.
2.46 In Example 2.9 we calculated the variance and standard deviation for Location 1 of Gilotti’s Pizzeria restaurants. Use the data in the data file Gilotti’s Pizzeria to find the variance and the standard deviation for Location 2, Location 3, and Location 4.
2.47 Describe the following data numerically:
(4, 53) (10, 65) (15, 48) (10, 66) (8, 46) (5, 56) (7, 60) (11, 57) (12, 49) (14, 70) (10, 54) (7, 56) (9, 50) (8, 52) (11, 59) (10, 66) (8, 49) (5, 50)
2.48 Only 67 students in the data file Student GPA have SAT verbal scores.
a. Construct the scatter plot of GPAs and SAT scores for these 67 students.
b. Calculate the correlation between GPAs and SAT scores for these 67 students.
2.49 Consider the following four populations:
• 1, 2, 3, 4, 5, 6, 7, 8
• 1, 1, 1, 1, 8, 8, 8, 8
• 1, 1, 4, 4, 5, 5, 8, 8
• -6, -3, 0, 3, 6, 9, 12, 15
All these populations have the same mean. Without doing the calculations, arrange the populations according to the magnitudes of their variances, from smallest to largest. Then calculate each of the variances manually.
2.50 An auditor finds that the values of a corporation’s accounts receivable have a mean of $295 and a standard deviation of $63.
a. It can be guaranteed that 60% of these values will be in what interval?
b. It can be guaranteed that 84% of these values will be in what interval?
2.51 In one year, earnings growth of the 500 largest U.S. corporations averaged 9.2%; the standard deviation was 3.5%.
a. It can be guaranteed that 84% of these earnings growth figures will be in what interval?
b. Using the empirical rule, it can be estimated that approximately 68% of these earnings growth figures will be in what interval?
2.52 Tires of a particular brand have a lifetime mean of 29,000 miles and a standard deviation of 3,000 miles.
a. It can be guaranteed that 75% of the lifetimes of tires of this brand will be in what interval?
b. Using the empirical rule, it can be estimated that approximately 95% of the lifetimes of tires of this brand will be in what interval?
2.53 The supervisor of a very large plant obtained the time (in seconds) for a random sample of n = 110 employees to complete a particular task. The data are stored in the data file Completion Times.
a. Find and interpret the IQR.
b. Find the five-number summary.
2.54 How much time (in minutes) do people spend on a typical visit to a local mall? A random sample of n = 104 shoppers was timed and the results (in minutes) are stored in the data file Shopping Times. You were asked to describe graphically the shape of the distribution of shopping times in Exercise 1.72 (Chapter 1). Now describe the shape of the distribution numerically.
a. Find the mean shopping time.
b. Find the variance and standard deviation in shopping times.
c. Find the 95th percentile.
d. Find the five-number summary.
e. Find the coefficient of variation.
f. Ninety percent of the shoppers completed their shopping within approximately how many minutes?
2.55 A random sample for five exam scores produced the following (hours of study, grade) data values:
Hours Studied (x) Test Grade (y)
3.5 88
2.4 76
4 92
5 85
1.1 60
a. Compute the covariance.
b. Compute the correlation coefficient.
2.56 A corporation administers an aptitude test to all new sales representatives. Management is interested in the extent to which this test is able to predict weekly sales of new representatives. Aptitude test scores range from 0 to 30 with greater scores indicating a higher aptitude. Weekly sales are recorded in hundreds of dollars for a random sample of 10 representatives. Test scores and weekly sales are as follows:
Test Score, x      12  30  15  24  14  18  28  26  19  27
Weekly Sales, y    20  60  27  50  21  30  61  54  32  57
a. Compute the covariance between test score and weekly sales.
b. Compute the correlation between test score and weekly sales.