
Summary of Notation

Chapter Problems

Practicing the Basics

3.61 Choose explanatory and response variables For the following pairs of variables, identify the response variable and the explanatory variable.

a. Number of weeks of gestation and weight of an infant at birth.

b. Preferred smartphone operating system (iOS, Android, Windows Mobile, etc.) and gender.

c. Average number of airline trips taken in the past 12 months and annual income.

d. Weekly grocery budget and marital status.

3.62 Graphing data For each case in the previous exercise,

a. Indicate whether each variable is quantitative or categorical.

b. Describe the type of graph that could best be used to portray the results.

3.63 Life after death for males and females In a recent General Social Survey, respondents answered the question,

“Do you believe in a life after death?” The table shows the responses cross-tabulated with gender.

3.64 God and happiness Go to the GSS website sda.berkeley.edu/GSS; click GSS, with No Weight as the default weight selection; type GOD for the row variable, HAPPY for the column variable, and YEAR(2014) for the Selection Filter; then click Run the Table.

a. Report the contingency table of counts.

b. Treating reported happiness as the response variable, find the conditional proportions. For which opinion about God are subjects most likely to be very happy?

c. To analyze the association, is it more informative to view the proportions in part b or the frequencies in part a? Why?

3.65 Jobs and income A report on the best jobs in 2015 shows salaries related to jobs in the following table:

Opinion About Life After Death by Gender

          Opinion About Life After Death
Gender    Yes    No
Male      621    187
Female    834    145

a. Construct a table of conditional proportions.

b. Summarize results. Is there much difference between responses of males and females? Compute the difference and ratio of proportions and interpret each.
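As a rough illustration of the arithmetic behind parts a and b, the sketch below computes conditional proportions within each gender and then the difference and ratio of the "Yes" proportions. The counts come from the table in the exercise; the function and variable names are assumptions made for this sketch, and the interpretation is left to the reader.

```python
# Counts copied from the exercise's table: rows are gender, columns are the response.
counts = {
    "Male": {"Yes": 621, "No": 187},
    "Female": {"Yes": 834, "No": 145},
}


def conditional_proportions(table):
    """Return row-wise proportions: each count divided by its row total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: n / total for col, n in cells.items()}
    return result


props = conditional_proportions(counts)
p_male = props["Male"]["Yes"]      # proportion of males answering yes
p_female = props["Female"]["Yes"]  # proportion of females answering yes

difference = p_female - p_male  # difference of proportions
ratio = p_female / p_male       # ratio of proportions

print(f"P(yes | male)   = {p_male:.3f}")
print(f"P(yes | female) = {p_female:.3f}")
print(f"difference = {difference:.3f}, ratio = {ratio:.2f}")
```

Conditioning on gender (the row variable) treats opinion as the response, which is the choice assumed here.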

Jobs                   Annual Income
Actuary                $94,209
Audiologist            $71,133
Mathematician          $102,182
Statistician           $79,191
Biomedical Engineer    $89,165
Data Scientist         $124,149

Source: www.careercast.com/jobs-rated/best-jobs-2015.

a. Identify the response variable. Is it quantitative or categorical?

b. Identify the explanatory variable. Is it quantitative or categorical?

c. Explain how a bar graph could summarize the data.

3.66 Bacteria in ground turkey Consumer Reports magazine (June 2013) reported purchasing samples of ground turkey from different brands to test for the presence of bacteria.

172 Chapter 3 Association: Contingency, Correlation, and Regression

a. How would you interpret this correlation?

b. What would you expect this correlation to equal if the quality rating did not tend to depend on the easiness rating of the professor?

3.69 Women in government and economic life The OECD (Organization for Economic Cooperation and Development) consists of advanced, industrialized countries that accept the principles of representative democracy and a free market economy. For the nations outside of Europe that are in the OECD, the table shows UN data from 2007 on the percentage of seats in parliament held by women and female economic activity as a percentage of the male rate.

a. Treating women in parliament as the response variable, prepare a scatterplot and find the correlation. Explain how the correlation relates to the trend shown in the scatterplot.

b. Use software or a calculator to find the regression equation. Explain why the y-intercept is not meaningful.

c. Find the predicted value and residual for the United States. Interpret the residual.

d. With UN data for all 23 OECD nations, the correlation between these variables is 0.56. For women in parliament, the mean is 26.5% and the standard deviation is 9.8%. For female economic activity, the mean is 76.8 and the standard deviation is 7.7. Find the prediction equation, treating women in parliament as the response variable.
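Part d can be approached with the standard relations slope = r(s_y / s_x) and intercept = ȳ - slope · x̄. The following is a minimal sketch using the summary statistics quoted in part d; the helper name line_from_summary is an assumption introduced for illustration.

```python
def line_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Least-squares line from summary statistics.

    slope = r * (sd_y / sd_x); intercept = mean_y - slope * mean_x.
    """
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# x = female economic activity, y = women in parliament (%), values from part d.
intercept, slope = line_from_summary(
    r=0.56, mean_x=76.8, sd_x=7.7, mean_y=26.5, sd_y=9.8
)
print(f"predicted y = {intercept:.2f} + {slope:.3f} x")
```

The same relation applies to any exercise in this set that gives a correlation together with the two standard deviations.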

The table below shows the number of samples that tested positive for Enterococcus bacteria for packages that claimed no use of antibiotics in the processing of the meat and for packages in which no such claim was made.

a. Find the difference in the proportion of packages that tested positive and interpret.

b. Find the ratio of the proportion of packages that tested positive and interpret.

3.67 Women managers in the work force The following side-by-side bar graph appeared in a 2003 issue of the Monthly Labor Review about women as managers in the work force. The graph summarized the percentage of managers in different occupations who were women, for the years 1972 and 2002.

[Figure: side-by-side bar graph, "Women as a Percent of Total Employment in Major Occupations," showing Percent (0 to 100) for 1972 and 2002.]

Occupation                                                        1972    2002
Executive, administrative and managerial                          19.7    45.9
Professional specialty                                            44.0    54.7
Technical and sales                                               40.1    50.1
Administrative support, including clerical                        75.0    78.5
Service                                                           61.1    59.9
Precision production, and operators, fabricators, and laborers    16.9    16.1

Source: Monthly Labor Review, vol. 126, no. 10, 2003, p. 48. Bureau of Labor Statistics.

a. Consider the first two bars in this graph. Identify the response variable and explanatory variable.

b. Express the information from the first two bars in the form of conditional proportions in a contingency table for two categorical variables.

c. Based on part b, does it seem as if there’s an association between these variables? Explain.

d. The entire graph shows two explanatory variables. What are they?

3.68 RateMyProfessor.com The website RateMyProfessors.com¹³ reported a correlation of 0.62 between the quality rating of the professor (on a simple 1 to 5 scale with higher values representing higher quality) and the rating of how easy a grader the professor is. This correlation is based on ratings of nearly 7000 professors.

¹³ See insidehighered.com/news/2006/05/08/rateprof.

Nation           Women in Parliament (%)    Female Economic Activity
Iceland          33.3                       87
Australia        28.3                       79
Canada           24.3                       83
Japan            10.7                       65
United States    15.0                       81
New Zealand      32.2                       81

3.70 African droughts and dust Is there a relationship between the amount of dust carried over large areas of the Atlantic and the Caribbean and the amount of rainfall in African regions? In an article (by J. M. Prospero and P. J. Lamb, Science, vol. 302, 2003, pp. 1024–1027) the following scatterplots were given along with corresponding regression equations and correlations. The precipitation index is a measure of rainfall.

[Figure: three scatterplots, panels (a), (b), and (c), each plotting Dust (µg m⁻³) on the vertical axis (0 to 35) against Precipitation Index on the horizontal axis (–1.6 to 0.8).]

Source: J. M. Prospero and P. J. Lamb, Science, vol. 302, 2003, pp. 1024–1027.

Bacteria in Ground Turkey

                  Test result for Enterococcus
Package Claim     Positive    Negative
No claim          26          20
No antibiotics    23          5


a. Sketch this equation between x = 0 and 4, labeling the x- and y-axes. Is this equation realistic? Why or why not?

b. Suppose that actually ŷ = 0.5 + 0.7x. Predict the college GPA for two students having high school GPAs of 3.0 and 4.0. Interpret and explain how the difference between these two predictions relates to the slope.

3.75 College GPA = high school GPA Refer to the previous exercise. Suppose the regression equation is ŷ = x. Identify the y-intercept and slope. Interpret the line in context.

3.76 Salary and employee satisfaction In 2015, a study was conducted among a sample of 221,000 users of the website www.glassdoor.com. It explored the link between salary (in dollars) and employee satisfaction (measured on a rating scale of 0 to 100). According to their model, if an employee making $40,000 per year were given a raise to $44,000 per year, his employee satisfaction rating would increase by 1 point.

a. Assuming a straight-line regression of y = employee satisfaction on x = annual salary, what is the slope? Interpret it.

b. If x measures salary per month (rather than per year), then what is the slope? Interpret the slope.

3.77 Car weight and gas hogs The table shows a short excerpt from the Car Weight and Mileage data file on the book’s website. That file lists several 2004 model cars with automatic transmission and their x = weight (in pounds) and y = mileage (miles per gallon of gas). The prediction equation is ŷ = 47.32 - 0.0052x.

a. Match the following regression equations and correlations with the appropriate graph.

(i) ŷ = 14.05 - 7.18x; r = -0.75
(ii) ŷ = 16.00 - 2.36x; r = -0.44
(iii) ŷ = 12.80 - 9.77x; r = -0.87

b. Based on the scatterplots and information in part a, what would you conclude about the relationship between dust amount and rainfall amounts?

3.71 Crime rate and urbanization For the data in Example 14 on crime in Florida, the regression line between y = crime rate (number of crimes per 1000 people) and x = percentage living in an urban environment is ŷ = 24.5 + 0.56x.

a. Using the slope, find the difference in predicted crime rates between counties that are 100% urban and counties that are 0% urban. Interpret.

b. Interpret the correlation of 0.67 between these variables.

c. Show the connection between the correlation and the slope, using the standard deviations of 28.3 for crime rate and 34.0 for percentage urban.

3.72 Gestational period and life expectancy revisited The data in the Animals file on the book’s website holds observations on the average longevity (in years) and gestational period (in days) for a variety of animals. Exercise 3.52 showed the scatterplot together with the regression equation with intercept 6.29 and slope 0.045 and an r² value of 0.73.

a. Interpret the slope.

b. A leopard has a gestational period of about 98 days. What is its predicted average longevity?

c. Interpret the value of r².

d. Show that extrapolating from animals to humans (with gestational period of about 40 weeks) grossly underestimates average human longevity.

3.73 Gas consumption and temperature A study was performed at Duke University to predict the monthly natural gas consumption in North Carolina as a function of the average monthly temperature. Residential gas consumptions were recorded from January 2009 to June 2015, and temperature measures were obtained as simple averages of those recorded at the airports of the 3 largest cities. The result shows that "an increase of one degree Celsius is worth about 641.79 million cubic feet in terms of monthly gas consumption reduction." Thus the gas consumption in a month where the average temperature is 25° Celsius will be 3209 MMcf less than in a month where the average temperature is 20° Celsius.

a. For the interpretation in quotes, identify the response variable and explanatory variable.

b. State the slope of the regression equation, when average monthly temperature is measured in degrees Celsius and residential gas consumption in MMcf.

c. Explain how the value 3209 relates to the slope.

3.74 Predicting college GPA An admissions officer claims that at his college the regression equation ŷ = 0.5 + 7x approximates the relationship between y = college GPA and x = high school GPA, both measured on a four-point scale.

Automobile Brand               Weight    Mileage
Honda Accord Sedan LX          3,164     34
Toyota Corolla                 2,590     38
Dodge Dakota Club Cab          3,838     22
Jeep Grand Cherokee Laredo     3,970     21
Hummer H2                      6,400     17

Sources: auto.consumerguide.com, honda.com, toyota.com, landrover.com, ford.com.

a. Interpret the slope in terms of a 1000 pound increase in the vehicle weight.

b. Find the predicted mileage and residual for a Hummer H2. Interpret.
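The mechanics of parts a and b can be sketched as follows, assuming the prediction equation quoted in the exercise (ŷ = 47.32 - 0.0052x) and the Hummer H2 row of the table. The function name predict_mileage is an assumption for illustration; a residual is computed as observed minus predicted.

```python
def predict_mileage(weight_lb):
    """Predicted mileage (mpg) from weight (lb), using the equation in the exercise."""
    return 47.32 - 0.0052 * weight_lb


# Part a: change in predicted mileage for a 1000-pound increase in weight.
change_per_1000_lb = predict_mileage(4000) - predict_mileage(3000)

# Part b: Hummer H2 values taken from the table (weight 6,400 lb, mileage 17 mpg).
observed = 17
predicted = predict_mileage(6400)
residual = observed - predicted  # positive means the car does better than predicted

print(f"change per 1000 lb: {change_per_1000_lb:.1f} mpg")
print(f"predicted: {predicted:.2f} mpg, residual: {residual:.2f} mpg")
```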

3.78 Predicting Internet use from cell phone use We now use data from the Human Development data file on cell phone use and Internet use for 39 countries.

a. The MINITAB output below shows a scatterplot. Describe it in terms of (i) identifying the response variable and the explanatory variable, (ii) indicating whether it shows a positive or a negative association, and (iii) describing the variability of Internet use values for nations that have cellular use below 30% and for those that have cellular use above 30%.

b. Identify the approximate x- and y-coordinates for a nation that has less Internet use than you would expect, given its level of cell phone use.

c. The prediction equation is ŷ = 1.27 + 0.475x. Describe the relationship by noting how ŷ changes as x increases from 0 to 90, which are roughly its minimum and maximum.


[Figure: Scatterplot of Murder Rate by % Single-Parent Families. x-axis: Percentage of Single-Parent Families (10 to 45); y-axis: Murder Rate (0 to 40).]

percentage of the male labor force) yielded the equation ŷ = 36.3 - 0.30x and a correlation of -0.55.

a. Describe the effect by comparing the predicted birth rate for countries with x = 0 and countries with x = 100.

b. Suppose that the correlation between the crude birth rate and the nation’s GNP equals -0.35. Which variable, GNP or women’s economic activity, seems to have the stronger association with birth rate? Explain.

3.82 Education and income The regression equation for a sample of 100 people relating x = years of education and y = annual income (in dollars) is ŷ = -20,000 + 4000x, and the correlation equals 0.50. The standard deviations were 2.0 for education and 16,000 for annual income.

a. Show how to find the slope in the regression equation from the correlation.

b. Suppose that now we let x = annual income and y = years of education. Will the correlation or the slope change in value? If so, show how.

3.83 Income in euros Refer to the previous exercise. Results in the regression equation ŷ = -20,000 + 4000x for y = annual income were translated to units of euros at a time when the exchange rate was $1.25 per euro.

a. Find the intercept of the regression equation. (Hint: What does 20,000 dollars equal in euros?)

b. Find the slope of the regression equation.

c. What is the correlation when annual income is measured in euros? Why?

3.84 Changing units for cereal data Refer to the Cereal data file on the book’s website, with x = sugar (g) and y = sodium (mg), for which ŷ = 169 - 0.25x.

a. Convert the sugar measurements to mg and calculate the line obtained from regressing sodium (mg) on sugar (mg). Which statistics change and which remain the same? Clearly interpret the slope coefficient.

b. Suppose we instead convert the sugar measurements to ounces. How would this affect the slope of the regression line? Can you determine the new slope just from knowing that 1 ounce equals roughly 28.35 grams?
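One way to reason about the unit changes in this exercise: if the explanatory variable is multiplied by a constant factor, the fitted slope is divided by that factor and the intercept is unchanged. The sketch below applies this to the equation given in the exercise (ŷ = 169 - 0.25x); the helper rescale_x is an assumption introduced for illustration, and it works from the equation alone rather than from the Cereal data file.

```python
def rescale_x(intercept, slope, factor):
    """Return (intercept, slope) after replacing x by factor * x.

    If x_new = factor * x_old, then a + b * x_old = a + (b / factor) * x_new,
    so only the slope changes.
    """
    return intercept, slope / factor


a, b = 169.0, -0.25  # sodium (mg) regressed on sugar (g), from the exercise

# Grams to milligrams: x_mg = 1000 * x_g.
a_mg, b_mg = rescale_x(a, b, 1000.0)

# Grams to ounces: x_oz = x_g / 28.35 (1 ounce is roughly 28.35 grams).
a_oz, b_oz = rescale_x(a, b, 1.0 / 28.35)

print(f"sugar in mg: predicted sodium = {a_mg} + ({b_mg}) x")
print(f"sugar in oz: predicted sodium = {a_oz} + ({b_oz:.4f}) x")
```

The correlation does not depend on the units of either variable, which is why only the slope needs adjusting here.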

3.85 Murder and single-parent families For Table 3.6 on the 50 states and D.C., the figure below shows the relationship between the murder rate and the percentage of single-parent families.

d. For the United States, x = 45.1 and y = 50.15. Find its predicted Internet use and residual. Interpret the large positive residual.

[Figure: Scatterplot of INTERNET vs CELLULAR. x-axis: CELLULAR (0 to 90); y-axis: INTERNET (0 to 50).]

3.79 Income depends on education? For a study of counties in Florida, the table shows part of a printout for the regression analysis relating y = median income (thousands of dollars) to x = percent of residents with at least a high school education.

a. County A has 10% more of its residents than County B with at least a high school education. Find their difference in predicted median incomes.

b. Find the correlation. (Hint: Use the relation between the correlation and the slope of the regression line.) Interpret the (i) sign and (ii) strength of association.

Variable     Mean     Std Dev
Income       24.51    4.69
Education    69.49    8.86

The regression equation is income = -4.63 + 0.42 education

3.80 Fertility and GDP Refer to the Human Development data file on the book’s website. Use x = GDP and y = fertility (mean number of children per adult woman).

a. Construct a scatterplot and indicate whether regression seems appropriate.

b. Find the correlation and the regression equation.

c. With x = percent using contraception, ŷ = 6.7 - 0.065x. Can you compare the slope of this regression equation with the slope of the equation with GDP as a predictor to determine which has the stronger association with y? Explain.

d. Contraception has a correlation of -0.887 with fertility. Which variable has a stronger association with fertility: GDP or contraception?

3.81 Women working and birth rate Using data from several nations, a regression analysis of y = crude birth rate (number of births per 1000 population size) on women’s economic activity (female labor force as a


changed over time. Using the High Jump data file on the book’s website, we get (see also Figure 3.16)

Women_Meters = -10.94 + 0.0065(Year_Women)

for predicting the women’s winning height (in meters) using the year number.

a. Predict the winning Olympic high jump distance for women in (i) 2016 and (ii) 3000.

b. Do you feel comfortable making either prediction in part a? Explain.
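To see what the fitted line implies for part a, the following sketch simply evaluates the equation quoted in the exercise at the two years. The function name predicted_height is an assumption for illustration; whether either value is a sensible prediction is the question raised in part b.

```python
def predicted_height(year):
    """Predicted women's winning high jump (meters), from the equation in the exercise."""
    return -10.94 + 0.0065 * year


for year in (2016, 3000):
    print(f"{year}: {predicted_height(year):.3f} m")
```

Comparing the year 3000 value with plausible human jumping heights gives a concrete way to discuss extrapolation far beyond the observed years.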

3.91 IQ and shoe size A survey of elementary school students revealed a positive correlation between the shoe size of the subjects and their GPA in the previous year.

a. Explain how age could be a potential lurking variable that could be responsible for this association.

b. If age had actually been one of the variables measured in the study, would it be a lurking variable or a confounding variable? Explain.

3.92 More TV watching goes with fewer babies? For United Nations data from several countries, there is a strong negative correlation between the birth rate and the per capita television ownership.

a. Does this imply that having higher television ownership causes a country to have a lower birth rate?

b. Identify a lurking variable that could provide an explanation for this association.

3.93 More coffee protects from cancer? In February 2015, a new study published in the Journal of the National Cancer Institute showed that people who regularly drink coffee have a lower chance of developing skin cancer. Based on this study, individuals drinking four or more cups of coffee a day had a 20 percent lower rate of developing skin cancer.

a. Explain how avoiding the sun could be strongly associated both with total cups of coffee drunk in a day and with a decrease in the exposure to skin cancer and hence could be a lurking variable responsible for the observed association between coffee and skin cancer exposure.

b. Explain how avoiding the sun could be a common cause of both variables.

3.94 Ask Marilyn Marilyn vos Savant writes a column for Parade magazine to which readers send questions, often puzzlers or questions with a twist. In the April 28, 1996, column, a reader asked, “A company decided to expand, so it opened a factory generating 455 jobs. For the 70 white-collar positions, 200 males and 200 females applied. Of the people who applied, 20% of the females and only 15% of the males were hired. Of the 400 males applying for the blue-collar positions, 75% were hired. Of the 100 females applying, 85% were hired. A federal Equal Employment Opportunity Commission (EEOC) enforcement official noted that many more males were hired than females and decided to investigate. Responding to charges of irregularities in hiring, the company president denied any discrimination, pointing out that in both the white-collar and blue-collar fields, the percentage of female applicants hired was greater than it was for males. But the government official produced his own statistics, which showed

a. For D.C., the percentage of single-parent families = 44.7 and the murder rate = 41.8. Identify D.C. on the scatterplot and explain the effect you would expect it to have on a regression analysis.

b. The regression line fitted to all 51 observations is ŷ = -21.4 + 1.14x. The regression line fitted only to the 50 states is ŷ = -8.2 + 0.56x. Summarize the effect of including D.C. in the analysis.

3.86 Violent crime and college education For the U.S. Statewide Crime data file on the book’s website, let y = violent crime rate and x = percent with a college education.

a. Construct a scatterplot. Identify any points that you think may be influential in a regression analysis.

b. Fit the regression line, using all 51 observations. Interpret the slope.

c. Fit the regression line after deleting the observation identified in part a. Interpret the slope and compare results to part b.

3.87 Violent crime and high school education Repeat the previous exercise using x = percent with at least a high school education. This shows that an outlier is not especially influential if its x-value is not relatively large or small.

3.88 Crime and urbanization For the U.S. Statewide Crime data file on the book’s website, using MINITAB to analyze y = violent crime rate and x = urbanization (percentage of the residents living in metropolitan areas) gives the results shown:

Variable   N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
violent    51   441.6   241.4   81.0      281.0   384.0    554.0   1508.0
urban      51   68.36   20.85   27.90     49.00   70.30    84.50   100.00

The regression equation is violent = 36.0 + 5.93 urban

a. Using the five-number summary of positions, sketch a box plot for y. What do your graph and the reported mean and standard deviation of y tell you about the shape of the distribution of violent crime rate?

b. Construct a scatterplot. Does it show any potentially influential observations? Would you predict that the slope would be larger, or smaller, if you delete that observation? Why?

c. Fit the regression without the observation highlighted in part b. Describe the effect on the slope.

3.89 High school graduation rates and health insurance Access the HS Graduation Rates file on the book’s website, which contains statewide data on x = high school graduation rate and y = percentage of individuals without health insurance.

a. Construct a scatterplot. Describe the relationship.

b. Find the correlation. Interpret.

c. Find the regression equation for the data. Interpret the slope, and summarize the relationship.

3.90 Women’s Olympic high jumps Example 11 discussed how the winning height in the Olympic high jump