

(b) There were 530 couples in the sample, and there is a dot for each couple. But if you count, there are only 104 dots in the scatter diagram. How can that be? Explain briefly.

(c) Three areas are shaded. Match the area with the description. (One description will be left over.)

(i) Wife completed 16 years of schooling.

(ii) Wife completed more years of schooling than husband.

(iii) Husband completed more than 16 years of schooling.

(iv) Husband completed 12 years of schooling and wife completed fewer years of schooling than husband.

10

Regression

You’ve got to draw the line somewhere.

1. INTRODUCTION

The regression method describes how one variable depends on another. For example, take height and weight. We have data for 471 men age 18–24 (from the Health and Nutrition Examination Survey, HANES5; see p. 58). In round numbers the average height of these men was 70 inches, and their overall average weight was 180 pounds. Naturally, the taller men weighed more. How much of an increase in weight is associated with a unit increase in height? To get started, look at the scatter diagram (figure 1 on the next page). Height is plotted on the horizontal axis, and weight on the vertical. The summary statistics are:

average height ≈ 70 inches, SD ≈ 3 inches

average weight ≈ 180 pounds, SD ≈ 45 pounds, r ≈ 0.40

The scales on the vertical and horizontal axes have been chosen so that one SD of height and one SD of weight cover the same distance on the page. This makes the SD line (dashed) rise at 45 degrees across the page. There is a fair amount of scatter around the line: r is only 0.40.

Figure 1. Scatter diagram. Each point shows the height and weight for one of the 471 men age 18–24 in HANES5. The vertical strip represents men who are about one SD above average in height. Those who are also one SD above average in weight would be plotted along the dashed SD line. Most of the men in the strip are below the SD line: they are only part of an SD above average in weight. The solid regression line estimates average weight at each height.

The vertical strip in figure 1 shows the men who were one SD above average in height (to the nearest inch). The men who were also one SD above average in weight would be plotted on the SD line. However, most of the points in the strip are well below the SD line. In other words, most of the men who were one SD above average in height were quite a bit less than one SD above average in weight. The average weight of these men is only part of an SD above the overall average weight. This is where the correlation of 0.40 comes in. Associated with an increase of one SD in height there is an increase of only 0.40 SDs in weight, on the average.

To be more specific, take the men who are one SD above average in height:

average height + SD of height = 70 in + 3 in = 73 in.

Their average weight will be above the overall average by 0.40 SDs of weight.

Translated back to pounds, that’s

0.40 × 45 lb = 18 lb.

So, the average weight of these men is around 180 lb + 18 lb = 198 lb.

The point (73 inches, 198 pounds) is marked by a cross in figure 1.
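The arithmetic above can be sketched in a few lines of Python. The function name `regression_estimate` and its parameter names are made up here for illustration; the numbers are the rounded summary statistics quoted in the text.

```python
def regression_estimate(x, avg_x, sd_x, avg_y, sd_y, r):
    """Estimate the average y for cases with the given x (regression method)."""
    # How many SDs of x is this value above (+) or below (-) the average of x?
    sds_of_x = (x - avg_x) / sd_x
    # On the average, y moves only r SDs of y for each SD of x.
    sds_of_y = r * sds_of_x
    # Translate back from SDs to the original units of y.
    return avg_y + sds_of_y * sd_y


# Men who are one SD above average in height (73 inches).
estimate = regression_estimate(73, avg_x=70, sd_x=3, avg_y=180, sd_y=45, r=0.40)
print(round(estimate))  # 198
```

The three steps in the function mirror the three steps in the text: convert the height to SDs above average, multiply by r, and translate the result back to pounds.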

What about the men who are 2 SDs above average in height? Now

average height + 2 SD of height = 70 in + 2 × 3 in = 76 in.

The average weight of this second group of men should be above the overall average by 0.40 × 2 = 0.80 SDs of weight. That’s 0.80 × 45 lb = 36 lb. So their average is around 180 lb + 36 lb = 216 lb. The point (76 inches, 216 pounds) is also marked by a cross in figure 1.

What about the men who are 2 SDs below average in height? Their height equals

average height − 2 SD of height = 70 in − 2 × 3 in = 64 in.

Their average weight is below the overall average by 0.40 × 2 = 0.80 SDs of weight. That’s 0.80 × 45 lb = 36 lb. The average weight of this third group is around 180 lb − 36 lb = 144 lb. The point (64 inches, 144 pounds) is marked by a third cross in figure 1.

All the points (height, estimate for average weight) fall on the solid line shown in figure 1. This is the regression line. The line goes through the point of averages: men of average height should also be of average weight.
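One way to check that the estimates really fall on a single line is to compute the slope between neighboring points. The sketch below uses the rounded summary statistics from the text; the constant and function names are illustrative only. The slope it finds, r × SD of weight / SD of height = 0.40 × 45 / 3 = 6 pounds per inch, is a consequence of the method rather than a figure quoted in this passage.

```python
AVG_HEIGHT, SD_HEIGHT = 70, 3      # inches
AVG_WEIGHT, SD_WEIGHT = 180, 45    # pounds
R = 0.40                           # correlation between height and weight


def estimated_weight(height):
    """Regression estimate for average weight at a given height."""
    sds_above_average = (height - AVG_HEIGHT) / SD_HEIGHT
    return AVG_WEIGHT + R * sds_above_average * SD_WEIGHT


# The point of averages plus the three heights marked by crosses in figure 1.
heights = [64, 70, 73, 76]
points = [(h, estimated_weight(h)) for h in heights]

# Slope between each pair of neighboring points, in pounds per inch.
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(points, points[1:])]

for h, w in points:
    print(h, round(w))
print([round(s, 6) for s in slopes])  # the same slope every time
```

Equal slopes between every pair of neighbors mean the points are collinear, and the estimate at 70 inches is 180 pounds, so the line passes through the point of averages.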

The regression line for y on x estimates the average value for y corresponding to each value of x.

Along the regression line, associated with each increase of one SD in height there is an increase of only 0.40 SDs in weight. To be more specific, imagine grouping the men by height. There is a group which is average in height, another group which is one SD above average in height, and so on. From each group to the next, the average weight also goes up, but only by around 0.40 SDs. Remember where the 0.40 comes from. It is the correlation between height and weight.

This way of using the correlation coefficient to estimate the average value of y for each value of x is called the regression method. The method can be stated as follows.

Associated with each increase of one SD in x there is an increase of only r SDs in y, on the average.

Two different SDs are involved here: the SD of x, to gauge changes in x; and the SD of y, to gauge changes in y. It is easy to get carried away by the rhythm: if x goes up by one SD, so does y. But that’s wrong. On the average, y only goes up by r SDs (figure 2, next page).
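In symbols, the regression method can be sketched as a single formula. The notation is an assumption introduced here for illustration and is not used in the text: x̄ and ȳ are the averages, SD_x and SD_y are the two SDs, and ŷ is the estimated average of y at a given x.

```latex
\hat{y} \;=\; \bar{y} \;+\; r \cdot \frac{x - \bar{x}}{\mathrm{SD}_x} \cdot \mathrm{SD}_y
```

The fraction measures the change in x using the SD of x; multiplying by the SD of y converts the result into units of y. With the height and weight data, x = 73 inches gives 180 + 0.40 × 1 × 45 = 198 pounds.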

Figure 2. Regression method. When x goes up by one SD, the average value of y only goes up by r SDs.

Why is r the right factor? Three cases are easy to see directly. First, suppose r is 0. Then there is no association between x and y. So a one-SD increase in x is accompanied by a zero-SD increase in y, on the average. Second, suppose r is 1. Then all the points lie on the SD line: a one-SD increase in x is accompanied by a one-SD increase in y. Third, suppose r is −1. The argument is the same, except the line slopes down. With in-between values of r, a complicated mathematical argument is needed, but r is the factor to use.
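For in-between values of r, a simulation can make the claim plausible, although it is not a substitute for the mathematical argument. The sketch below rests on an assumption that is not in the text: the points are drawn from a bivariate normal model in standard units (average 0, SD 1), not from the HANES5 data. It looks at a vertical strip about one SD above average in x and reports the average y in that strip. The function name and parameters are illustrative.

```python
import math
import random


def average_y_in_strip(r, n=200_000, center=1.0, half_width=0.1, seed=0):
    """Simulate n points in standard units with correlation r, and return
    the average y among points whose x is within half_width of center."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # y is r times x plus independent noise, scaled so the SD of y is 1.
        y = r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        if abs(x - center) <= half_width:
            total += y
            count += 1
    return total / count


# The strip average should come out close to r in each case.
for r in (0.0, 0.4, 0.8, -0.4):
    print(r, round(average_y_in_strip(r), 2))
```

Under this model the average y in the strip lands near r SDs, not one SD, which matches the pattern in figure 1 where most points in the strip sit below the SD line.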

Exercise Set A

1. In a certain class, midterm scores average out to 60 with an SD of 15, as do scores on the final. The correlation between midterm scores and final scores is about 0.50.

Estimate the average final score for the students whose midterm scores were (a) 75 (b) 30 (c) 60

Plot your regression estimates, as in figure 1.

2. For the men age 18 and over in HANES5,

average height ≈ 69 inches, SD ≈ 3 inches

average weight ≈ 190 pounds, SD ≈ 42 pounds, r ≈ 0.41

Estimate the average weight of the men whose heights were

(a) 69 inches (b) 66 inches (c) 24 inches (d) 0 inches

Comment on your answers to (c) and (d).

3. The men age 45–74 in HANES5 had an average height of 69 inches, equal to the overall average height (exercise 2). True or false, and explain: their average weight should be around 190 pounds, that being the overall average weight.

4. For women age 25–34 in the U.S. in 2005, with full-time jobs, the relationship between education (years of schooling completed) and personal income can be summarized as follows:

average education ≈ 14 years, SD ≈ 2.4 years

average income ≈ $32,000, SD ≈ $26,000, r ≈ 0.34

Estimate the average income of those women who have finished high school but have not gone on to college (so they have 12 years of education).

5. Suppose r = −1. Can you explain why a one-SD increase in x is matched by a one-SD decrease in y?

The answers to these exercises are on pp. A59–60.