THE REGRESSION FALLACY - TESTS OF SIGNIFICANCE

PART VIII. TESTS OF SIGNIFICANCE

4. THE REGRESSION FALLACY


5. For the men age 18 and over in the HANES5 sample, the correlation between height and weight was 0.41; the SD of height was about 3 inches and the SD of weight was about 42 pounds. The men age 55–64 averaged about half an inch shorter than the men age 18–24. True or false, and explain: since half an inch is 1/6 ≈ 0.17 SDs of height, the men age 55–64 must have averaged about 0.41×0.17×42≈3 pounds lighter than the men age 18–24.

The answers to these exercises are on p. A62.

Technical note. The method discussed in example 2 is for median ranks. To see why, assume normality and r = 0.4. Of students at the 90th percentile on the SAT (relative to their classmates), about half will rank above the 69th percentile on first-year GPA, and half will rank below. The procedure for estimating average ranks is harder.
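The arithmetic behind the technical note can be sketched in a few lines of Python. This is only an illustration under the note's own assumptions (normality and r = 0.4); the function name `median_percentile_on_y` is made up for the example. The idea is to convert the SAT percentile to standard units, multiply by r, and convert back to a percentile.

```python
from statistics import NormalDist

def median_percentile_on_y(percentile_on_x: float, r: float) -> float:
    """Percentile rank on y that splits the group at a given x-percentile in half.

    Assumes a football-shaped (bivariate normal) scatter diagram.
    """
    std_normal = NormalDist()                          # average 0, SD 1
    z_x = std_normal.inv_cdf(percentile_on_x / 100)    # percentile -> standard units
    z_y = r * z_x                                      # regression estimate, standard units
    return 100 * std_normal.cdf(z_y)                   # standard units -> percentile

sat_percentile = 90
gpa_percentile = median_percentile_on_y(sat_percentile, r=0.4)
print(f"Half the group ranks above percentile {gpa_percentile:.1f} on GPA")
```

The result comes out between 69 and 70, in line with the 69th percentile quoted in the note (which rests on rounded table values). The answer is a median rank because converting standard units to percentiles preserves order but is not a straight-line change of scale, so it carries the middle of the group over correctly but not the average.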

We are now going to see why the regression effect appears whenever there is spread around the SD line. This effect was first noticed by Galton in his study of family resemblances, so that is the context for the discussion. But the reasoning is general. Figure 5 shows a scatter diagram for the heights of 1,078 pairs of fathers and sons, as discussed in chapter 8. The summary statistics are⁵

average height of fathers ≈ 68 inches, SD ≈ 2.7 inches

average height of sons ≈ 69 inches, SD ≈ 2.7 inches

r ≈ 0.5

The sons average 1 inch taller than the fathers. On this basis, it is natural to guess that a 72-inch father should have a 73-inch son; similarly, a 64-inch father should have a 65-inch son; and so on. Such fathers and sons are plotted along the dashed line in figure 5. Of course, not many families are going to be right on the line. In fact, there is a lot of spread around the line. Some of the sons are taller than their fathers; others are shorter.

Take the fathers who are 72 inches tall, to the nearest inch. The corresponding families are plotted in the vertical strip over 72 inches in figure 5, and there is quite a range in the sons’ heights. Some of the points are above the dashed line: the son is taller than 73 inches. But most of the points are below the dashed line: the son is shorter than 73 inches. All in all, the sons of the 72-inch fathers only average 71 inches in height. With tall fathers (high score on first test), on the average the sons are shorter (score on second test drops).

Now look at the points in the vertical strip over 64 inches, representing the families where the father is 64 inches tall, to the nearest inch. The height of the dashed line there is 65 inches, representing a son who is 1 inch taller than his 64-inch father. Some of the points fall below the dashed line, but most are above, and the sons of the 64-inch fathers average 67 inches in height. With short fathers (low score on first test), on the average the sons are taller (score on second test goes up). The aristocratic Galton termed this “regression to mediocrity.”

The dashed line in figure 5 goes through the point corresponding to an average father of height 68 inches, and his average son of height 69 inches. Along the dashed line, each one-SD increase in father’s height is matched by a one-SD increase in son’s height. These two facts make it the SD line. The cloud is symmetric around the SD line, but the strip at 72 inches is not. The strip only contains points with unusually big x-coordinates. And most of the points in this strip fall below the SD line. Conversely, the strip at 64 inches only contains points with unusually small x-coordinates. Most of the points in this strip fall above the SD line. The hidden imbalance is always there in football-shaped clouds. The graphical explanation for the regression effect may not seem very romantic. But then, statistics isn’t known as a romantic subject.
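The imbalance in the strips can be checked by simulation. The sketch below does not use the 1,078 families; it draws made-up father-son pairs from a normal model with the summary statistics quoted above (averages 68 and 69 inches, SDs 2.7 inches, r = 0.5), and all names in it are hypothetical. It then counts what share of the points in each strip falls below the SD line.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

R = 0.5
FATHER_AVG, SON_AVG, SD = 68.0, 69.0, 2.7

def simulate_family():
    """Draw one (father, son) pair of heights from a bivariate normal model."""
    z_father = random.gauss(0, 1)
    # Son in standard units: a part tied to the father plus independent spread.
    z_son = R * z_father + (1 - R**2) ** 0.5 * random.gauss(0, 1)
    return FATHER_AVG + SD * z_father, SON_AVG + SD * z_son

families = [simulate_family() for _ in range(50_000)]

def share_below_sd_line(strip_center):
    """Fraction of the points in a one-inch-wide strip that lie below the SD line."""
    strip = [(f, s) for f, s in families if abs(f - strip_center) < 0.5]
    # The SDs are equal, so the SD line is: son = father + 1.
    below = sum(1 for f, s in strip if s < f + 1)
    return below / len(strip)

below_72 = share_below_sd_line(72)
below_64 = share_below_sd_line(64)
print(f"strip over 72 inches: {below_72:.0%} of points below the SD line")
print(f"strip over 64 inches: {below_64:.0%} of points below the SD line")
```

In the simulated cloud, a clear majority of the points over 72 inches sit below the SD line and a clear majority of the points over 64 inches sit above it, even though the cloud as a whole is symmetric around the line.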

Figure 5 also shows the regression line for the son’s height on father’s height. This solid line rises less steeply than the dashed SD line, and it picks off the center of each vertical strip of dots, that is, the average y-value in the strip. For instance, take the fathers who are 72 inches tall. They are 4 inches above average in height:


Figure 5. The regression effect. If a son is 1 inch taller than his father, the family is plotted along the dashed line. The points in the strip over 72 inches correspond to the families where the father is 72 inches tall, to the nearest inch; most of these points are below the dashed line. The points in the strip over 64 inches correspond to families where the father is 64 inches tall, to the nearest inch; most of these points are above the dashed line. The solid regression line picks off the centers of all the vertical strips, and is flatter than the dashed line.

[Scatter diagram omitted. Horizontal axis: father’s height (inches); vertical axis: son’s height (inches); both run from 58 to 80.]

4 inches / 2.7 inches ≈ 1.5 SDs.

The regression line says their sons should be taller than average, by about

r × 1.5 SDs = 0.75 SDs ≈ 2 inches.

The overall average height for sons is 69 inches, so the regression estimate for the average height of these sons is 71 inches—dead on.
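The same calculation can be written as a small function. This is a minimal sketch of the regression method as described in the text; the function and parameter names are illustrative only.

```python
def regression_estimate(x, avg_x, sd_x, avg_y, sd_y, r):
    """Estimate the average y-value for a given x by the regression method."""
    x_standard_units = (x - avg_x) / sd_x      # how many SDs x is above average
    y_standard_units = r * x_standard_units    # y is above average by only r times as many SDs
    return avg_y + y_standard_units * sd_y     # back to the original units

for father in (72, 64):
    son = regression_estimate(father, avg_x=68, sd_x=2.7, avg_y=69, sd_y=2.7, r=0.5)
    print(f"{father}-inch fathers: sons average about {son:.0f} inches")
```

With the summary statistics for the father-son data, the function gives 71 inches for the sons of the 72-inch fathers and 67 inches for the sons of the 64-inch fathers, matching the strip averages reported earlier.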

Figure 6 shows the regression effect at its starkest, without the cloud. The dashed SD line rises at a 45 degree angle. The dots show the average height of the sons corresponding to each value of father’s height. These dots are the centers of the vertical strips in figure 5. The dots rise less steeply than the SD line—the regression effect. On the whole, the dots are halfway between the SD line and the horizontal line through the point of averages. That is because the correlation coefficient is one half. Each one-SD increase in father’s height is accompanied by a half-SD increase in son’s height, not a one-SD increase. The solid regression line goes up at the half-to-one rate, and tracks the graph of averages quite well indeed.
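To see the "halfway" relationship in numbers, the two lines can be tabulated side by side. The sketch below uses the summary statistics for the father-son data; the constant and function names are made up for the illustration.

```python
AVG_FATHER, AVG_SON = 68, 69
SD_FATHER, SD_SON = 2.7, 2.7
R = 0.5

sd_line_slope = SD_SON / SD_FATHER           # one SD up for each SD across
regression_slope = R * SD_SON / SD_FATHER    # only r SDs up for each SD across

def sd_line(father):
    """Height of the SD line above a given father's height."""
    return AVG_SON + sd_line_slope * (father - AVG_FATHER)

def regression_line(father):
    """Height of the regression line above a given father's height."""
    return AVG_SON + regression_slope * (father - AVG_FATHER)

print("father  SD line  regression line")
for father in range(62, 76, 2):
    print(f"{father:6d}  {sd_line(father):7.1f}  {regression_line(father):15.1f}")
```

For every father's height, the regression line sits midway between the SD line and the horizontal line at 69 inches, because r is one half. With a different r, the regression line would sit a fraction r of the way from the horizontal line to the SD line.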

Figure 6. The regression effect. The SD line is dashed, the regression line is solid. The dots show the average height of the sons, for each value of father’s height. They rise less steeply than the SD line. This is the regression effect. The regression line follows the dots.

[Graph omitted. Horizontal axis: father’s height (inches); vertical axis: average height of son (inches); both run from 58 to 78.]

At first glance, the scatter diagram in figure 5 is rather chaotic. It was a stroke of genius on Galton’s part to see a straight line in the chaos. Since Galton’s time, many other investigators have found that the averages in their scatter diagrams followed straight lines too. That is why the regression line is so useful.

Now, a look behind the scenes: the regression effect can be understood a little better in some cases, for instance, in the context of a repeated IQ test. The basic fact is that the two scores are apt to be different. The difference can be explained in terms of chance variability. Each person may be lucky or unlucky on the first test. But if the score on the first test is very high, that suggests the person was lucky on that occasion, implying that the score on the second test will probably be lower. (You wouldn’t say, “He scored very high, must have had bad luck that day.”) On the other hand, if the score on the first test was very low, the person was probably unlucky to some extent on that occasion and will do better next time.

Here is a crude model for the test-retest situation, which brings the explanation into sharper focus. The basic equation is

observed test score = true score + chance error.

Assume that the distribution of true scores in the population follows the normal curve, with an average of 100 and an SD of 15. Suppose too that the chance error is as likely to be positive as negative, and tends to be about 5 points in size.

Someone who has a true score of 135 is just as likely to score 130 as 140 on the test. Someone with a true score of 145 is just as likely to score 140 as 150. Of course, the chance error could also be ±4, or ±6, and so forth: any symmetric pair of values can be dealt with in a similar way.

Figure 7. A model for the regression effect.

Take the people who scored 140 on the first test. There are two alternative explanations for this observed score:

• true score below 140, with a positive chance error;

• true score above 140, with a negative chance error.

The first explanation is more likely. For instance, more people have true scores of 135 than 145, as figure 7 shows.
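How much more likely? Under the crude model (true scores following the normal curve with average 100 and SD 15, chance error exactly +5 or −5 with equal chances), the two explanations can be weighed directly. The sketch below is only an illustration of that model, and the variable names are invented for the example.

```python
from statistics import NormalDist

true_scores = NormalDist(mu=100, sigma=15)  # the model's distribution of true scores

observed = 140
error = 5  # chance error is +5 or -5, each equally likely

# Relative frequency of the two possible true scores in the population.
# The 50-50 chance of each error is the same for both, so it cancels out.
weight_below = true_scores.pdf(observed - error)  # true score 135, error +5
weight_above = true_scores.pdf(observed + error)  # true score 145, error -5

chance_below = weight_below / (weight_below + weight_above)
expected_true = ((observed - error) * chance_below
                 + (observed + error) * (1 - chance_below))

print(f"chance the true score is 135: {chance_below:.0%}")
print(f"average true score among people who scored 140: {expected_true:.1f}")
```

In this model, well over three-quarters of the people who score 140 have a true score of 135, so the average true score in that group falls several points short of 140.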

The model accounts for the regression effect. If someone scores above average on the first test, the true score is probably a bit lower than the observed score. If this person takes the test again, we predict that the second score will be a bit lower than the first score. On the other hand, if a person scores below average on the first test, we estimate that the true score is a bit higher than the observed score, and our prediction for the second score will be a bit higher than the first score.
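The model is easy to try out on a computer. The following sketch simulates a test and a retest for a large made-up population under the crude model above; the cutoffs of 130 and 70, the population size, and all the names are arbitrary choices for the illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

def take_test(true_score):
    """Observed score = true score + chance error (+5 or -5, equally likely)."""
    return true_score + random.choice((-5, 5))

# True scores follow the normal curve with average 100 and SD 15.
people = [random.gauss(100, 15) for _ in range(100_000)]
results = [(take_test(t), take_test(t)) for t in people]  # (first, second) scores

def average(values):
    return sum(values) / len(values)

high = [(first, second) for first, second in results if first >= 130]
low = [(first, second) for first, second in results if first <= 70]

high_first = average([first for first, _ in high])
high_second = average([second for _, second in high])
low_first = average([first for first, _ in low])
low_second = average([second for _, second in low])

print(f"scored 130 or more on test 1: average {high_first:.1f}, then {high_second:.1f}")
print(f"scored 70 or less on test 1:  average {low_first:.1f}, then {low_second:.1f}")
```

Nobody's true score changes between the two tests in this simulation, yet the high group averages lower the second time and the low group averages higher. That is the regression effect, produced by chance error alone.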

Exercise Set D

1. As part of their training, air force pilots make two practice landings with instructors, and are rated on performance. The instructors discuss the ratings with the pilots after each landing. Statistical analysis shows that pilots who make poor landings the first time tend to do better the second time. Conversely, pilots who make good landings the first time tend to do worse the second time. The conclusion: criticism helps the pilots while praise makes them do worse. As a result, instructors were ordered to criticize all landings, good or bad. Was this warranted by the facts? Answer yes or no, and explain briefly.⁶

2. An instructor standardizes her midterm and final each semester so the class average is 50 and the SD is 10 on both tests. The correlation between the tests is around 0.50. One semester, she took all the students who scored below 30 at the midterm, and gave them special tutoring. They all scored above 50 on the final. Can this be explained by the regression effect? Answer yes or no, and explain briefly.

3. In the data set of figures 5 and 6, are the sons of the 61-inch fathers taller on the average than the sons of the 62-inch fathers, or shorter? What is the explanation?

The answers to these exercises are on pp. A62–63.
