Normal Distributions and Standard (z) Scores
5.6 FINDING SCORES
5.7 MORE ABOUT z SCORES
Summary / Important Terms / Key Equations / Review Questions
Preview
The familiar bell-shaped normal curve describes many observed frequency distributions, including scores on IQ tests, slight measurement errors made by a succession of people who attempt to measure precisely the same thing, the useful lives of 100-watt electric light bulbs, and even the heights of stalks in a field of corn.
As will become apparent in later chapters, the normal curve also describes some important theoretical distributions in inferential statistics.
Thanks to the standard normal table, we can answer questions about any normal distribution whose mean and standard deviation are known. In the long run, this proves to be both more accurate and more efficient than dealing directly with each observed frequency distribution. Use of the standard normal table requires a familiarity with z scores. Regardless of the original measurements (whether IQ points, measurement errors in millimeters, or reaction times in milliseconds), z scores are “pure” or unit-free numbers that indicate how many standard deviation units an observation is above or below the mean.
5.1 THE NORMAL CURVE
In the classic movie The President’s Analyst, the director of the Federal Bureau of Investigation, rather short himself, encourages the recruitment of similarly short FBI agents. If, in fact, FBI agents are to be selected only from among applicants who are no taller than exactly 66 inches, what proportion of all of the original applicants will be eligible? This question can’t be answered without additional information.
One source of additional information is the relative frequency distribution of heights for the 3091 men shown in Figure 5.1. To find the proportion of men who are a particular height, merely note the value on the vertical scale that corresponds to the top of any bar in the histogram. For example, .10 of these men, that is, one-tenth of 3091, or about 309 men, are 70 inches tall.
When expressed as a proportion, any conclusion based on the 3091 men can be generalized to other comparable sets of men, even sets containing an unspecified number.
For instance, if the distribution in Figure 5.1 is viewed as representative of all men who apply for FBI jobs, we can estimate that .10 of all applicants will be 70 inches tall.
Or, given the director’s preference for shorter agents, we can use the same distribution to estimate the proportion of applicants who will be eligible. To obtain the estimated proportion of eligible applicants (.165) from Figure 5.1, add the values associated with the shaded bars. (Only half of the bar at 66 inches is shaded to adjust for the fact that any height between 65.5 and 66.5 inches is reported as 66 inches, whereas eligible applicants must be shorter than exactly 66 inches, that is, 66.0 inches.)
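The bar arithmetic can be spelled out in a few lines of Python. This is a minimal sketch: the five bar values below are our reading of the labels that survive in Figure 5.1, not numbers stated in the text, although they do reproduce the stated total of .165.

```python
# Labels read from the shaded bars of Figure 5.1 (our reading of the figure;
# only the .165 total is stated in the text).
fully_shaded_bars = [0.01, 0.02, 0.03, 0.07]  # bars lying entirely below 65.5 inches
half_bar_at_66 = 0.035  # lower half of the 66-inch bar (65.5 to 66.0 inches)

# Eligible applicants must be shorter than exactly 66.0 inches, so only half of
# the 66-inch bar (which covers reported heights 65.5 to 66.5) is counted.
eligible = sum(fully_shaded_bars) + half_bar_at_66
print(round(eligible, 3))  # 0.165
```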
The distribution in Figure 5.1 has an obvious limitation: It is based on a group of just 3091 men that, at most, only resembles the distributions for other groups of men, including the group of FBI applicants. Therefore, any generalization will contain inaccuracies due to chance irregularities in the original distribution.
More accurate generalizations usually can be obtained from distributions based on larger numbers of men. A distribution based on 30,910 men usually is more accurate than one based on 3091, and a distribution based on 3,091,000 usually is even more accurate. But it is prohibitively expensive in both time and money to even survey 30,910 people. Fortunately, it is a fact that the distribution of heights for all American men (not just 3091 or even 3,091,000) approximates the normal curve, a well-documented theoretical curve.
FIGURE 5.1
Relative frequency distribution for heights of 3091 men. [Histogram: Proportion (0 to .15) versus Height (inches), from 62 inches or shorter to 76 inches or taller; the shaded bars for eligible heights total .165.]
Source: National Center for Health Statistics, 1960–62, Series 11, No. 14. Mean updated by authors.
In Figure 5.2, the idealized normal curve has been superimposed on the original distribution for 3091 men. Irregularities in the original distribution, most likely due to chance, are ignored by the smooth normal curve. Accordingly, any generalizations based on the smooth normal curve will tend to be more accurate than those based on the original distribution.
Interpreting the Shaded Area
The total area under the normal curve in Figure 5.2 can be identified with all FBI applicants. Viewed relative to the total area, the shaded area represents the proportion of applicants who will be eligible because they are shorter than exactly 66 inches. This new, more accurate proportion will differ from that obtained from the original histogram (.165) because of discrepancies between the two distributions.
Finding a Proportion for the Shaded Area
To find this new proportion, we cannot rely on the vertical scale in Figure 5.2, because it describes as proportions the areas in the rectangular bars of histograms, not the areas in the various curved sectors of the normal curve. Instead, in Section 5.3 we will learn how to use a special table to find the proportion represented by any area under the normal curve, including that represented by the shaded area in Figure 5.2.
Properties of the Normal Curve
Let’s note several important properties of the normal curve:
■ Obtained from a mathematical equation, the normal curve is a theoretical curve defined for a continuous variable, as described in Section 1.6, and noted for its symmetrical bell-shaped form, as revealed in Figure 5.2.
■ Because the normal curve is symmetrical, its lower half is the mirror image of its upper half.
■ Being bell shaped, the normal curve peaks above a point midway along the horizontal spread and then tapers off gradually in either direction from the peak (without actually touching the horizontal axis, since, in theory, the tails of a normal curve extend infinitely far).
■ The values of the mean, median (or 50th percentile), and mode, located at a point midway along the horizontal spread, are the same for the normal curve.
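The first bullet says the normal curve is obtained from a mathematical equation. That equation is the normal density, f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²)). A minimal Python sketch (normal_pdf is our own name, not from the text) checks three of the listed properties numerically for the heights example:

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

mu, sigma = 69.0, 3.0  # mean and standard deviation of heights used in this chapter

# Symmetry: points equally far below and above the mean have equal heights.
print(normal_pdf(66, mu, sigma) == normal_pdf(72, mu, sigma))  # True

# Bell shape: the peak sits at the mean, above every other point checked.
peak = normal_pdf(mu, mu, sigma)
print(all(peak > normal_pdf(x, mu, sigma) for x in (60, 63, 66, 68.5, 69.5, 72, 78)))  # True

# Tails: even 10 standard deviations out, the curve is tiny but still above zero.
print(normal_pdf(mu + 10 * sigma, mu, sigma) > 0)  # True
```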
Normal Curve
A theoretical curve noted for its symmetrical bell-shaped form.
FIGURE 5.2
Normal curve superimposed on the distribution of heights. [Histogram with smooth normal curve: Proportion (0 to .15) versus Height (inches), 62 to 76.]
Importance of Mean and Standard Deviation
When you’re using the normal curve, two bits of information are indispensable: values for the mean and the standard deviation. For example, before the normal curve can be used to answer the question about eligible FBI applicants, it must be established that, for the original distribution of 3091 men, the mean height equals 69 inches and the standard deviation equals 3 inches.
Different Normal Curves
Having established that a particular normal curve has a mean of 69 inches and a standard deviation of 3 inches, we can’t arbitrarily change these values, as any change in the value of either the mean or the standard deviation (or both) would create a new normal curve that no longer describes the original distribution of heights. Nevertheless, as a theoretical exercise, it is instructive to note the various types of normal curves that are produced by an arbitrary change in the value of either the mean (μ) or the standard deviation (σ).*
For example, changing the mean height from 69 to 79 inches produces a new normal curve that, as shown in panel A of Figure 5.3, is displaced 10 inches to the right of the original curve. Dramatically new normal curves are produced by changing the value of the standard deviation. As shown in panel B of Figure 5.3, changing the standard deviation from 3 to 1.5 inches produces a more peaked normal curve with smaller variability, whereas changing the standard deviation from 3 to 6 inches produces a shallower normal curve with greater variability.
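Both kinds of change can be seen directly in the normal density formula. A minimal Python sketch (normal_pdf is our own helper name) shows that raising the mean only slides the curve, while the peak height, 1/(σ√(2π)), halves each time the standard deviation doubles:

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Panel A: moving the mean from 69 to 79 (same sigma) slides the curve 10 inches
# right, so the new curve at x + 10 has the same height as the old curve at x.
print(all(math.isclose(normal_pdf(x, 69, 3), normal_pdf(x + 10, 79, 3))
          for x in (60, 66, 69, 72, 78)))  # True

# Panel B: same mean (73), different standard deviations.
for sigma in (1.5, 3, 6):
    print(sigma, round(normal_pdf(73, 73, sigma), 4))  # peaks of about 0.266, 0.133, 0.0665
```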
Obvious differences in appearance among normal curves are less important than you might suspect. Because of their common mathematical origin, every normal curve can be interpreted in exactly the same way once any distance from the mean is expressed in standard deviation units. For example, .68, or 68 percent, of the total area under a normal curve (any normal curve) is within one standard deviation above and below the mean, and only about .05, or 5 percent, of the total area is more than two standard deviations above or below the mean. And this is only the tip of the iceberg. Once any distance from the mean has been expressed in standard deviation units, we will be able to consult the standard normal table, described in Section 5.3, to determine the corresponding proportion of the area under the normal curve.
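These benchmark proportions follow from the error function: the area of the standard normal curve within k standard deviations of the mean is erf(k/√2). A minimal Python sketch (area_within is our own helper name) shows that the exact values are about .6827 and .0455, which the text rounds to .68 and .05, and that the same .6827 appears for each mean and standard deviation used in this chapter:

```python
import math
from statistics import NormalDist

def area_within(k: float) -> float:
    """Proportion of any normal curve within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

print(round(area_within(1), 4))      # 0.6827
print(round(1 - area_within(2), 4))  # 0.0455

# The same proportion appears regardless of the mean and standard deviation.
for mu, sigma in [(69, 3), (1200, 120), (105, 15)]:
    dist = NormalDist(mu, sigma)
    print(round(dist.cdf(mu + sigma) - dist.cdf(mu - sigma), 4))  # 0.6827
```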
*Since the normal curve is an idealized curve that is presumed to describe a complete set of observations or a population, the symbols μ and σ, representing the mean and standard deviation of the population, respectively, will be used in this chapter.
FIGURE 5.3
Different normal curves. [Panel A. Different Means, Same Standard Deviation: μ = 69 and μ = 79, both with σ = 3. Panel B. Same Mean, Different Standard Deviations: μ = 73 with σ = 1.5, σ = 3, and σ = 6.]
5.2 z SCORES
A z score is a unit-free, standardized score that, regardless of the original units of measurement, indicates how many standard deviations a score is above or below the mean of its distribution.
z Score
A unit-free, standardized score that indicates how many standard deviations a score is above or below the mean of its distribution.
To obtain a z score, express any original score, whether measured in inches, milliseconds, dollars, IQ points, etc., as a deviation from its mean (by subtracting its mean) and then split this deviation into standard deviation units (by dividing by its standard deviation), that is,
z SCORE
$$z = \frac{X - \mu}{\sigma} \tag{5.1}$$
where X is the original score and μ and σ are the mean and the standard deviation, respectively, for the normal distribution of the original scores. Since identical units of measurement appear in both the numerator and denominator of the ratio for z, the original units of measurement cancel each other and the z score emerges as a unit-free or standardized number, often referred to as a standard score.
A z score consists of two parts:
1. a positive or negative sign indicating whether it’s above or below the mean; and
2. a number indicating the size of its deviation from the mean in standard deviation units.
A z score of 2.00 always signifies that the original score is exactly two standard deviations above its mean. Similarly, a z score of –1.27 signifies that the original score is exactly 1.27 standard deviations below its mean. A z score of 0 signifies that the original score coincides with the mean.
Converting to z Scores
To answer the question about eligible FBI applicants, replace X with 66 (the maximum permissible height), μ with 69 (the mean height), and σ with 3 (the standard deviation of heights) and solve for z as follows:
$$z = \frac{66 - 69}{3} = \frac{-3}{3} = -1$$
This informs us that the cutoff height is exactly one standard deviation below the mean.
Knowing the value of z, we can use the table for the standard normal curve to find the proportion of eligible FBI applicants. First, however, we’ll make a few comments about the standard normal curve.
Progress Check *5.1 Express each of the following scores as a z score:
(a) Margaret’s IQ of 135, given a mean of 100 and a standard deviation of 15
(b) a score of 470 on the SAT math test, given a mean of 500 and a standard deviation of 100
(c) a daily production of 2100 loaves of bread by a bakery, given a mean of 2180 and a standard deviation of 50
(d) Sam’s height of 69 inches, given a mean of 69 and a standard deviation of 3
(e) a thermometer-reading error of –3 degrees, given a mean of 0 degrees and a standard deviation of 2 degrees
Answers on page 426.
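Formula 5.1 translates directly into code. A minimal Python sketch: the function name z_score and the check for a positive σ are our own additions, and the centimeter version simply rescales the FBI numbers to show that the original units cancel.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Formula 5.1: how many standard deviations x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# FBI example: cutoff of 66 inches, mean 69 inches, standard deviation 3 inches.
print(z_score(66, 69, 3))  # -1.0: one standard deviation below the mean

# Units cancel: the same heights expressed in centimeters give (essentially) the same z.
print(z_score(66 * 2.54, 69 * 2.54, 3 * 2.54))
```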
5.3 STANDARD NORMAL CURVE
If the original distribution approximates a normal curve, then the shift to standard or z scores will always produce a new distribution that approximates the standard normal curve. This is the one normal curve for which a table is actually available. It is a mathematical fact (not proven in this book) that the standard normal curve always has a mean of 0 and a standard deviation of 1. However, to verify (rather than prove) that the mean of a standard normal distribution equals 0, replace X in the z score formula with μ, the mean of any (nonstandard) normal distribution, and then solve for z:
$$z = \frac{\mu - \mu}{\sigma} = \frac{0}{\sigma} = 0$$
Likewise, to verify that the standard deviation of the standard normal distribution equals 1, replace X in the z score formula with μ + 1σ, the value corresponding to one standard deviation above the mean for any (nonstandard) normal distribution, and then solve for z:
$$z = \frac{(\mu + 1\sigma) - \mu}{\sigma} = \frac{1\sigma}{\sigma} = 1$$
Although there is an infinite number of different normal curves, each with its own mean and standard deviation, there is only one standard normal curve, with a mean of 0 and a standard deviation of 1.
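The same result can be checked numerically: standardizing a set of scores with its own mean and (population) standard deviation yields z scores with a mean of 0 and a standard deviation of 1. A minimal sketch using Python's statistics module; the nine heights are hypothetical values chosen for illustration, not data from the text. This property holds for any set of scores; normality matters only when we go on to use the standard normal table.

```python
import statistics

def standardize(scores):
    """Convert raw scores to z scores using the population mean and standard deviation."""
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    return [(x - mu) / sigma for x in scores]

heights = [63, 66, 66, 69, 69, 69, 72, 72, 75]  # hypothetical heights in inches
z = standardize(heights)
print(round(statistics.fmean(z), 10), round(statistics.pstdev(z), 10))  # 0.0 1.0
```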
Figure 5.4 illustrates the emergence of the standard normal curve from three different normal curves: that for the men’s heights, with a mean of 69 inches and a standard deviation of 3 inches; that for the useful lives of 100-watt electric light bulbs, with a mean of 1200 hours and a standard deviation of 120 hours; and that for the IQ scores of fourth graders, with a mean of 105 points and a standard deviation of 15 points.
Standard Normal Curve
The tabled normal curve for z scores, with a mean of 0 and a standard deviation of 1.
FIGURE 5.4
Converting three normal curves to the standard normal curve. [Panels: Heights (inches), X from 60 to 78; Useful Lives (hours), X from 840 to 1560; IQ Scores, X from 60 to 150; each X scale is paired with a z scale from –3 to 3.]
Converting all original observations into z scores leaves the normal shape intact but not the units of measurement. Shaded observations of 66 inches, 1080 hours, and 90 IQ points all reappear as a z score of –1.00. Verify this by using the z score formula. Showing no traces of the original units of measurement, this z score contains the one crucial bit of information common to the three original observations: All are located one standard deviation below the mean. Accordingly, to find the proportion for the shaded areas in Figure 5.4 (that is, the proportion of applicants who are less than exactly 66 inches tall, or light bulbs that burn for fewer than 1080 hours, or fourth graders whose IQ scores are less than 90), we can use the same z score of –1.00 when referring to the table for the standard normal curve, the one table for all normal curves.
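The suggested verification takes one application of formula 5.1 per curve. A minimal Python sketch using the means, standard deviations, and shaded observations given for Figure 5.4:

```python
# (mean, standard deviation, shaded observation) for each curve in Figure 5.4
curves = {
    "heights (inches)": (69, 3, 66),
    "useful lives (hours)": (1200, 120, 1080),
    "IQ scores (points)": (105, 15, 90),
}

z_values = {}
for name, (mu, sigma, x) in curves.items():
    z_values[name] = (x - mu) / sigma  # formula 5.1
    print(f"{name}: z = {z_values[name]:.2f}")  # each line ends with z = -1.00
```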
Standard Normal Table
Essentially, the standard normal table consists of columns of z scores coordinated with columns of proportions. In a typical problem, access to the table is gained through a z score, such as –1.00, and the answer is read as a proportion, such as the proportion of eligible FBI applicants.
Using the Top Legend of the Table
Table 5.1 shows an abbreviated version of the standard normal table, while Table A in Appendix C on page 458 shows a more complete version of the same table. Notice that columns are arranged in sets of three, designated as A, B, and C in the legend at the top of the table. When using the top legend, all entries refer to the upper half of the standard normal curve. The entries in column A are z scores, beginning with 0.00 and ending (in the full-length table of Appendix C) with 4.00. Given a z score of zero or more, columns B and C indicate how the z score splits the area in the upper half of the normal curve. As suggested by the shading in the top legend, column B indicates the proportion of area between the mean and the z score, and column C indicates the proportion of area beyond the z score, in the upper tail of the standard normal curve.
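Because the entries in columns B and C depend only on z, they can be recomputed from the cumulative proportion of the standard normal curve. A minimal Python sketch, assuming the standard relation Φ(z) = ½[1 + erf(z/√2)] in place of reading the printed table; phi and top_legend are our own names, and results are rounded to four places to match Table 5.1:

```python
import math

def phi(z: float) -> float:
    """Cumulative proportion of the standard normal curve below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def top_legend(z: float):
    """Columns B and C for a z score of zero or more, rounded like the table."""
    if z < 0:
        raise ValueError("the top legend covers z >= 0 only")
    b = phi(z) - 0.5  # area between the mean and z
    c = 1.0 - phi(z)  # area beyond z, in the upper tail
    return round(b, 4), round(c, 4)

for z in (0.00, 0.40, 1.00, 1.19):
    print(f"{z:.2f}", *top_legend(z))  # compare with the rows of Table 5.1
```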
Using the Bottom Legend of the Table
Because of the symmetry of the normal curve, the entries in Table 5.1 and Table A of Appendix C also can refer to the lower half of the normal curve. Now the columns are designated as A′, B′, and C′ in the legend at the bottom of the table. When using the bottom legend, all entries refer to the lower half of the standard normal curve.
Imagine that the nonzero entries in column A′ are negative z scores, beginning with –0.01 and ending (in the full-length table of Appendix C) with –4.00. Given a negative z score, columns B′ and C′ indicate how that z score splits the lower half of the normal curve. As suggested by the shading in the bottom legend of the table, column B′ indicates the proportion of area between the mean and the negative z score, and column C′ indicates the proportion of area beyond the negative z score, in the lower tail of the standard normal curve.
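The bottom legend can be reproduced the same way for negative z, with the tail now on the left. A minimal Python sketch, assuming the relation Φ(z) = ½[1 + erf(z/√2)] as a stand-in for the printed table (phi and bottom_legend are hypothetical helper names). For the FBI cutoff of z = –1.00, column C′ gives the proportion of applicants shorter than exactly 66 inches, and by symmetry the pair matches the top-legend row for z = 1.00 in Table 5.1:

```python
import math

def phi(z: float) -> float:
    """Cumulative proportion of the standard normal curve below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bottom_legend(z: float):
    """Columns B' and C' for a z score of zero or less, rounded like the table."""
    if z > 0:
        raise ValueError("the bottom legend covers z <= 0 only")
    b_prime = 0.5 - phi(z)  # area between the negative z score and the mean
    c_prime = phi(z)        # area beyond the negative z score, in the lower tail
    return round(b_prime, 4), round(c_prime, 4)

print(bottom_legend(-1.00))  # (0.3413, 0.1587)
```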
Progress Check *5.2 Using Table A in Appendix C, find the proportion of the total area identified with the following statements:
(a) above a z score of 1.80
(b) between the mean and a z score of –0.43
(c) below a z score of –3.00
(d) between the mean and a z score of 1.65
(e) between z scores of 0 and –1.96
Answers on page 426.
TABLE 5.1
PROPORTIONS (OF AREAS) UNDER THE STANDARD NORMAL CURVE FOR VALUES OF z (FROM TABLE A OF APPENDIX C)
Top legend: A = z; B = area between the mean and z; C = area beyond z (upper tail).
   A      B       C        A      B       C        A      B       C
 0.00   .0000   .5000    0.40   .1554   .3446    0.80   .2881   .2119
 0.01   .0040   .4960    0.41   .1591   .3409    0.81   .2910   .2090
   •      •       •        •      •       •        •      •       •
                                                 0.99   .3389   .1611
                                                 1.00   .3413   .1587
                                                 1.01   .3438   .1562
   •      •       •        •      •       •        •      •       •
 0.38   .1480   .3520    0.78   .2823   .2177    1.18   .3810   .1190
 0.39   .1517   .3483    0.79   .2852   .2148    1.19   .3830   .1170
   A′     B′      C′       A′     B′      C′       A′     B′      C′
Bottom legend: A′ = –z; B′ = area between the mean and –z; C′ = area beyond –z (lower tail).
Reminder:
Use of a standard normal table always involves z scores.
5.4 SOLVING NORMAL CURVE PROBLEMS
Sections 5.5 and 5.6 give examples of two main types of normal curve problems. In the first type of problem, we use a known score (or scores) to find an unknown proportion.
For instance, we use the known score of 66 inches to find the unknown proportion of eligible FBI applicants. In the second type of problem, the procedure is reversed. Now we use a known proportion to find an unknown score (or scores). For instance, if the FBI director had specified that applicants’ heights must not exceed the 25th percentile (the shortest .25) of the population, we would use the known proportion of .25 to find the unknown cutoff height in inches.
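Both problem types can be cross-checked with Python's standard library, whose statistics.NormalDist class provides cdf (known score to unknown proportion) and inv_cdf (known proportion to unknown score). This swaps the table lookup taught in this chapter for library calls, so treat it as a check on hand solutions rather than a replacement for them; the 25th-percentile cutoff is the text's hypothetical.

```python
from statistics import NormalDist

heights = NormalDist(mu=69, sigma=3)  # normal model for applicants' heights

# Type 1: known score -> unknown proportion (applicants shorter than 66 inches).
proportion = heights.cdf(66)
print(round(proportion, 4))  # 0.1587

# Type 2: known proportion -> unknown score (hypothetical 25th-percentile cutoff).
cutoff = heights.inv_cdf(0.25)
print(round(cutoff, 2))  # 66.98
```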
Solve Problems Logically
Do not rush through these examples, memorizing solutions to particular problems or looking for some magic formula. Concentrate on the logic of the solution, using rough graphs of normal curves as an aid to visualizing the solution. Only after thinking through to a solution should you do any calculations and consult the normal tables. Then, with just a little practice, you will view the wide variety of normal curve problems not as a bewildering assortment but as many slight variations on two distinctive types.
Key Facts to Remember
When using the standard normal table, it is important to remember that for any z score, the corresponding proportions in columns B and C (or columns B′ and C′) always sum to .5000. Similarly, the total area under the normal curve always equals 1.0000, the sum of the proportions in the lower and upper halves, that is, .5000 + .5000. Finally, although a z score can be either positive or negative, the proportions of area under the curve are always positive or zero but never negative (because an area cannot be negative). Figure 5.5 summarizes how to interpret the normal curve table in this book.
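These key facts can be confirmed numerically. A minimal Python sketch, assuming the relation Φ(z) = ½[1 + erf(z/√2)] for the cumulative standard normal proportion (phi is our own name), checks that columns B and C sum to .5000, that neither is negative, and that the whole curve encloses an area of 1:

```python
import math

def phi(z: float) -> float:
    """Cumulative proportion of the standard normal curve below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (0.00, 0.38, 0.81, 1.19, 4.00):
    b = phi(z) - 0.5  # column B: between the mean and z
    c = 1.0 - phi(z)  # column C: beyond z
    print(f"z = {z:.2f}: B + C = {b + c:.4f}, both nonnegative: {b >= 0 and c >= 0}")

# Total area: from minus infinity to plus infinity.
print(phi(math.inf) - phi(-math.inf))  # 1.0
```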
5.5 FINDING PROPORTIONS
Example: Finding Proportions for One Score
Now we’ll use a step-by-step procedure, adopted throughout this chapter, to find the proportion of all FBI applicants who are shorter than exactly 66 inches, given that the distribution of heights approximates a normal curve with a mean of 69 inches and a standard deviation of 3 inches.
1. Sketch a normal curve and shade in the target area, as in the left part of Figure 5.6. Being less than the mean of 69, 66 is located to the left of the mean. Furthermore, since the unknown proportion represents those applicants who are shorter than 66 inches, the shaded target sector is located to the left of 66.
2. Plan your solution according to the normal table. Decide precisely how you will find the value of the target area. In the present case, the answer will be obtained from column C′ of the standard normal table, since the target area coin- cides with the type of area identified with column C′, that is, the area in the lower tail beyond a negative z.