
Characteristics of the Normal Distribution

8

The most common probability distribution is the normal distribution.

The normal distribution describes a great many phenomena: people’s heights and weights, scores on numerous tests of physical dexterity and mental capability, psychological attributes, and so forth. Given this great range of phenomena, the normal probability distribution is very useful to managers in public and nonprofit organizations. This chapter elaborates the characteristics of the normal distribution and the calculation and use of standard normal scores or “z scores.” It explains how to use the normal distribution table at the end of this book (Table 1) to determine and interpret probabilities and discusses the application of the normal distribution to problems of public and nonprofit administration.

Figure 8.1 The Normal Distribution
[Bell-shaped curve centered on the mean μ, with probability .5 on each side of the mean]

The normal curve characterizes such continuous variables, provided that their distribution of scores takes on a certain shape. The general “bell shape” is well known and probably familiar to you. It is shown in Figure 8.1.

As Figure 8.1 illustrates, the normal curve is a mounded distribution centered around a single peak at the mean. The curve is perfectly symmetric: Scores above and below the mean are equally likely to occur, so that half of the probability under the curve (.5) lies above the mean and half (.5) below. Most values in a normal distribution fall close to the mean μ, and the curve tapers off gradually toward both of its ends or “tails.” As values fall at a greater distance from the mean in either direction, the probability of their occurrence grows smaller and smaller. With respect to heights, weights, or other characteristics that have a normal distribution, most individuals cluster about the mean, and fewer and fewer individuals are found the farther one moves away from the central value in either direction. As shown in Figure 8.1, the normal curve never touches the horizontal axis; touching it would indicate that a score at that point and beyond has zero probability of occurring. It is always possible, if not very likely, that a highly extreme value of a variable could occur. Nevertheless, as scores become more and more extreme, the probability of their occurrence becomes vanishingly small.

The normal curve is completely determined by its mean μ and standard deviation σ (Chapters 5 and 6 discuss these statistics in detail). As shown in Figure 8.1, the height of the curve is greatest at the mean (where the probability of occurrence is highest). Figure 8.2 shows that the standard deviation governs the clustering of the data about this central value. In a normal distribution, approximately 68.26% of all values fall within one standard deviation of the mean in either direction, approximately 95.44% of all values fall within two standard deviations of the mean, and approximately 99.72% of all values fall within three standard deviations of the mean in either direction. The shaded areas in Figures 8.2(a), (b), and (c) illustrate this phenomenon. Conversely, as shown by the unshaded areas in the figure, in a normal distribution about one-third of the data values lie beyond one standard deviation of the mean in either direction (1.0 − .6826 = .3174). Only about 5% of the data values lie beyond two standard deviations of the mean in either direction (1.0 − .9544 = .0456), and only about one-quarter of 1% of all values lie beyond three standard deviations (1.0 − .9972 = .0028).
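The percentages above can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes Python's standard `math.erf` function and uses a helper name, `prob_within`, invented here for illustration.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of the mean, in either direction."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = prob_within(k)
    # 'beyond' is the total area in both tails outside k standard deviations.
    print(f"within {k} SD: {inside:.4f}   beyond {k} SD: {1 - inside:.4f}")
```

The computed values (.6827, .9545, .9973) differ from the chapter's figures by about .0001 because the chapter doubles table entries that are already limited to four decimal places.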

Figure 8.2 The Normal Distribution and the Standard Deviation
(a) 68.26% of all values lie within one standard deviation of the mean (between μ − σ and μ + σ)
(b) 95.44% of all values lie within two standard deviations of the mean (between μ − 2σ and μ + 2σ)
(c) 99.72% of all values lie within three standard deviations of the mean (between μ − 3σ and μ + 3σ)

The normal distribution is measured in terms of its standard deviation. Given a normal distribution, we can use the curve to find the probability of a data value falling within any number of standard deviations from the mean (although the probability of a data value exceeding three standard deviations is very small). If all the public or nonprofit manager had to be concerned about was the probability of scores falling within one, two, or three standard deviations from the mean, or beyond each of these limits, this would be a very short chapter (and many statistical consultants would need to find other employment). Figure 8.2 displays these probabilities only; that is, for data values falling exactly one, two, or three standard deviations from the mean.

The manager has a much broader range of needs and interests for the normal curve, however. She or he is often interested in identifying the data values that demarcate the top 5% of applicants, or the bottom quarter of the work group, or the middle 75% of performers. For example, what score on the innovation inventory should be used to select the top 10% of nominees for the Municipal Innovation Award? If the nonprofit coordinating council wants to recognize the top 2% of nonprofit social service agencies in the state with respect to number of clients served, what numerical cutoff should it use? Or the manager may need to convert actual scores into their probability of occurrence. For instance, if scores on a test of typing proficiency have a normal distribution, what percentage of the applicant pool scored between the mean and a particular score of interest (say, the score of an applicant who performed well in other aspects of the job interview), or surpassed this score, or fell below it? If one nonprofit social service agency seemed to serve very few clients, what percentage of the agencies in the state did better, or worse? In what percentile did this applicant fall? Although none of these questions refers to data values that fall exactly one, two, or three standard deviations from the mean, the normal distribution can be used to answer them, as well as many other questions important for public and nonprofit management.
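As a rough illustration of the cutoff questions above, the sketch below finds the score that separates the top 10% of a normal distribution from the rest. It assumes Python's standard `statistics.NormalDist` class; the mean of 70 and standard deviation of 10 are hypothetical values chosen only for the example, and the function name `cutoff_for_top_share` is likewise invented here.

```python
from statistics import NormalDist

# Hypothetical figures for illustration only; the chapter gives no mean or
# standard deviation for the innovation inventory.
mean_score = 70.0
sd_score = 10.0

def cutoff_for_top_share(share: float, mean: float, sd: float) -> float:
    """Score that separates the top `share` of a normal distribution
    from the remaining values."""
    if not 0.0 < share < 1.0:
        raise ValueError("share must be strictly between 0 and 1")
    # The top `share` starts at the point with (1 - share) of the area below it.
    return NormalDist(mean, sd).inv_cdf(1.0 - share)

top_10_cutoff = cutoff_for_top_share(0.10, mean_score, sd_score)
print(f"Top 10% cutoff: {top_10_cutoff:.1f}")
```

With these assumed figures the cutoff is about 82.8; a different mean or standard deviation would shift it accordingly.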

z Scores and the Normal Distribution Table

The way that the manager goes about addressing these questions is to calculate, and interpret, a z score or standard normal score. A z score is simply the number of standard deviations a score of interest lies from the mean of the normal distribution. Because the normal distribution is measured with respect to the standard deviation, one can use the z score to convert raw data values into their associated probabilities of occurrence with reference to the mean.

Let’s call the score we are interested in X. To calculate the associated z score, first subtract the mean μ from the score of interest X, and then divide by the standard deviation σ to determine how many standard deviations this score is from the mean. Written as a formula, this is

$$z = \frac{X - \mu}{\sigma}$$

As the formula illustrates, the z score is the number of standard deviations (σ) a score of interest (X) is from the mean (μ) in a normal distribution. According to the formula (and the definition of the z score), a data value X exactly one standard deviation above the mean will have a z score of 1.0, a value two standard deviations above the mean will have a z score of 2.0, and a value three standard deviations above the mean will have a z score of 3.0. Referring to Figure 8.2, we see that the probability associated with a z score of 1.0 is .3413; that is, in a normal distribution, just over one-third of the data values lie between the mean and a value one standard deviation above it. The respective z score for a data value one standard deviation below the mean will be the same in magnitude but negative in sign, or −1.0. (Note that for scores of interest below or less than the mean in value, X − μ will be a negative number.)
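The calculation can be sketched in a few lines of Python. The mean of 50 and standard deviation of 10 below are hypothetical values chosen so that the results are easy to verify by hand; the function name `z_score` is an illustrative choice, not something defined in the text.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean.
    Positive above the mean, negative below it."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical distribution: mean 50, standard deviation 10.
for score in (60, 70, 80, 40):
    print(f"score {score}: z = {z_score(score, 50, 10):+.1f}")
```

Scores of 60, 70, and 80 give z scores of 1.0, 2.0, and 3.0, while a score of 40, one standard deviation below the mean, gives −1.0.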

Because the normal curve is symmetric about the mean (half the probability lies above and half the probability below the mean; see Figure 8.1), the negative sign poses no difficulty: The probability associated with a z score of −1.0 is also .3413 (again, just over one-third of the data values fall between the mean and one standard deviation below it). In a normal distribution, then, .6826 of the data values lie within one standard deviation of the mean in either direction (.3413 + .3413), as shown in Figure 8.2(a). The associated probability for a z score of 2.0 (or −2.0) is .4772, and for a z score of 3.0 (or −3.0) the probability is .4986. Doubling these probabilities to take into account scores below the mean and above the mean yields the aggregate probabilities shown in Figures 8.2(b) and 8.2(c) of .9544 and .9972 for data values within two or three standard deviations from the mean, respectively.

How does the manager deal with the myriad of other z scores that she or he will encounter in everyday work life in public or nonprofit administration? It would be impractical to have a graph or picture for each z score, as in Figure 8.2. Instead, for convenience, z scores are tabulated with their associated probabilities in a normal distribution table; the table appears as Table 1 in the Statistical Tables at the end of this book. As illustrated in Figure 8.3, the normal distribution table displays the percentage of data values falling between the mean μ and each z score, the area shaded in the figure. In the normal table (Table 1), the first two digits of the z score appear in the far left column; the third digit is read along the top row of the table. The associated probability is read at the intersection of these two points in the body of the table.

Refer to Table 1. In a normal distribution, what percentage of the data values lie between the mean μ and a z score of 1.33? That question is the same as asking what percentage of the cases lie between the mean and a point 1.33 standard deviations away (that is, one and one-third standard deviations). In the far left column of the table, locate the first two digits of the z score, 1.3, and read along this row until you reach the third digit, in the 0.03 column. The probability is .4082. In a normal distribution, 40.82% of the data values lie between the mean and a z score of 1.33.
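The table lookup can be approximated in code. This is a minimal sketch assuming Python's standard `statistics.NormalDist` class; it computes the area between the mean and a z score directly rather than reading Table 1, and the function name `area_mean_to_z` is invented for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def area_mean_to_z(z: float) -> float:
    """Area under the standard normal curve between the mean and z.
    The sign of z is ignored because the curve is symmetric."""
    return STANDARD_NORMAL.cdf(abs(z)) - 0.5

print(f"{area_mean_to_z(1.33):.4f}")
```

For a z score of 1.33 this prints 0.4082, matching the table entry described above.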

Given this result, we can use our knowledge of the normal distribution to answer several questions of interest. For instance, what percentage of the data values lie between the mean and a z score 1.33 standard deviations below it? A negative z score, in this case −1.33, indicates a score of interest less than the mean. Because the normal curve is symmetric, this z score bounds the same amount of area (probability) under the normal curve as a z score of 1.33. The probability is identical: .4082.

What percentage of the data values lie above a z score of 1.33? This area of the normal curve is shaded in Figure 8.4. How can we find this probability? Recall that in a normal distribution, half (.5) of the probability lies above the mean and half below. We know that .4082 of all the values lie between the mean and a z score of 1.33. The shaded area beyond this z score can be found by subtracting this probability from .5, giving an answer of .0918. So 9.18% of the data values lie above this score.

In a normal distribution, what percentage of the data values lie below a z score of 1.33? This portion of the curve is shaded in Figure 8.5. We can proceed in either of two ways. The easier is to subtract from 1.0 (the total probability under the curve) the percentage of data values surpassing this z score (.0918); the answer is .9082. Or, because we know that the area under the curve between the mean and this z score is .4082 and that the area below the mean is equal to .5, we can add these probabilities to find the same total probability of .9082.
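The areas above and below a z score can also be computed directly. The sketch below assumes Python's standard `statistics.NormalDist` class; the function name `tail_areas` is an illustrative choice.

```python
from statistics import NormalDist

def tail_areas(z: float) -> tuple[float, float]:
    """Return (area above z, area below z) under the standard normal curve."""
    below = NormalDist().cdf(z)   # cumulative probability up to z
    above = 1.0 - below           # the total area under the curve is 1.0
    return above, below

above, below = tail_areas(1.33)
print(f"above z = 1.33: {above:.4f}   below z = 1.33: {below:.4f}")
```

For z = 1.33 this gives .0918 above and .9082 below, the same values obtained from the table. For a negative z score the two areas simply trade places.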

In the normal distribution table at the end of the book (Table 1), practice finding the probabilities associated with the following z scores. Then, using the methods discussed earlier, calculate for each z score the percentage of data values in a normal distribution that falls above it and the percentage that falls below it.

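Figure 8.3 Probability Between a z Score and the Mean
[Normal curve with the area between the mean μ and z shaded]

Figure 8.4 Probability Greater Than a z Score
[Normal curve with the area above z shaded]

Figure 8.5 Probability Less Than a z Score
[Normal curve with the area below z shaded]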

z        Table Probability    Percentage above z    Percentage below z
 1.62
 0.73
 2.40
−1.50
−0.48
−3.16

For further practice in using the normal distribution table, look up the probability associated with a z score of 1.0; that is, a score one standard deviation from the mean. For this z score the table displays the now familiar probability of .3413. Now, look up the probability for a z score of 2.0 (two standard deviations from the mean). No surprise here either: The probability is the equally familiar .4772. The probability for a z score of 3.0 also confirms our earlier understanding. The table shows that .4986 of the data values lie between the mean and a score three standard deviations above it.

Applications to Public and Nonprofit