• Tidak ada hasil yang ditemukan

The range is the difference between the largest value and the smallest value in a distribution of data. Th e interquartile deviation is the diff erence between two values in the distribution selected so that the middle 50% of all observations fall between them. Th e average deviation is the average diff erence between the mean and all other values. If these measures of dispersion seem appropriate for any project, consult a general statistics book to fi nd out how to calculate and use them [for example, see the text by Johnson and Kuby (2006);

a list of other suitable texts is given in the annotated bibliography at the end of the book].

cleaned by the fi ve crews. To determine the (population) standard deviation for these data, follow these steps:

Step 1: Calculate the mean from the data. Recall from Chapter 5 that to cal- culate the mean, you need to add the data values across the cases (here, the sum of the data values is 665 blocks) and divide that sum by the number of cases or N (here, the number of work crews, 5). Th e mean is 133.

Step 2: Subtract the mean from every item in the data set. In this situation, subtract 133 from the number of streets cleaned by each crew. Often a table like Table 6.3 is helpful in performing these calculations. Note that the sum of these diff erences equals zero. In fact, in any distribution of data, the sum of the diff erences of the data values from the mean will always equal zero. Th e reason is that the positive deviations above the mean will just balance the negative deviations below the mean such that the sum of the deviations is zero. Because the sum of the deviations is zero, we must adjust our calculations to derive a more useful measure- ment of dispersion in Step 3.

Step 3: Square the diff erence between each data value and the mean (the third column in Table 6.3). As noted in Step 2, the diff erences sum to zero,

Table 6.2 Blocks of Streets Cleaned by Work Crews

Work Crew Number of Blocks

A 126

B 140

C 153

D 110

E 136

665

Table 6.3 Calculating the Differences

Number of Blocks Subtract the Mean Diff erence

126 133 –7

140 133 7

153 133 20

110 133 –23

136 133 3

0

regardless of how condensed or dispersed the data values are about the mean. Th e reason is that the positive and negative diff erences will always balance each other out. Squaring the diff erences avoids this problem and is instrumental in measuring the actual amount of dispersion in the data (see Table 6.4).

Step 4: Sum the squared diff erences. You should get a sum of 1,036.

Step 5: Divide the sum by the number of items (N 5 5). Th is number (207.2) is called the variance. The variance is the arithmetic average of the squared diff erences of the data values from the mean. Th e variance has little descriptive value because it is based on squared (not actual) units of measurement.

Step 6: Take the square root of the variance to fi nd the standard deviation (here, the standard deviation is 14.4). Because we squared the diff erences be- tween the mean and the data values in Step 3 (so that the diff erences would not sum to zero), it now makes sense to take the square root. In this manner, the standard deviation converts the variance from squared units to the original units of measurement. Th e standard deviation is the preferred measure of dispersion for descriptive purposes.

After calculating this relatively simple statistic, you should not be surprised to learn that statisticians have a complex formula for the standard deviation. Th e formula for s (the Greek letter sigma, which statisticians use as the symbol for the standard deviation of a population) is

s 5ã a

N i51

1Xi2 m22 N

Th is formula is not as formidable as it seems: (Xi 2 m) is Step 2, subtracting the mean of the population from each data value. (Xi 2 m)2 is Step 3, squaring the diff erences. SNi511Xi2 m22 is Step 4, summing all the squared diff erences from the fi rst case i 5 1 to the last case i 5 N. Th e entire formula within the square

Table 6.4 Calculating the Difference Squared

Blocks Mean Diff erence Diff erence Squared

126 133 –7 49

140 133 7 49

153 133 20 400

110 133 –23 529

136 133 3 9

root sign completes Step 5, dividing by the number of items (N). Finally, the square root sign is Step 6.

How would the calculations change if the fi ve street-cleaning crews were a sample of the crews from Normal, Oklahoma, rather than the entire population?

You would then need to calculate the sample standard deviation, denoted by the symbol s. To do so, you would follow the steps in the formula for the standard deviation, replacing m with X , the sample mean, and N with n 2 1, the number of cases in the sample (n) less 1. Th ere are sound statistical reasons for making this adjustment, which we will discuss in Chapter 11, on statistical inference.

Th ere are also good practical reasons. Dividing by n 2 1 rather than n will yield a slightly larger standard deviation, which is the appropriate measure of care to take when you are using a sample of data to represent the population. In this example, the sample standard deviation would be 16.1 (1,036 4 4 5 259;

the square root of 259 5 16.1). When you are working with the population rather than a sample, there is no need to add this extra measure of caution. Th us, if the fi ve Normal, Oklahoma, street-cleaning crews constitute the population, the (population) standard deviation is smaller, 14.4. You need to be aware that statistical package programs, spreadsheets, and hand calculators may assume that you are using the sample standard deviation rather than the population standard deviation and thus make the appropriate adjustment.

Th e smaller the standard deviation, the more closely the data cluster about the mean. For example, the standard deviations for the Wheezer, South Dakota, police arrests are 1.34 for 2009, 1.55 for 2010, and 1.16 for 2011 (see Figure 6.1). Th is calculation reinforces our perception that the 2010 data were the most dispersed and the 2011 data were the least dispersed. In general, when the data are closely bunched around the mean (smaller standard deviation), the public or nonprofi t manager will feel more comfortable making a decision based on the mean.

Shape of a Frequency Distribution