
Shape of a Frequency Distribution and Measures of Central Tendency

root sign completes Step 5, dividing by the number of items (N). Finally, the square root sign is Step 6.

How would the calculations change if the five street-cleaning crews were a sample of the crews from Normal, Oklahoma, rather than the entire population?

You would then need to calculate the sample standard deviation, denoted by the symbol s. To do so, you would follow the steps in the formula for the standard deviation, replacing μ with X̄, the sample mean, and N with n − 1, the number of cases in the sample (n) less 1. There are sound statistical reasons for making this adjustment, which we will discuss in Chapter 11, on statistical inference.

There are also good practical reasons. Dividing by n − 1 rather than n will yield a slightly larger standard deviation, which is the appropriate measure of care to take when you are using a sample of data to represent the population. In this example, the sample standard deviation would be 16.1 (1,036 ÷ 4 = 259; the square root of 259 = 16.1). When you are working with the population rather than a sample, there is no need to add this extra measure of caution. Thus, if the five Normal, Oklahoma, street-cleaning crews constitute the population, the (population) standard deviation is smaller, 14.4. You need to be aware that statistical package programs, spreadsheets, and hand calculators may assume that you are using the sample standard deviation rather than the population standard deviation and thus make the appropriate adjustment.
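The arithmetic above can be checked with a short Python sketch. It starts from the sum of squared deviations given in the text (1,036) and the number of crews (5); the function name `std_from_sum_of_squares` is a hypothetical helper introduced only for illustration.

```python
import math

def std_from_sum_of_squares(sum_sq_dev, n, sample=False):
    """Standard deviation from a sum of squared deviations.

    Divides by n for a population, or by n - 1 when the data are a sample.
    (Hypothetical helper, not taken from any statistical package.)
    """
    divisor = n - 1 if sample else n
    return math.sqrt(sum_sq_dev / divisor)

# Figures from the text: sum of squared deviations = 1,036 for 5 crews.
population_sd = std_from_sum_of_squares(1036, 5)
sample_sd = std_from_sum_of_squares(1036, 5, sample=True)

print(round(population_sd, 1))  # 14.4
print(round(sample_sd, 1))      # 16.1
```

As one instance of the software defaults mentioned above, Python's standard `statistics` module keeps the two versions separate: `statistics.stdev` divides by n − 1, and `statistics.pstdev` divides by n. Other tools choose their own defaults, so it is worth checking the documentation before reporting a result.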

The smaller the standard deviation, the more closely the data cluster about the mean. For example, the standard deviations for the Wheezer, South Dakota, police arrests are 1.34 for 2009, 1.55 for 2010, and 1.16 for 2011 (see Figure 6.1). This calculation reinforces our perception that the 2010 data were the most dispersed and the 2011 data were the least dispersed. In general, when the data are closely bunched around the mean (smaller standard deviation), the public or nonprofit manager will feel more comfortable making a decision based on the mean.

Shape of a Frequency Distribution

the mean, median, and mode have very similar values at or near the center of the distribution. In this situation, the mean is the preferred measure of central tendency.

Figure 6.3 shows a uniform distribution. In a uniform distribution, each data value occurs with the same (or nearly the same) frequency. Because the data do not cluster around the middle or center of the distribution but are evenly spread across the variable, the dispersion, as measured by the standard deviation, will be large. The mean and median will be near the center of this distribution. However, because many data values occur with the same or nearly the same frequency, the mode is generally not very useful in a uniform distribution.

Figure 6.2 A Symmetric Distribution (axes: Frequency, Values of Variable)

Figure 6.3 A Uniform Distribution (axes: Frequency, Values of Variable)
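A small Python sketch can make these properties concrete. Both data sets below are hypothetical and chosen only for illustration: one spreads its cases evenly over the values 1 through 9, and the other bunches its cases around the center value 5.

```python
import statistics

# Hypothetical uniform data: each value from 1 to 9 occurs exactly twice.
uniform = [v for v in range(1, 10) for _ in range(2)]

# Hypothetical data over the same range, bunched around the center value 5.
clustered = [1, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 7, 9]

uniform_mean = statistics.mean(uniform)      # falls at the center
uniform_median = statistics.median(uniform)  # falls at the center
uniform_sd = statistics.pstdev(uniform)      # population standard deviation
clustered_sd = statistics.pstdev(clustered)

# multimode returns every value tied for the highest frequency.
uniform_modes = statistics.multimode(uniform)

print(uniform_mean, uniform_median)
print(round(uniform_sd, 2), round(clustered_sd, 2))
print(uniform_modes)
```

Both data sets have the same mean, but the uniform one has the larger standard deviation. Every value in the uniform data ties for most frequent, which is why the mode says little about such a distribution.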

Chapter 5 introduced the idea of a bimodal distribution. The shape of such a distribution has two distinct peaks, corresponding to data values that occur with high frequency, separated by other values that occur much less often. Figure 6.4 presents an example of a bimodal distribution. Note that in a bimodal distribution, the mean and median fall near the center of the distribution, but relatively few cases fall there (note the low frequency). As a result, in a bimodal distribution they are generally poor measures of central tendency. By contrast, consider the mode(s). The two modes (bimodal) capture the essence of the distribution and are preferred as measures of central tendency in this situation.
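The following Python sketch illustrates the point with hypothetical data that peak at the values 2 and 8. The data are invented for illustration and do not come from the text.

```python
import statistics
from collections import Counter

# Hypothetical bimodal data: peaks at 2 and 8, few cases in between.
data = [2] * 5 + [3] * 2 + [5] + [7] * 2 + [8] * 5

counts = Counter(data)                # frequency of each value
mean = statistics.mean(data)          # falls in the low-frequency middle
median = statistics.median(data)      # also falls in the middle
modes = statistics.multimode(data)    # the two peaks

print(mean, median)   # 5 5
print(counts[5])      # 1 (only one case sits at the mean and median)
print(modes)          # [2, 8]
```

Here the mean and median both equal 5, a value that describes only one of the fifteen cases, whereas the two modes identify where the cases actually concentrate.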

In an asymmetric distribution, the data fall more on one side of the center or middle than on the other side. In that case, skewness exists in the data.

Negatively skewed data have a few extremely low numbers that distort the mean. Negatively skewed data form a frequency distribution like the one pictured in Figure 6.5.

Positively skewed data have a few very large numbers that distort the mean. The frequency distribution of positively skewed data resembles the one shown in Figure 6.6.

If data are strongly skewed, the mean is not a good measure of central tendency (see Chapter 5). The reason is that the mean is “pulled,” or skewed, in the direction of the skewness away from the center of the distribution. A few very high or low values in a very asymmetric distribution will skew the mean. In that case, the median is the preferred measure of central tendency because it is basically unaffected by skewness. Recall that in calculating the median, the relative positions of the values are what matters, rather than the actual magnitudes.
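To see why the median resists skewness, consider this Python sketch. The two lists are hypothetical; they differ only in that the largest value in the second list has been replaced by an extreme one.

```python
import statistics

# Hypothetical data with no extreme values.
original = [20, 22, 23, 25, 27]

# Same data, except the largest value is replaced by an extreme one.
with_extreme = [20, 22, 23, 25, 270]

mean_before = statistics.mean(original)
mean_after = statistics.mean(with_extreme)
median_before = statistics.median(original)
median_after = statistics.median(with_extreme)

print(mean_before, mean_after)      # 23.4 72
print(median_before, median_after)  # 23 23
```

The extreme value pulls the mean from 23.4 up to 72, well above four of the five cases. The median stays at 23 because the middle position in the ordered list is still occupied by the same value.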

Figure 6.4 A Bimodal Distribution (axes: Frequency, Values of Variable)

Recall our discussion in Chapter 5 regarding the use of the median as a measure of central tendency. If the standard deviation for a set of data approaches or exceeds the value of the mean itself, the mean may not be a very representative measure of central tendency. When the standard deviation exceeds the mean, extreme values in the data are often to blame. Consider using the median in cases like this, because it is not as sensitive to extreme values in the data.
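This rule of thumb can be sketched in Python as a simple check. The function `mean_may_mislead`, its `threshold` parameter, and both data sets are hypothetical and introduced only for illustration. The sketch assumes positive-valued data and uses the population standard deviation; the text does not fix a cutoff for "approaches," so the default of 1.0 flags only cases where the standard deviation equals or exceeds the mean.

```python
import statistics

def mean_may_mislead(data, threshold=1.0):
    """Return True when the standard deviation reaches threshold * mean.

    Hypothetical helper. Assumes positive-valued data. A threshold below
    1.0 would also flag cases where the standard deviation merely
    approaches the mean.
    """
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sd >= threshold * mean

# Hypothetical data: one extreme value among small ones.
skewed_costs = [1, 2, 2, 3, 42]
# Hypothetical data: values bunched closely around the mean.
steady_costs = [8, 9, 10, 11, 12]

for label, data in [("skewed", skewed_costs), ("steady", steady_costs)]:
    if mean_may_mislead(data):
        print(label, "-> consider the median:", statistics.median(data))
    else:
        print(label, "-> the mean is reasonable:", statistics.mean(data))
```

Both data sets have a mean of 10, but in the first the standard deviation is about 16, so the check points to the median (2) as the more representative figure.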

A measure of “skewness” does exist and should be calculated by computer. Positive values of skewness indicate that the data are positively skewed and, therefore, the mean is artificially high. Skewness figures around zero indicate an unskewed (or symmetric) distribution. Negative numbers indicate negative skewness and, therefore, a mean that is artificially low.
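The text does not say which skewness formula a given program uses. As a minimal sketch, the Python code below assumes one common definition: the average cubed deviation from the mean, divided by the cube of the population standard deviation. Statistical packages may apply a sample-size adjustment, so their figures can differ slightly. The function name and the three data sets are hypothetical.

```python
import statistics

def skewness(data):
    """Moment-based skewness (an assumed definition; see the lead-in).

    Average cubed deviation from the mean divided by the cubed population
    standard deviation. Undefined when all values are identical (sd = 0).
    """
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum((x - mean) ** 3 for x in data) / (n * sd ** 3)

# Hypothetical data sets.
positively_skewed = [1, 2, 2, 3, 12]    # a few very large values
symmetric = [1, 2, 3, 4, 5]             # balanced around the center
negatively_skewed = [1, 10, 11, 11, 12] # a few very small values

for label, data in [
    ("positive", positively_skewed),
    ("symmetric", symmetric),
    ("negative", negatively_skewed),
]:
    print(label, round(skewness(data), 2),
          "mean:", statistics.mean(data),
          "median:", statistics.median(data))
```

In the positively skewed data the mean (4) sits above the median (2), and in the negatively skewed data the mean (9) sits below the median (11), matching the signs of the skewness figures.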

Figure 6.5 Negatively Skewed Data (axes: Frequency, Values of Variable)

Figure 6.6 Positively Skewed Data (axes: Frequency, Values of Variable)

Using Measures of Dispersion