
Measuring Variability: The Standard Deviation

84 Chapter 2 Exploring Data with Graphs and Numerical Summaries

2.4 Measuring the Variability of Quantitative Data

A measure of the center is not enough to describe a distribution for a quantitative variable adequately. It tells us nothing about the variability of the data. With the cereal sodium data, if we report the mean of 167 mg to describe the center, would the value of 210 mg for Honeycomb be considered quite high, or are most of the data even farther from the mean? To answer this question, we need numerical summaries of the variability of the distribution.


• The deviation of an observation $x$ from the mean $\bar{x}$ is $(x - \bar{x})$, the difference between the observation and the sample mean.

For the cereal sodium values, the mean is $\bar{x} = 167$. The observation of 210 for Honeycomb has a deviation of 210 - 167 = 43. The observation of 50 for Honey Smacks has a deviation of 50 - 167 = -117. Figure 2.11 shows these deviations.

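[Figure: dot plot of cereal sodium (mg) on an axis from 0 to 350, with the mean $\bar{x} = 167$ marked and the deviations for Honey Smacks (117 below the mean) and Honeycomb (43 above the mean) indicated.]

Figure 2.11 Dot Plot for Cereal Sodium Data, Showing Deviations for Two Observations. Question: When is a deviation positive and when is it negative?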

• Each observation has a deviation from the mean.

• A deviation $x - \bar{x}$ is positive when the observation falls above the mean. A deviation is negative when the observation falls below the mean.

• The interpretation of the mean as the balance point implies that the positive deviations counterbalance the negative deviations. Because of this, the sum (and therefore the mean) of the deviations always equals zero, regardless of the actual data values. Hence, summary measures of variability from the mean use either the squared deviations or their absolute values.

• The average of the squared deviations is called the variance. Because the variance uses the square of the units of measurement for the original data, its square root is easier to interpret. This is called the standard deviation.

• The symbol $\sum (x - \bar{x})^2$ is called a sum of squares. It represents finding the deviation for each observation, squaring each deviation, and then adding them.
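These properties can be checked numerically. The following is a minimal Python sketch; the three data values are hypothetical, chosen only so that the arithmetic is easy to follow, and are not the cereal data.

```python
# Hypothetical data, chosen only for easy arithmetic (not the cereal data).
data = [1, 3, 8]

mean = sum(data) / len(data)                       # (1 + 3 + 8) / 3

# One deviation per observation: negative below the mean, positive above it.
deviations = [x - mean for x in data]

# Positive and negative deviations counterbalance, so their sum is zero.
sum_of_deviations = sum(deviations)

# Sum of squares: square each deviation, then add them.
sum_of_squares = sum(d ** 2 for d in deviations)

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations:", sum_of_deviations)
print("sum of squares:", sum_of_squares)
```

Because the plain deviations cancel, only the squared (or absolute) deviations carry information about variability.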

Did You Know?

Another measure of the typical or average distance of observations from the mean is the Mean Absolute Deviation (MAD). The absolute value of the deviation is found instead of the square. Exercise 2.145 explores the MAD.
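As a rough illustration of the MAD, here is a short Python sketch. It assumes the absolute deviations are averaged by dividing by n, which is a common convention but is not stated in this section; the function name and the data values are hypothetical.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean.

    Dividing by n is an assumption made for this sketch.
    """
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


# Hypothetical data, chosen only for easy arithmetic.
data = [1, 3, 8]
mad = mean_absolute_deviation(data)
print("MAD:", mad)
```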

The Standard Deviation s

The standard deviation s of n observations is

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\text{sum of squared deviations}}{\text{sample size} - 1}}. $$

This is the square root of the variance $s^2$, which is an average of the squares of the deviations from their mean,

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}. $$

A calculator can compute the standard deviation s easily. Its interpretation is quite simple: Roughly, the standard deviation s represents a typical distance or a type of average distance of an observation from the mean. The most basic property of the standard deviation is this:

• The larger the standard deviation s, the greater the variability of the data.
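The definition of s translates directly into a few lines of code. The sketch below is one possible Python rendering, not a prescribed implementation; the function name `sample_std` and the data values are hypothetical. It is compared with `statistics.stdev` from the Python standard library, which also divides by n - 1.

```python
import math
import statistics


def sample_std(values):
    """Standard deviation s with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    variance = sum_of_squares / (n - 1)    # s squared
    return math.sqrt(variance)             # s


# Hypothetical data: mean 4, deviations -2, 0, 0, 2, sum of squares 8.
data = [2, 4, 4, 6]

s_by_hand = sample_std(data)
s_library = statistics.stdev(data)
print("s from the formula:", s_by_hand)
print("s from statistics.stdev:", s_library)
```

Spreading the same values farther from their mean, for instance `[0, 4, 4, 8]`, gives a larger s, in line with the basic property stated above.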

A small technical point: You may wonder why the denominators of the variance and the standard deviation use n - 1 instead of n. We said that the variance was an average of the n squared deviations, so should we not divide by n? Basically it is because the deviations provide only n - 1 pieces of information about variability: That is, n - 1 of the deviations determine the last one, because the deviations sum to 0. For example, suppose we have n = 2 observations and the first observation has deviation $(x - \bar{x}) = 5$. Then the second observation must have deviation $(x - \bar{x}) = -5$ because the deviations must add to 0. With n = 2, there’s only n - 1 = 1 nonredundant piece of information about variability. And with n = 1, the standard deviation is undefined because with only one observation, it’s impossible to get a sense of how much the data vary.
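Both points can be seen in a small Python sketch. The first part recovers the last deviation from the others using the fact that deviations sum to 0; the second part shows that the standard library's `statistics.stdev` refuses to compute s from a single observation. The variable names and the single value 10 are hypothetical.

```python
import statistics

# With n = 2 and a first deviation of 5 (the example in the text),
# the remaining deviation is forced, because all deviations sum to 0.
known_deviations = [5]
last_deviation = -sum(known_deviations)
print("last deviation:", last_deviation)

# With n = 1 there is no information about variability, so s is undefined.
try:
    statistics.stdev([10])          # hypothetical single observation
    defined_for_one_observation = True
except statistics.StatisticsError:
    defined_for_one_observation = False

print("s defined for n = 1:", defined_for_one_observation)
```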

Example 12: Standard deviation

Women’s and Men’s Ideal Number of Children

Picture the Scenario

Students in a class were asked on a questionnaire at the beginning of the course, “How many children do you think is ideal for a family?” The observations, classified by student’s gender, were

Men: 0, 0, 0, 2, 4, 4, 4
Women: 0, 2, 2, 2, 2, 2, 4

Question to Explore

Both men and women have a mean of 2 and a range of 4. Do the distributions of data have the same amount of variability around the mean? If not, which distribution has more variability?

Think It Through

Let’s check dot plots for the data.

[Figure: dot plots of the ideal number of children, on an axis from 0 to 4, for men and for women; each group has mean = 2.]

The typical deviation from the mean for the male observations appears to be about 2. The observations for females mostly fall right at the mean, so their typical deviation is smaller.

Let’s calculate the standard deviation for men. Their observations are 0, 0, 0, 2, 4, 4, 4. The deviations and squared deviations about their mean of 2 are

Value    Deviation        Squared Deviation
0        (0 - 2) = -2     4
0        (0 - 2) = -2     4
0        (0 - 2) = -2     4
2        (2 - 2) = 0      0
4        (4 - 2) = 2      4
4        (4 - 2) = 2      4
4        (4 - 2) = 2      4


The sum of squared deviations equals

$$ \sum (x - \bar{x})^2 = 4 + 4 + 4 + 0 + 4 + 4 + 4 = 24. $$

The standard deviation of these n = 7 observations equals

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{24}{6}} = \sqrt{4} = 2.0. $$

This indicates that for men a typical distance of an observation from the mean is 2.0. By contrast, you can check that the standard deviation for women is s = 1.2. The observations for males tended to be farther from the mean than those for females, as indicated by s = 2.0 > s = 1.2. In summary, the men’s observations varied more around the mean.
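The calculation for men, and the check suggested for women, can be reproduced with software. Below is a minimal Python sketch that uses `statistics.stdev` from the standard library (which divides by n - 1) on the data of this example; the variable names are illustrative.

```python
import statistics

# Data from the example.
men = [0, 0, 0, 2, 4, 4, 4]
women = [0, 2, 2, 2, 2, 2, 4]

# The range is the same for both groups, so it cannot tell them apart.
range_men = max(men) - min(men)
range_women = max(women) - min(women)

# The standard deviation uses every deviation from the mean.
s_men = statistics.stdev(men)
s_women = statistics.stdev(women)

print(f"Men:   mean = {statistics.mean(men)}, range = {range_men}, s = {s_men:.1f}")
print(f"Women: mean = {statistics.mean(women)}, range = {range_women}, s = {s_women:.1f}")
```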

Insight

The standard deviation is more informative than the range. For these data, the standard deviation detects that the women were more consistent than the men in their viewpoints about the ideal number of children. The range does not detect the difference because it equals 4 for each gender.

Try Exercise 2.46

In Practice Rounding

Statistical software and calculators can find the standard deviation s for you. Try calculating s for a couple of small data sets to help you understand what it represents. After that, rely on software or a calculator. To ensure accurate results, don’t round off while doing a calculation. (For example, use a calculator’s memory to store intermediate results.) When presenting the solution, however, round off to two or three significant digits. In calculating s for women, you get $s = \sqrt{1.3333\ldots} = 1.1547005\ldots$. Present the value s = 1.2 or s = 1.15 to make it easier for a reader to comprehend.
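The advice on rounding can be illustrated with the women's data. In this Python sketch, the full-precision value of s is rounded only when it is presented; for contrast, the last lines round the variance to one decimal place first, which is the practice the text warns against. The variable names are illustrative.

```python
import math

women = [0, 2, 2, 2, 2, 2, 4]
n = len(women)
mean = sum(women) / n

# Keep intermediate results unrounded.
variance = sum((x - mean) ** 2 for x in women) / (n - 1)   # 8/6 = 1.3333...
s = math.sqrt(variance)                                    # 1.1547005...

# Round only when presenting: two or three significant digits.
two_digits = f"{s:.2g}"
three_digits = f"{s:.3g}"
print("s to two significant digits:", two_digits)
print("s to three significant digits:", three_digits)

# Rounding too early (variance rounded to 1.3) changes the reported value.
s_rounded_early = f"{math.sqrt(round(variance, 1)):.3g}"
print("s after rounding the variance first:", s_rounded_early)
```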

Example 13: Standard deviation

Exam Scores

Picture the Scenario

The first exam in your statistics course is graded on a scale of 0 to 100. Suppose that the mean score in your class is 80.

Question to Explore

Which value is most plausible for the standard deviation s: 0, 0.5, 10, or 50?

Think It Through

The standard deviation s is a typical distance of an observation from the mean. A value of s = 0 seems unlikely. For that to happen, every deviation would have to be 0. This implies that every student must have scored 80, the mean. A value of s = 0.5 is implausibly small because 0.5 would not be a typical distance above or below the mean score of 80. Similarly, a value of s = 50 is implausibly large because 50 would also not be a typical distance of a student’s score from the mean of 80. (For instance, it is impossible to score 130.) We would instead expect to see a value of s such as 10. With s = 10, a typical distance is 10, as occurs with the scores of 70 and 90.

Insight

In summary, we’ve learned that s is a typical distance of observations from the mean, larger values of s represent greater variability, and s = 0 means that all observations take the same value.

Try Exercises 2.49 and 2.50

Caution

The size of the standard deviation also depends on the units of measurement. For instance, a value of s = 1,000 might not be considered large when the unit of measurement is millimeters instead of meters. s = 1,000 computed on the millimeter scale corresponds to s = 1 on the meter scale.
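The effect of the measurement unit is easy to verify. The following Python sketch converts a set of lengths from meters to millimeters and compares the two standard deviations; the three lengths are hypothetical, chosen so that s = 1 on the meter scale.

```python
import statistics

# Hypothetical lengths in meters, chosen so that s = 1 on this scale.
meters = [1.0, 2.0, 3.0]

# The same lengths expressed in millimeters.
millimeters = [1000 * x for x in meters]

s_meters = statistics.stdev(meters)
s_millimeters = statistics.stdev(millimeters)

print("s in meters:", s_meters)
print("s in millimeters:", s_millimeters)
```

Multiplying every observation by 1,000 multiplies s by 1,000 as well, so the same variability is reported as a much larger number.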
