
We’ve seen that the mode is the value that occurs most frequently. It describes a typical observation in terms of the most common outcome. The concept of the mode is most often used to describe the category of a categorical variable that has the highest frequency (the modal category). With quantitative variables, the mode is most useful with discrete variables taking a small number of possible values. For instance, for the TV-watching data of Example 6, the mode is 2 hours of daily watching. For continuous observations, it is usually not meaningful to look for a mode because there can be multiple modes or no mode at all. For the CO2 data, for example, there is no mode. All values just occurred once.

Caution

The mode need not be near the center of the distribution. It may be the largest or the smallest value. Thus, it is somewhat inaccurate to call the mode a measure of center, but often it is useful to report the most common outcome.

The median is resistant. The mean is not.

From these properties, you might think that it’s always better to use the median rather than the mean. That’s not true. The mean has other useful properties that we’ll learn about and take advantage of in later chapters.

In practice, it is a good idea to report both the mean and the median when describing the center of a distribution. For the CO2 emission data in Example 11, the median is the more relevant statistic because of the skew resulting from the extremely large value for the United States. However, knowing the mean for all the observations and then for the observations excluding the United States (in which case, the mean is 3.0) provides additional information.

Activity 1

Using an App to Explore the Relationship Between the Mean and Median*

The “Mean Versus Median” web app accessible via the book’s website allows you to add and delete data points from a sample.

When you access the app, select Skewed from the drop-down menu for the initial distribution. A dotplot with 20 observations from a skewed distribution is shown.

• Observe the location of the mean and median as you change the skewness from “right” to “symmetric” to “left”.

• Click in the gray area to add points or click on a point to delete it, each time observing how the mean and median change. Try this for different initial distributions or create your own dotplot by choosing the Create Own option from the drop-down menu.

• Under a skewed initial distribution and 100 points (i.e., a large sample size), does deleting one or two outliers have much of an effect on the mean or median?

• You can also supply your own data points. Select this option from the first drop-down menu. By default, the per capita CO2 emissions for the 9 countries mentioned in Example 11 are shown. Investigate what happens when you change the largest observation from 16.9 to 90. (In the textbox showing the numerical value, replace 16.9 with 90.) Does the median change much from its original value of 1.8? What about the mean?

Try Exercises 2.38 and 2.148

*For more information about the apps see pages 7–8 and the back endpapers of this book.

82 Chapter 2 Exploring Data with Graphs and Numerical Summaries

2.3 Practicing the Basics

2.29 Median versus mean For each of the following variables, would you use the median or mean for describing the center of the distribution? Why? (Think about the likely shape of the distribution.)

a. Salary of employees of a university

b. Time spent on a difficult exam

c. Scores on a standardized test

2.30 More median versus mean For each of the following variables, would you use the median or mean for describing the center of the distribution? Why? (Think about the likely shape of the distribution.)

a. Amount of liquid in bottles of capacity one liter

b. The salary of all the employees in a company

c. Number of requests to reset passwords for individual email accounts

2.31 More on CO2 emissions The Energy Information Agency reported the CO2 emissions (measured in gigatons, Gt) from fossil fuel combustion for the top 10 emitting countries in 2011. These are China (8 Gt), the United States (5.3 Gt), India (1.8 Gt), Russia (1.7 Gt), Japan (1.2 Gt), Germany (0.8 Gt), Korea (0.6 Gt), Canada (0.5 Gt), Iran (0.4 Gt), and Saudi Arabia (0.4 Gt).

a. Find the mean and median CO2 emission.

b. The totals reported here do not take into account a nation’s population size. Explain why it may be more sensible to analyze per capita values, as was done in Example 11.

2.32 Resistance to an outlier Consider the following three sets of observations:

Set 1: 8, 9, 10, 11, 12

Set 2: 8, 9, 10, 11, 100

Set 3: 8, 9, 10, 11, 1000

a. Find the median for each data set.

b. Find the mean for each data set.

c. What do these data sets illustrate about the resistance of the median and mean?
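The comparison in this exercise can also be checked numerically. The following is a minimal sketch, assuming Python and its standard-library `statistics` module; the names `data_sets` and `summaries` are illustrative and do not come from the text.

```python
from statistics import mean, median

# The three data sets from Exercise 2.32; only the largest value differs.
data_sets = {
    "Set 1": [8, 9, 10, 11, 12],
    "Set 2": [8, 9, 10, 11, 100],
    "Set 3": [8, 9, 10, 11, 1000],
}

# For each set, store the pair (mean, median).
summaries = {
    name: (mean(values), median(values))
    for name, values in data_sets.items()
}

for name, (m, med) in summaries.items():
    print(f"{name}: mean = {m}, median = {med}")
```

Comparing the printed rows shows which statistic moves as the largest observation grows and which stays put.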

2.33 Weekly earnings and gender In New Zealand, the mean and median weekly earnings for males in 2009 were $993 and $870, respectively, and for females, the mean and median weekly earnings were $683 and $625, respectively (www.nzdotstat.stats.govt.nz). Does this suggest that the distribution of weekly earnings for males is symmetric, skewed to the right, or skewed to the left? What about the distribution of weekly earnings for females? Explain.

2.34 Labor dispute The workers and the management of a company are having a labor dispute. Explain why the workers might use the median income of all the employees to justify a raise but management might use the mean income to argue that a raise is not needed.

2.35 Cereal sodium The dot plot shows the cereal sodium values from Example 4. What aspect of the distribution causes the mean to be less than the median?

[Figure: Dot Plot of Sodium Values for 20 Breakfast Cereals. Horizontal axis: Sodium (mg), from 0 to 350. Marked on the plot: median = 180, mean = 167.]

2.36 Center of plots The figure shows dot plots for three sample data sets.

a. For which, if any, data sets would you expect the mean and the median to be the same? Explain why.

b. For which, if any, data sets would you expect the mean and the median to differ? Which would be larger, the mean or the median? Why?

[Figure: dot plots for three sample data sets, each drawn on a horizontal axis running from 1 to 9.]

2.37 Public transportation—center The owner of a company in downtown Atlanta is concerned about the large use of gasoline by her employees due to urban sprawl, traffic congestion, and the use of energy-inefficient vehicles such as SUVs. She’d like to promote the use of public transportation. She decides to investigate how many miles her employees travel on public transportation during a typical day. The values for her 10 employees (recorded to the closest mile) are

0 0 4 0 0 0 10 0 6 0

a. Find and interpret the mean, median, and mode.

b. She has just hired an additional employee. He lives in a different city and travels 90 miles a day on public transport. Recompute the mean and median. Describe the effect of this outlier.
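For readers who prefer software to hand calculation, the three summaries can be computed side by side. This is a minimal sketch assuming Python's standard-library `statistics` module; the helper `summarize` and the variable names are hypothetical, introduced only for illustration.

```python
from statistics import mean, median, mode

# Miles traveled on public transportation by the 10 employees (Exercise 2.37).
miles = [0, 0, 4, 0, 0, 0, 10, 0, 6, 0]

# The same data after the new employee (90 miles a day) is added.
with_outlier = miles + [90]

def summarize(values):
    """Return the mean, median, and mode of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
    }

before = summarize(miles)
after = summarize(with_outlier)

print("10 employees:", before)
print("11 employees:", after)
```

Here the mode coincides with the smallest value in the data, which illustrates the earlier caution that the mode need not be near the center of the distribution.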

2.38 Public transportation—outlier Refer to the previous exercise.

a. Use the Mean Versus Median app (see Activity 1) to investigate what effect adding the outlier of 90 to the data set has on the mean and median. (In the app, select “Supply own sample” and type the data plus the value 90 into the textbox.)

b. Now add 10 more data values that are near the mean of 2 for the original 10 observations. Does the outlier of 90 still have such a strong effect on the mean?

Section 2.3 Measuring the Center of Quantitative Data 83

Number of Times Married, for Subjects of Age 20–24

                             Frequency
Number Times Married      Women       Men
0                          7350      8418
1                          2587      1594
2                            80        10
Total                    10,017    10,022

a. Find the median and mean for each gender.

b. On average, have women or men been married more often? Which statistic do you prefer to answer this question? (The mean, as opposed to the median, uses the numerical values of all the observations, not just the ordering. For discrete data with only a few values such as the number of times married, it can be more informative.)

2.44 Knowing homicide victims The table summarizes responses of 4383 subjects in a recent General Social Survey to the question, “Within the past 12 months, how many people have you known personally that were victims of homicide?”

Number of People You Have Known Who Were Victims of Homicide

Number of Victims    Frequency
0                         3944
1                          279
2                           97
3                           40
4 or more                   23
Total                     4383

(Source: Data from CSM, UC Berkeley.)

a. To find the mean, it is necessary to give a score to the “4 or more” category. Find it, using the score 4.5. (In practice, you might try a few different scores, such as 4, 4.5, 5, 6, to make sure the resulting mean is not highly sensitive to that choice.)

b. Find the median. Note that the “4 or more” category is not problematic for it.

c. If 1744 observations shift from 0 to 4 or more, how do the mean and median change?

d. Why is the median the same for parts b and c, even though the data are so different?
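The sensitivity check suggested in part a can be sketched in code. This is an illustrative Python sketch, not a prescribed method: the function names `mean_with_score` and `median_category` are hypothetical, and the median routine assumes an odd number of observations, which holds for the 4383 subjects here.

```python
# Frequency table from Exercise 2.44, in increasing order of the category.
table = [("0", 3944), ("1", 279), ("2", 97), ("3", 40), ("4 or more", 23)]

def mean_with_score(score):
    """Mean number of victims when '4 or more' is scored as `score`."""
    n = sum(freq for _, freq in table)
    total = 0.0
    for label, freq in table:
        value = score if label == "4 or more" else int(label)
        total += value * freq
    return total / n

def median_category(ordered):
    """Category containing the middle observation (assumes n is odd)."""
    n = sum(freq for _, freq in ordered)
    middle = (n + 1) // 2          # position of the middle observation
    running = 0
    for label, freq in ordered:
        running += freq            # cumulative frequency so far
        if running >= middle:
            return label

# Try several scores for the open-ended category, as the exercise suggests.
for score in (4, 4.5, 5, 6):
    print(f"score {score}: mean = {mean_with_score(score):.3f}")

print("median category:", median_category(table))
```

The median routine never needs a numerical score for “4 or more” because it uses only the ordering of the categories and their cumulative frequencies.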

2.45 Airplane crashes One variable in a study measures how many airplane crashes a commercial airline company has had in the past year.

a. Calculate the expected value of the mode for this variable.

b. Explain why the mean would likely be more useful than the median for summarizing the responses of 60 airline companies.

2.39 Sale price of houses According to the U.S. Census Bureau, houses in 2014 had a median sales price of $282,800 and a mean sales price of $345,800 (www.census.gov/construction/nrs/pdf/uspriceann.pdf). What do you think causes these two values to be so different?

2.40 More baseball salaries Go to espn.go.com/mlb/teams and select a (or your favorite) team. Click Roster and then Salary. Copy the salary figures for the players into a software program and create a histogram. Describe the shape of the distribution for salary and comment on its center by quoting appropriate statistics.

2.41 European fertility The European fertility rates (mean number of children per adult woman) from Exercise 2.18 are shown again in the table.

a. Find the median of the fertility rates. Interpret.

b. Find the mean of the fertility rates. Interpret.

c. For each woman, the number of children is a whole number, such as 2 or 3. Explain why it makes sense to measure a mean number of children per adult woman (which is not a whole number) to compare fertility levels, such as the fertility levels of 1.5 in Canada and 2.4 in Mexico.

Country      Fertility    Country           Fertility
Austria      1.4          Netherlands       1.7
Belgium      1.7          Norway            1.8
Denmark      1.8          Spain             1.3
Finland      1.7          Sweden            1.6
France       1.9          Switzerland       1.4
Germany      1.3          United Kingdom    1.7
Greece       1.3          United States     2.0
Ireland      1.9          Canada            1.5
Italy        1.3          Mexico            2.4

2.42 Dining out A recent survey asked students, “On average, how many times in a week do you go to a restaurant for dinner?” Of the 570 respondents, 84 said they do not go out for dinner, 290 said once, 100 said twice, 46 said thrice, 30 said 4 times, 13 said 5 times, 5 said 6 times, and 2 said 7 times.

a. Display the data in a table. Explain why the median is 1.

b. Show that the mean is 1.5.

c. Suppose the 84 students who said that they did not go out for dinner had answered 7 times instead. Show that the median would still be 1. (The mean would increase to 2.54. The mean uses the numerical values of the observations, not just their ordering.)
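When data arrive as a frequency table, as in this exercise, the mean is the frequency-weighted average of the values and the median is the middle value of the ordered list. The sketch below assumes Python with the standard-library `statistics` module; `freq_mean`, `freq_median`, and `shifted` are hypothetical names used only for illustration.

```python
from statistics import median

# Frequency table from Exercise 2.42: times per week -> number of students.
counts = {0: 84, 1: 290, 2: 100, 3: 46, 4: 30, 5: 13, 6: 5, 7: 2}

def freq_mean(table):
    """Weighted mean: sum of value * frequency, divided by the sample size."""
    n = sum(table.values())
    return sum(value * freq for value, freq in table.items()) / n

def freq_median(table):
    """Median found by expanding the table into the ordered observations."""
    expanded = [value for value, freq in sorted(table.items())
                for _ in range(freq)]
    return median(expanded)

# Part c: the 84 students who answered 0 are recorded as answering 7 instead.
shifted = dict(counts)
shifted[7] += shifted.pop(0)

print("original:", round(freq_mean(counts), 2), freq_median(counts))
print("shifted: ", round(freq_mean(shifted), 2), freq_median(shifted))
```

Expanding the table is simple but memory-hungry for very large frequencies; accumulating the cumulative counts until the middle position is reached is an alternative that avoids building the full list.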

2.43 Marriage statistics for 20–24-year-olds The table in the next column shows the number of times 20–24-year-old U.S. residents have been married, based on a Bureau of the Census report from 2004. The frequencies are actually thousands of people. For instance, 8,418,000 men never married, but this does not affect calculations about the mean or median.


2.4 Measuring the Variability of Quantitative Data

A measure of the center is not enough to describe a distribution for a quantitative variable adequately. It tells us nothing about the variability of the data. With the cereal sodium data, if we report the mean of 167 mg to describe the center, would the value of 210 mg for Honeycomb be considered quite high, or are most of the data even farther from the mean? To answer this question, we need numerical summaries of the variability of the distribution.