
With a dot plot or a stem-and-leaf plot, it’s easy to reconstruct the original data set because the plot shows the individual observations. This becomes unwieldy for large data sets. In that case, a histogram is a more versatile way to graph the data and picture the distribution. It uses bars to display and summarize frequencies of different outcomes.

Histogram

A histogram is a graph that uses bars to portray the frequencies or the relative frequencies of the possible outcomes for a quantitative variable.

Section 2.2 Graphical Summaries of Data 65

Questions to Explore

a. What was the most common outcome?

b. What does the histogram reveal about the distribution of TV watching?

c. What percentage of people reported watching TV no more than 2 hours per day?

Think It Through

a. The most common outcome is the value with the highest bar. This is 2 hours of TV watching. We call the most common value of a quantitative variable the mode. The distribution of TV watching has a mode at 2 hours.

b. We see that most people watch between 1 and 4 hours of TV per day.

Very few watch more than 8 hours.

c. To find the percentage for “no more than 2 hours per day,” we need to look at the percentages for 0, 1, and 2 hours per day. They seem to be about 7, 20, and 25. Adding these percentages together tells us that about 52% of the respondents reported watching no more than 2 hours of TV per day.
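The answers to parts a and c can also be checked directly from the counts in the frequency table for Figure 2.5. The following is a minimal Python sketch, not part of the original example; the variable names (`counts`, `percents`, `at_most_2`) are chosen here for illustration only.

```python
# Counts from the frequency table for Figure 2.5
# (hours of TV per day -> number of respondents).
counts = {
    0: 90, 1: 255, 2: 325, 3: 238, 4: 171, 5: 61, 6: 58, 7: 19, 8: 31,
    9: 3, 10: 17, 11: 0, 12: 11, 13: 2, 14: 4, 15: 3, 16: 1, 17: 0,
    18: 1, 19: 0, 20: 2, 21: 0, 22: 1, 23: 0, 24: 5,
}

n = sum(counts.values())  # total number of responses

# Relative frequencies as percentages: these are the bar heights.
percents = {hours: 100 * c / n for hours, c in counts.items()}

# (a) The mode is the value with the tallest bar.
mode = max(counts, key=counts.get)

# (c) "No more than 2 hours" means 0, 1, or 2 hours.
at_most_2 = sum(percents[h] for h in (0, 1, 2))

print(f"n = {n}")
print(f"mode = {mode} hours")
print(f"percent watching at most 2 hours = {at_most_2:.1f}%")
```

The exact figure is 670 out of 1298 responses, or about 51.6%, which agrees with the value of about 52% read from the graph.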

Insight

In theory, TV watching is a continuous variable. However, the possible responses subjects were able to make here were 0, 1, 2, . . . , 24, so it was measured as a discrete variable. Figure 2.5 is an example of a histogram of

Histogram

TV Watching

Picture the Scenario

The 2012 General Social Survey asked, “On an average day, about how many hours do you personally watch television?” Figure 2.5 shows the histogram of the 1298 responses.

Example 6

Frequency Table for Histogram in Figure 2.5

Hours  Count     Hours  Count
    0     90        13      2
    1    255        14      4
    2    325        15      3
    3    238        16      1
    4    171        17      0
    5     61        18      1
    6     58        19      0
    7     19        20      2
    8     31        21      0
    9      3        22      1
   10     17        23      0
   11      0        24      5
   12     11

[Figure: Hours of Watching TV. Horizontal axis: Number of Hours of Watching TV Per Day (0 to 24). Vertical axis: Percent (%), 0 to 25.]

Figure 2.5 Histogram of GSS Responses about Number of Hours Spent Watching TV on an Average Day. Source: Data from CSM, UC Berkeley.

66 Chapter 2 Exploring Data with Graphs and Numerical Summaries

[Margin figure: Hours of Watching TV. Horizontal axis: Number of Hours of Watching TV Per Day (0 to 24). Vertical axis: Percent (%), 0 to 25.]

a discrete variable. Note that since the variable is treated as discrete, the histogram in Figure 2.5 could have been constructed with the bars apart instead of beside each other, as shown in the margin. Also, although it is easy to calculate the mode, we usually use a different statistic to describe a typical value for a quantitative variable. (Here, we would use the median, which also equals 2; see Section 2.3.)

Try Exercise 2.24, parts a and b

Caution

The term histogram is used for a graph with bars representing a quantitative variable.

The term bar graph is used for a graph with bars representing a categorical variable.

For a discrete variable, a histogram usually has a separate bar for each possible value. For a continuous variable, we need to divide the range of possible values into smaller intervals of equal width, just as we discussed when forming the frequency table for a continuous response. We can also do this when a discrete variable, such as the score on an exam, has a large number of possible values. We then count the number of observations falling in each interval. The heights of the bars in the histogram represent the frequencies (or relative frequencies) of observations falling in the intervals.

SUMMARY: Steps for Constructing a Histogram

• Divide the range of the data into intervals of equal width. For a discrete variable with few values, use the actual possible values.

• Count the number of observations (the frequency) in each interval, forming a frequency table.

• On the horizontal axis, label the values or the endpoints of the intervals. Draw a bar over each value or interval with height equal to its frequency (or percentage), values of which are marked on the vertical axis.
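The three steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `frequency_table` and `text_histogram` and the exam scores are hypothetical, made up here to show the mechanics, and each interval is treated as running from its left endpoint up to, but not including, its right endpoint.

```python
def frequency_table(values, start, width, n_intervals):
    """Steps 1 and 2: form equal-width intervals and count observations.

    Each interval is [left, left + width): it includes its left endpoint
    and excludes its right endpoint.
    """
    counts = [0] * n_intervals
    for x in values:
        index = int((x - start) // width)
        if not 0 <= index < n_intervals:
            raise ValueError(f"{x} lies outside the chosen intervals")
        counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, c)
        for i, c in enumerate(counts)
    ]


def text_histogram(table):
    """Step 3: draw one bar per interval, with length equal to its frequency."""
    lines = []
    for left, right, count in table:
        lines.append(f"{left:>3} to <{right:<3} | {'#' * count} ({count})")
    return "\n".join(lines)


# Hypothetical exam scores, made up for illustration only.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 80, 82, 85, 88, 94]

table = frequency_table(scores, start=50, width=10, n_intervals=5)
print(text_histogram(table))
```

Here the bars are drawn sideways as rows of `#` characters; plotting software draws them vertically, but the frequency table underneath is the same.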

Histogram

Health Value of Cereals

Picture the Scenario

Let’s reexamine the sodium values of the 20 breakfast cereals. Those values are shown again in the margin of the next page.

Questions to Explore

a. Construct a frequency table.

b. Construct a corresponding histogram to visualize the distribution.

c. What information does the histogram not show that you can get from a dot plot or a stem-and-leaf plot?

Think It Through

a. To construct a frequency table, we divide the range of possible sodium values into separate intervals and count the number of cereals in each.

The sodium values range from 0 to 340. We created Table 2.4 using nine intervals, each with a width of 40. With the interval labels shown in the table, 0 to 39 actually represents 0 to 39.999999 . . . , that is, 0 up to every number below 40. So, 0 to 39 is then shorthand for “0 to less than 40.”

Example 7


Cereal                   Sodium
Frosted Mini Wheats           0
Raisin Bran                 340
All Bran                     70
Apple Jacks                 140
Cap’n Crunch                200
Cheerios                    180
Cinnamon Toast Crunch       210
Crackling Oat Bran          150
Fiber One                   100
Frosted Flakes              130
Froot Loops                 140
Honey Bunches of Oats       180
Honey Nut Cheerios          190
Life                        160
Rice Krispies               290
Honey Smacks                 50
Special K                   220
Wheaties                    180
Corn Flakes                 200
Honeycomb                   210

[Margin figure: TI output of histogram]

Table 2.4 Frequency Table for Sodium in 20 Breakfast Cereals

The table summarizes the sodium values using nine intervals and lists the number of observations in each as well as the corresponding proportions and percentages.

Interval      Frequency   Proportion   Percentage
  0 to 39         1          0.05           5%
 40 to 79         2          0.10          10%
 80 to 119        1          0.05           5%
120 to 159        4          0.20          20%
160 to 199        5          0.25          25%
200 to 239        5          0.25          25%
240 to 279        0          0.00           0%
280 to 319        1          0.05           5%
320 to 359        1          0.05           5%

[Figure: Horizontal axis: Sodium (mg), 0 to 360 in steps of 40. Vertical axis: Frequency, 0 to 7.]

Figure 2.6 Histogram of Breakfast Cereal Sodium Values. The rectangular bar over an interval has height equal to the number of observations in the interval.

Sometimes you will see the intervals written as 0 to 40, 40 to 80, 80 to 120, and so on. With that labeling, it is not clear in which interval an observation that falls at an interval endpoint belongs. When reading the histogram, we generally use a left endpoint convention: if an observation falls on an endpoint, it belongs to the interval that has the observation as its left endpoint.
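The left endpoint convention is easy to express in code. The sketch below, which is an illustration and not part of the original example, bins the 20 sodium values with integer floor division, so a value such as 200 lands in the interval that starts at 200. The names `interval_index`, `WIDTH`, and `N_INTERVALS` are chosen here for illustration.

```python
# Sodium values (mg) for the 20 breakfast cereals.
sodium = [0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
          140, 180, 190, 160, 290, 50, 220, 180, 200, 210]

WIDTH = 40        # width of each interval
N_INTERVALS = 9   # intervals cover 0 up to (but not including) 360


def interval_index(x, width=WIDTH):
    """Left endpoint convention: x belongs to the interval whose left
    endpoint is the largest multiple of `width` not exceeding x."""
    return x // width


counts = [0] * N_INTERVALS
for x in sodium:
    counts[interval_index(x)] += 1

# Print interval, frequency, proportion, and percentage.
n = len(sodium)
for i, c in enumerate(counts):
    left = i * WIDTH
    print(f"{left:>3} to {left + WIDTH - 1:<3}  {c:>2}  {c / n:.2f}  {100 * c / n:.0f}%")
```

The two cereals with exactly 200 mg of sodium are counted in the 200 to 239 interval, not in 160 to 199, and the resulting frequencies match Table 2.4.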

b. Figure 2.6 shows the histogram for this frequency table. A bar is drawn over each interval of values, with the height of each bar equal to its corresponding frequency. The histogram created using a TI calculator is in the margin.

c. The histogram does not show the actual numerical values. For instance, we know that one observation falls below 40, but we do not know its actual value. In summary, with a histogram, we may lose the actual numerical values of individual observations, unlike with a dot plot or a stem-and-leaf plot.

Insight

The bars in the histogram in Figure 2.6 display the frequencies in each interval (or bin), and the vertical axis shows the counts. If we had used relative


How do you select the intervals? If you use too few intervals, the graph is usually too crude (see margin). It may consist mainly of a couple of tall bars. If you use too many intervals, the graph may be irregular, with many very short bars and/or gaps between bars (see margin). You can then lose information about the shape of the distribution. Usually about 5 to 10 intervals are adequate, with perhaps additional intervals when the sample size is quite large. There is no one right way to select the intervals. Software can select them for you, find the counts and percentages, and construct the histogram, but it is always a good idea to override the default and look at a few options.
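To see the effect of the interval choice, one can bin the same data several ways and compare. The sketch below is an illustration under assumptions chosen here: it uses the 20 cereal sodium values, and the three widths (120, 40, and 10) and the helper `bin_counts` are not from the text.

```python
# Sodium values (mg) for the 20 breakfast cereals.
sodium = [0, 340, 70, 140, 200, 180, 210, 150, 100, 130,
          140, 180, 190, 160, 290, 50, 220, 180, 200, 210]


def bin_counts(values, width, upper=360):
    """Count values in equal-width intervals [0, width), [width, 2*width), ...
    covering 0 up to (but not including) `upper`."""
    n_bins = upper // width
    counts = [0] * n_bins
    for x in values:
        counts[x // width] += 1
    return counts


# Compare a coarse, a moderate, and a fine choice of interval width.
for width in (120, 40, 10):
    counts = bin_counts(sodium, width)
    empty = counts.count(0)
    print(f"width {width:>3}: {len(counts):>2} intervals, "
          f"{empty:>2} empty, tallest bar = {max(counts)}")
```

With a width of 120 there are only 3 intervals, and one bar holds 14 of the 20 cereals. With a width of 10 there are 36 intervals, 21 of them empty, and no bar is taller than 3. The width of 40 used in Table 2.4 gives 9 intervals with a single gap.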