2.1 Frequency Distributions and Histograms
EXAMPLE 2-2
Frequency Distribution for Qualitative Data
Automobile Accidents State Farm Insurance recently surveyed a sample of the records for 15 policyholders to determine the make of the vehicle driven by the eldest member of each household. The following data show the results for the 15 respondents:
Ford Dodge Toyota Ford Buick
Chevy Toyota Nissan Ford Chevy
Ford Toyota Chevy BMW Honda
The frequency distribution for this qualitative variable is found as follows:
Step 1 List the possible values.
For these sample data, the possible values for the variable are BMW, Buick, Chevy, Dodge, Ford, Honda, Nissan, Toyota.
Step 2 Count the number of occurrences at each value.
The frequency distribution is
Car Company Frequency
BMW 1
Buick 1
Chevy 3
Dodge 1
Ford 4
Honda 1
Nissan 1
Toyota 3
Total = 15
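The two steps above can be sketched in a few lines of Python. This is only an illustration of the counting procedure, not part of the original example; the variable names are our own, and the data are the 15 makes listed above.

```python
from collections import Counter

# The 15 vehicle makes from Example 2-2, in the order listed.
makes = [
    "Ford", "Dodge", "Toyota", "Ford", "Buick",
    "Chevy", "Toyota", "Nissan", "Ford", "Chevy",
    "Ford", "Toyota", "Chevy", "BMW", "Honda",
]

# Step 1: list the possible values (sorted so they match the table order).
possible_values = sorted(set(makes))

# Step 2: count the number of occurrences of each value.
counts = Counter(makes)
frequency = {make: counts[make] for make in possible_values}

for make, count in frequency.items():
    print(f"{make:<8}{count:>3}")
print(f"Total = {sum(frequency.values())}")
```

Sorting the distinct values first keeps the output in the same order as the table, which makes the two easy to compare.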
TRY EXERCISE 2-7 (pg. 71)
BUSINESS APPLICATION
Frequency Distributions
Athletic Shoe Survey In recent years, a status symbol for many students has been the brand and style of athletic shoes they wear. Companies such as Nike and Adidas compete for the top position in the sport shoe market. A survey was conducted in which 100 college students at a southern state school were asked a number of questions, including how many pairs of Nike shoes they currently own. The data are in a file called SportsShoes.
The variable Number of Nike is a discrete quantitative variable. Figure 2.1 shows the frequency distribution (output from Excel) for the number of Nike shoes owned by those surveyed. The frequency distribution shows that, although a few people own more than six pairs of Nike shoes, most of those surveyed own two or fewer pairs.
Chapter 2 | Graphs, Charts, and Tables: Describing Your Data
FIGURE 2.1 Excel 2016 Output: Nike Shoes Frequency Distribution
Excel 2016 Instructions
1. Open File: SportsShoes.xlsx.
2. Enter the Possible Values for the Variable; i.e., 0, 1, 2, 3, 4, etc.
3. Select the cells to contain the Frequency values.
4. Select the Formulas tab.
5. Click on the fx button.
6. Select the FREQUENCY function from the Statistical category.
7. Enter the range of data and the bin range (the cells containing the possible number of shoes).
8. Press Ctrl-Shift-Enter to determine the frequency values.
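For readers working outside Excel, the counting that the FREQUENCY function performs can be approximated as below. This is a sketch under stated assumptions: the function name `excel_style_frequency` is our own, and because the SportsShoes file is not reproduced here, the ten sample values are made up purely for illustration. The sketch follows Excel's documented convention that a value is counted in the first bin whose upper value it does not exceed, with one extra slot for values above the last bin.

```python
from bisect import bisect_left


def excel_style_frequency(data, bins):
    """Count values per bin in the style of Excel's FREQUENCY function.

    bins must be sorted in ascending order. A value v is counted in slot i
    when bins[i-1] < v <= bins[i]; the final extra slot counts values
    greater than the last bin.
    """
    counts = [0] * (len(bins) + 1)
    for v in data:
        # bisect_left finds the first bin that is >= v.
        counts[bisect_left(bins, v)] += 1
    return counts


# Hypothetical responses (NOT the SportsShoes data): pairs of shoes owned.
pairs_owned = [0, 2, 1, 3, 2, 0, 7, 1, 2, 4]
possible_values = [0, 1, 2, 3, 4, 5, 6]

print(excel_style_frequency(pairs_owned, possible_values))
```

The last element of the result plays the role of a "more than 6" category, so no observation is silently dropped.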
BUSINESS APPLICATION
Grouped Data Frequency Distributions
Video Streaming Video streaming services such as Netflix, Cloudload, Amazon Prime, and Hulu have grown in popularity in recent years. A distribution manager for Netflix collected data for a random sample of 230 Netflix customers and recorded the number of movies they streamed during the previous month. Table 2.6 shows the results. These data are discrete, quantitative data. The values range from 0 to 30.
TABLE 2.6 Number of Streamed Movies
9 4 13 10 5 10 13 14 10 19
0 10 16 9 11 14 8 15 7 15
10 11 9 7 6 12 12 14 15 16
15 14 10 13 9 12 12 10 10 11
15 14 9 19 3 9 16 19 15 9
4 2 4 5 6 2 3 4 7 5
6 2 2 0 0 8 3 4 3 2
2 5 2 5 2 2 6 2 5 6
5 2 7 3 5 1 6 4 3 6
3 7 7 1 6 2 7 1 3 2
4 0 2 2 4 6 2 5 3 7
4 16 9 10 11 7 10 9 10 11
11 12 9 8 9 7 9 17 8 13
14 13 10 6 12 5 14 7 13 12
9 6 10 15 7 7 9 9 13 10
9 3 17 5 11 9 6 9 15 8
11 13 4 16 13 9 11 5 12 13
0 3 3 3 2 1 4 0 2 0
3 7 1 5 2 2 3 2 1 3
2 3 3 3 0 3 3 3 1 1
13 24 24 17 17 15 25 20 15 20
21 23 25 17 13 22 18 17 30 21
18 21 17 16 25 14 15 24 21 15
TABLE 2.7 Frequency Distribution of Streamed Movies
Streamed Movies Frequency
0 8
1 8
2 22
3 22
4 11
5 13
6 12
7 14
8 5
9 19
10 14
11 9
12 8
13 12
14 8
15 12
16 6
17 7
18 2
19 3
20 2
21 4
22 1
23 1
24 3
25 3
26 0
27 0
28 0
29 0
30 1
Total = 230
The manager is interested in transforming these data into useful information by constructing a frequency distribution. Table 2.7 shows one approach in which the possible values for the number of movies streamed are listed from 0 to 30. Although this frequency distribution is a step forward in transforming the data into information, because of the large number of possible values, the 230 observations are spread over a large range, making analysis difficult. In this case, the manager might consider forming a grouped data frequency distribution by organizing the possible number of streamed movies into discrete categories or classes.
To begin constructing a grouped frequency distribution, sort the quantitative data from low to high. The sorted data are called a data array. Now, define the classes for the variable of interest. Care needs to be taken when constructing these classes to ensure each data point is put into one, and only one, possible class. Therefore, the classes should meet four criteria.
First, they must be mutually exclusive.
Second, they must be all-inclusive.
Third, if at all possible, they should be of equal width.
Fourth, if possible, classes should not be empty.
Mutually Exclusive Classes
Classes that do not overlap, so that a data value can be placed in only one class.
All-Inclusive Classes
A set of classes that contains all the possible data values.
Equal-Width Classes
The distance between the lowest possible value and the highest possible value in each class is equal for all classes.
Equal-width classes make analyzing and interpreting the frequency distribution easier. However, there are some instances in which the presence of extreme high or low values makes it necessary to have an open-ended class. For example, annual family incomes in the United States are mostly between $15,000 and $200,000. However, there are some families with much higher family incomes. To best accommodate these high incomes, you might consider having the highest income class be “over $200,000” or “$200,000 and over” as a catchall for the high-income families.
Empty classes are those for which there are no data values. If this occurs, it may be because you have set up classes that are too narrow.
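The four criteria can be checked mechanically once a set of candidate classes has been written down. The sketch below is one possible way to do so for integer data with inclusive class limits; the function name `check_classes` is our own, and all-inclusiveness is tested against the observed values rather than against every conceivable value.

```python
def check_classes(classes, data):
    """Check the four class criteria for classes with inclusive integer limits.

    classes: list of (lower, upper) limits, both inclusive.
    data:    the observed data values.
    """
    ordered = sorted(classes)
    # Mutually exclusive: each class starts above the previous upper limit.
    mutually_exclusive = all(
        nxt[0] > prev[1] for prev, nxt in zip(ordered, ordered[1:])
    )
    # All-inclusive: every observed value falls in at least one class.
    all_inclusive = all(
        any(lo <= v <= hi for lo, hi in classes) for v in data
    )
    # Equal width: upper limit minus lower limit is the same for all classes.
    equal_width = len({hi - lo for lo, hi in classes}) == 1
    # No empty classes: every class holds at least one observed value.
    no_empty = all(
        any(lo <= v <= hi for v in data) for lo, hi in classes
    )
    return {
        "mutually_exclusive": mutually_exclusive,
        "all_inclusive": all_inclusive,
        "equal_width": equal_width,
        "no_empty_classes": no_empty,
    }


# Classes 0-3, 4-7, ..., 28-31 and the distinct values that occur in
# Table 2.7 (0 through 25, and 30).
classes = [(lo, lo + 3) for lo in range(0, 32, 4)]
observed = list(range(0, 26)) + [30]
print(check_classes(classes, observed))

# Overlapping classes such as 0-4 and 4-8 fail the first criterion,
# because the value 4 could be placed in either class.
print(check_classes([(0, 4), (4, 8)], [0, 4, 8]))
```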
Steps for Grouping Data into Classes
There are four steps for grouping data, such as those in Table 2.6, into classes.
Step 1 Determine the number of groups or classes to use.
Although there is no absolute right or wrong number of classes, one rule of thumb is to have between 5 and 20 classes. Another guideline for helping you determine how many classes to use is the $2^k \ge n$ rule, where $k$ is the number of classes, defined as the smallest integer such that $2^k \ge n$, and $n$ is the number of data values. For example, for $n = 230$, the $2^k \ge n$ rule would suggest $k = 8$ classes ($2^8 = 256 \ge 230$, while $2^7 = 128 < 230$). This latter method was chosen for our example. Our preliminary step, as specified previously, is to produce a frequency distribution from the data array as in Table 2.7. This will enhance our ability to envision the data structure and the classes.
Remember, these are only guidelines for the number of classes. There is no specific right or wrong number. In general, use fewer classes for smaller data sets; use more classes for larger data sets. However, using too few classes tends to condense data too much, and information can be lost. Using too many classes spreads out the data so much that little advantage is gained over the original raw data.
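The rule of finding the smallest integer k whose power of 2 is at least n is easy to express as a short loop. The following is a minimal sketch; the function name `classes_by_2k_rule` is our own choice, and the result remains only a guideline.

```python
def classes_by_2k_rule(n):
    """Return the smallest integer k such that 2**k >= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    k = 0
    while 2 ** k < n:
        k += 1
    return k


# For the 230 streamed-movie observations the rule suggests 8 classes,
# because 2**7 = 128 is too small and 2**8 = 256 is large enough.
print(classes_by_2k_rule(230))
```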
Step 2 Establish the class width.
The minimum class width is determined by Equation 2.2.

$$W = \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}} \tag{2.2}$$

For the streamed movies data using eight classes, we get

$$W = \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}} = \frac{30 - 0}{8} = 3.75$$

This means we could construct eight classes that are each 3.75 units wide to provide mutually exclusive and all-inclusive classes. However, because our purpose is to make the data more understandable, we suggest that you round up to a more convenient class width, such as 4.0. If you do round the class width, always round up.
Class Width
The distance between the lowest possible value and the highest possible value for a frequency class.
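The class-width calculation, including the rounding-up step, can be sketched as follows. The helper names are our own, and the choice of rounding to the next multiple of a "convenient" step (here 1) is an assumption made for illustration; what counts as convenient is a judgment call.

```python
import math


def minimum_class_width(largest, smallest, num_classes):
    """Equation 2.2: (largest value - smallest value) / number of classes."""
    return (largest - smallest) / num_classes


def round_width_up(width, step=1):
    """Round a class width UP to the next multiple of step (never down)."""
    return math.ceil(width / step) * step


# Streamed movies data: values range from 0 to 30, eight classes.
w_min = minimum_class_width(30, 0, 8)
print(w_min)                  # minimum width from Equation 2.2
print(round_width_up(w_min))  # rounded up to a whole number
```

Rounding down instead would leave the largest observations outside the last class, which is why the width is only ever rounded up.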
Step 3 Determine the class boundaries for each class.
The class boundaries determine the lowest possible value and the highest possible value for each class. In the streamed movies example, if we start the first class at 0, we get the class boundaries shown in the first column of the following table. Notice the classes have been formed to be mutually exclusive and all-inclusive.
Step 4 Determine the class frequency for each class.
The count for each class is known as a class frequency. As an example, the number of observations in the first class is 60.
Class Boundaries
The upper and lower values of each class.
Streamed Movies Frequency
0–3 60
4–7 50
8–11 47
12–15 40
16–19 18
20–23 8
24–27 6
28–31 1
Total = 230
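The class frequencies can be reproduced by summing the ungrouped frequencies in Table 2.7 over each class. The sketch below assumes a class width of 4 starting at 0, as in the text; the variable names are ours.

```python
# Ungrouped frequencies from Table 2.7: movies streamed -> number of customers.
table_2_7 = {
    0: 8, 1: 8, 2: 22, 3: 22, 4: 11, 5: 13, 6: 12, 7: 14,
    8: 5, 9: 19, 10: 14, 11: 9, 12: 8, 13: 12, 14: 8, 15: 12,
    16: 6, 17: 7, 18: 2, 19: 3, 20: 2, 21: 4, 22: 1, 23: 1,
    24: 3, 25: 3, 26: 0, 27: 0, 28: 0, 29: 0, 30: 1,
}

start, width, num_classes = 0, 4, 8

class_frequency = {}
for i in range(num_classes):
    lower = start + i * width
    upper = lower + width - 1  # inclusive upper limit for integer data
    class_frequency[(lower, upper)] = sum(
        count for value, count in table_2_7.items() if lower <= value <= upper
    )

for (lower, upper), count in class_frequency.items():
    print(f"{lower}-{upper}: {count}")
print(f"Total = {sum(class_frequency.values())}")
```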
Another step we can take to help analyze the streamed movies data is to construct a relative frequency distribution, a cumulative frequency distribution, and a cumulative relative frequency distribution.
Streamed Movies   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
0–3                      60                0.261                     60                           0.261
4–7                      50                0.217                    110                           0.478
8–11                     47                0.204                    157                           0.683
12–15                    40                0.174                    197                           0.857
16–19                    18                0.078                    215                           0.935
20–23                     8                0.035                    223                           0.970
24–27                     6                0.026                    229                           0.996
28–31                     1                0.004                    230                           1.000
Total = 230
The cumulative frequency distribution is shown in the “Cumulative Frequency” column. We can then form the cumulative relative frequency distribution as shown in the “Cumulative Relative Frequency” column. The cumulative relative frequency distribution indicates, as an example, that 85.7% of the sample have streamed fewer than 16 movies.
Cumulative Frequency Distribution
A summary of a set of data that displays the number of observations with values less than or equal to the upper limit of each of its classes.
Cumulative Relative Frequency Distribution
A summary of a set of data that displays the proportion of observations with values less than or equal to the upper limit of each of its classes.
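The relative, cumulative, and cumulative relative frequency columns all follow from the class frequencies. The sketch below is one way to compute them, assuming the eight class frequencies from the streamed movies example and rounding to three decimals as in the table; the variable names are ours.

```python
from itertools import accumulate

class_labels = ["0-3", "4-7", "8-11", "12-15", "16-19", "20-23", "24-27", "28-31"]
frequencies = [60, 50, 47, 40, 18, 8, 6, 1]
n = sum(frequencies)  # 230 observations

# Relative frequency: the share of all observations that fall in each class.
relative = [round(f / n, 3) for f in frequencies]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: running total divided by n.
cumulative_relative = [round(c / n, 3) for c in cumulative]

for row in zip(class_labels, frequencies, relative, cumulative, cumulative_relative):
    print("{:<6}{:>5}{:>8.3f}{:>6}{:>8.3f}".format(*row))
```

Dividing the running total by n, rather than adding up the rounded relative frequencies, avoids accumulating rounding error in the last column.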
EXAMPLE 2-3
Frequency Distribution for Continuous Variables
Emergency Response Communication Links One of the major efforts of the U.S. Department of Homeland Security has been to improve the communication between emergency responders, like the police and fire departments. The communications have been hampered by problems involving linking divergent radio and computer systems, as well as communication protocols. While most cities have recognized the problem and made efforts to solve it, Homeland Security has funded practice exercises in 72 cities of different sizes throughout the United States. The data below are the times, in seconds, it took to link the systems.
35 339 650 864 1,025 1,261
38 340 655 883 1,028 1,280
48 395 669 883 1,036 1,290
53 457 703 890 1,044 1,312
70 478 730 934 1,087 1,341
99 501 763 951 1,091 1,355
138 521 788 969 1,126 1,357
164 556 789 985 1,176 1,360
220 583 789 993 1,199 1,414
265 595 802 997 1,199 1,436
272 596 822 999 1,237 1,479
312 604 851 1,018 1,242 1,492
HOW TO DO IT (Example 2-3)
Developing Frequency Distributions for Continuous Variables
1. Determine the desired number of classes or groups. One rule of thumb is to use 5 to 20 classes. The $2^k \ge n$ rule can also be used.
2. Determine the minimum class width using
$$\frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}}$$
Round the class width up to a more convenient value.
3. Define the class boundaries, making sure that the classes that are formed are mutually exclusive and all-inclusive. Ideally, the classes should have equal widths and should all contain at least one observation.
4. Determine the class frequency for each class.
Homeland Security wishes to construct a frequency distribution showing the times until the communication systems are linked. The frequency distribution is determined as follows:
Step 1 Group the data into classes.
The number of classes is arbitrary, but typically will be between 5 and 20, depending on the volume of data. In this example, we have $n = 72$ data items. A common method of determining the number of classes is to use the $2^k \ge n$ guideline. We get $k = 7$ classes, since $2^7 = 128 \ge 72$ and $2^6 = 64 < 72$.
Step 2 Determine the class width.

$$W = \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}} = \frac{1{,}492 - 35}{7} = 208.1429 \Rightarrow 225$$

Note that we have rounded the class width up from the minimum required value of 208.1429 to the more convenient value of 225.
Step 3 Define the class boundaries.
0 and under 225
225 and under 450
450 and under 675
675 and under 900
900 and under 1,125
1,125 and under 1,350
1,350 and under 1,575
These classes are mutually exclusive, all-inclusive, and have equal widths.
Step 4 Determine the class frequency for each class.
Time to Link Systems (in seconds) Frequency
0 and under 225 9
225 and under 450 6
450 and under 675 12
675 and under 900 13
900 and under 1,125 14
1,125 and under 1,350 11
1,350 and under 1,575 7
This frequency distribution shows that most cities took between 450 and 1,350 seconds (7.5 and 22.5 minutes) to link their communications systems.
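The class frequencies in this example can be reproduced with a short script. The sketch below assumes the "lower limit and under upper limit" convention used above, so a time t belongs to class number t // 225 when the first class starts at 0; the variable names are ours, and the data are the 72 link times listed in the example.

```python
# The 72 link times (in seconds) from Example 2-3, row by row.
times = [
    35, 339, 650, 864, 1025, 1261,
    38, 340, 655, 883, 1028, 1280,
    48, 395, 669, 883, 1036, 1290,
    53, 457, 703, 890, 1044, 1312,
    70, 478, 730, 934, 1087, 1341,
    99, 501, 763, 951, 1091, 1355,
    138, 521, 788, 969, 1126, 1357,
    164, 556, 789, 985, 1176, 1360,
    220, 583, 789, 993, 1199, 1414,
    265, 595, 802, 997, 1199, 1436,
    272, 596, 822, 999, 1237, 1479,
    312, 604, 851, 1018, 1242, 1492,
]

width, num_classes = 225, 7

# Class i covers [i * width, (i + 1) * width): lower limit included,
# upper limit excluded, so the classes are mutually exclusive.
counts = [0] * num_classes
for t in times:
    counts[t // width] += 1

for i, count in enumerate(counts):
    lower, upper = i * width, (i + 1) * width
    print(f"{lower:,} and under {upper:,}: {count}")

# Number of cities that took between 450 and 1,350 seconds.
print(sum(counts[2:6]), "of", len(times))
```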
TRY EXERCISE 2-5 (pg. 70)