Today, businesses collect massive amounts of data they hope will be useful for making decisions. Every time a customer makes a purchase at a store like Macy’s or the Gap, data from that transaction are added to the store’s database. Major retail stores like Walmart capture the number of different product categories included in each “market basket” of items purchased. Table 2.1 shows these data for all customer transactions for one morning at one Walmart store in Dallas. A total of 450 customers made purchases on the morning in question. The first value in Table 2.1 is a 4, which indicates that the customer’s purchase included four different product categories (for example, food, sporting goods, photography supplies, and dry goods).
Although the data in Table 2.1 are easy to capture with the technology of today’s cash registers, in this form, the data provide little or no information that managers could use to determine the buying habits of their customers. However, these data can be converted into useful information through descriptive statistical analysis.
Frequency Distributions
A more effective way to display the Dallas Walmart data would be to construct a frequency distribution.
The product data in Table 2.1 take on only a few possible values (1, 2, 3, ..., 11). The minimum number of product categories is 1 and the maximum number of categories in these data is 11. These data are called discrete data.
When you encounter discrete data, where the variable of interest can take on only a reasonably small number of possible values, a frequency distribution is constructed by counting the number of times each possible value occurs in the data set. We organize these counts into the frequency distribution table shown in Table 2.2. From this frequency distribution we are able to see how the data values are spread over the different number of possible product categories. For instance, you can see that the most frequently occurring number of product categories in a customer’s “market basket” is 4, which occurred 92 times. You can also see that the three most common numbers of product categories are 4, 5, and 6. Only a very few times do customers purchase 10 or 11 product categories in their trip to the store.
Outcome 1
Frequency Distribution
A summary of a set of data that displays the number of observations in each of the distribution’s distinct categories or classes.
Discrete Data
Data that can take on a countable number of possible values.
Chapter 2 | Graphs, Charts, and Tables—Describing Your Data
TABLE 2.1 Product Categories per Customer at the Dallas Walmart
4 2 5 8 8 10 1 4 8 3 4 1 1 3 4
1 4 4 5 4 4 4 9 5 4 4 10 7 11 4
10 2 6 7 10 5 4 6 4 6 2 3 2 4 5
5 4 11 1 4 1 9 2 4 6 6 7 6 2 3
6 5 3 4 5 6 5 3 10 6 5 7 7 4 3
8 2 2 6 5 11 9 9 5 5 6 5 3 1 7
6 6 5 3 8 4 3 3 4 4 4 7 6 4 9
1 6 5 5 4 4 7 5 6 6 9 5 6 10 4
7 5 8 4 4 7 4 6 6 4 4 2 10 4 5
4 11 8 7 9 5 6 4 2 8 4 2 6 6 6
6 4 6 5 7 1 6 9 1 5 9 10 5 5 10
5 4 7 5 7 6 9 5 3 2 1 5 5 5 5
5 9 5 3 2 5 7 2 4 6 4 4 4 4 4
6 5 8 5 5 5 5 5 2 5 5 6 4 6 5
5 7 10 2 2 6 8 3 1 3 5 6 3 3 6
5 4 5 3 3 7 9 4 4 5 10 6 10 5 9
4 3 8 7 1 8 4 3 1 3 6 7 5 5 5
4 7 4 11 6 6 3 7 9 4 4 2 9 7 5
1 6 6 8 3 8 4 4 1 9 3 9 3 4 2
9 5 5 7 10 5 3 4 7 7 6 2 2 4 4
4 7 3 5 4 9 2 3 4 3 2 1 6 4 6
1 8 1 4 3 5 5 10 4 4 4 6 9 2 7
9 4 5 3 6 5 5 3 4 6 5 7 3 6 8
3 6 1 5 7 7 5 4 6 6 6 3 6 9 5
4 5 10 1 5 5 7 8 9 1 6 5 6 6 4
10 6 5 5 5 1 6 5 6 4 7 9 10 2 6
4 4 6 11 9 5 4 4 3 5 4 6 2 6 7
3 5 6 7 4 5 4 6 9 4 3 3 6 9 4
3 7 5 6 11 4 4 8 4 2 8 2 4 2 3
6 5 1 10 5 9 5 4 5 1 4 9 5 4 4
TABLE 2.2 Dallas Walmart Product Categories Frequency Distribution
Number of Product Categories    Frequency
1 25
2 29
3 42
4 92
5 86
6 68
7 35
8 19
9 29
10 18
11 7
Total = 450
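The counting step behind Table 2.2 can be sketched in Python. This is a minimal illustration rather than the text's own procedure: it uses only the first row of Table 2.1 as sample input, and the names `basket_sizes` and `frequency` are assumptions chosen for the example.

```python
from collections import Counter

# First row of Table 2.1 only (15 of the 450 customers), used as sample input.
basket_sizes = [4, 2, 5, 8, 8, 10, 1, 4, 8, 3, 4, 1, 1, 3, 4]

# Count how many times each distinct value occurs in the data.
frequency = Counter(basket_sizes)

# Print the distribution ordered by number of product categories.
for value in sorted(frequency):
    print(f"{value:>2} {frequency[value]:>3}")
print(f"Total = {sum(frequency.values())}")
```

Supplying all 450 values from Table 2.1 in place of the sample row should reproduce the counts in Table 2.2.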
2.1 Frequency Distributions and Histograms | Chapter 2
TABLE 2.3 Frequency Distributions of Years of College Education
Philadelphia                      Knoxville
Years of College    Frequency     Years of College    Frequency
0                   35            0                   187
1                   21            1                   62
2                   24            2                   34
3                   22            3                   19
4                   31            4                   14
5                   13            5                   7
6                   6             6                   3
7                   5             7                   4
8                   3             8                   0
Total = 160                       Total = 330
Consider another example in which a consulting firm surveyed random samples of residents in two cities, Philadelphia and Knoxville. The firm is investigating the labor markets in these two communities for a client that is thinking of relocating its corporate offices to one of the two locations. Education level of the workforce in the two cities is a key factor in making the relocation decision. The consulting firm surveyed 160 randomly selected adults in Philadelphia and 330 adults in Knoxville and recorded the number of years of college attended. The responses ranged from zero to eight years. Table 2.3 shows the frequency distributions for the two cities.
Suppose now we wished to compare the distribution for years of college for Philadelphia and Knoxville. How do the two cities’ distributions compare? Do you see any difficulties in making this comparison? Because the surveys contained different numbers of people, it is difficult to compare the frequency distributions directly. When the number of total observations differs, comparisons are easier to make if relative frequencies are computed. Equation 2.1 is used to compute the relative frequencies.
Table 2.4 shows the relative frequencies for each city’s distribution. This makes a comparison of the two much easier. We see that Knoxville has relatively more people without any college (56.7%) or with one year of college (18.8%) than Philadelphia (21.9% and 13.1%). At all other levels of education, Philadelphia has relatively more people than Knoxville.
Relative Frequency
The proportion of total observations that are in a given category. Relative frequency is computed by dividing the frequency in a category by the total number of observations. The relative frequencies can be converted to percentages by multiplying by 100.
TABLE 2.4 Relative Frequency Distributions of Years of College
                    Philadelphia                          Knoxville
Years of College    Frequency    Relative Frequency       Frequency    Relative Frequency
0                   35           35/160 = 0.219           187          187/330 = 0.567
1                   21           21/160 = 0.131           62           62/330 = 0.188
2                   24           24/160 = 0.150           34           34/330 = 0.103
3                   22           22/160 = 0.138           19           19/330 = 0.058
4                   31           31/160 = 0.194           14           14/330 = 0.042
5                   13           13/160 = 0.081           7            7/330 = 0.021
6                   6            6/160 = 0.038            3            3/330 = 0.009
7                   5            5/160 = 0.031            4            4/330 = 0.012
8                   3            3/160 = 0.019            0            0/330 = 0.000
Total               160                                   330
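The relative frequencies in Table 2.4 can be checked with a short Python sketch. The counts come from Table 2.3; the function name `relative_frequencies` and the variable names are assumptions made for illustration.

```python
# Frequencies from Table 2.3, indexed by years of college (0 through 8).
philadelphia = [35, 21, 24, 22, 31, 13, 6, 5, 3]
knoxville = [187, 62, 34, 19, 14, 7, 3, 4, 0]


def relative_frequencies(counts):
    """Divide each frequency by the total number of observations."""
    total = sum(counts)
    return [count / total for count in counts]


phil_rel = relative_frequencies(philadelphia)
knox_rel = relative_frequencies(knoxville)

# Print the two relative frequency columns side by side.
for years, (p, k) in enumerate(zip(phil_rel, knox_rel)):
    print(f"{years}  {p:.3f}  {k:.3f}")
```

Because each list is divided by its own total, the two columns are on the same scale even though the sample sizes (160 and 330) differ.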
EXAMPLE 2-1
Frequency and Relative Frequency Distributions
Real Estate Transactions In late 2008, the United States experienced a major economic decline thought to be due in part to the sub-prime loans that many lending institutions made during the previous few years. When the housing bubble burst, many institutions experienced severe problems. As a result, lenders became much more conservative in granting home loans, which in turn made buying and selling homes more challenging.
To demonstrate the magnitude of the problem in Kansas City, a survey of 16 real estate agencies was conducted to collect data on the number of real estate transactions closed in December 2008. The following data were observed:
3 0 0 1
1 2 2 0
0 2 1 0
2 1 4 2
The real estate analysts can use the following steps to construct a frequency distribution and a relative frequency distribution for the number of real estate transactions.
Step 1 List the possible values.
The possible values for the discrete variable, listed in order, are 0, 1, 2, 3, 4.
Step 2 Count the number of occurrences at each value.
The frequency distribution follows:
Transactions Frequency Relative Frequency
0 5 5/16 = 0.3125
1 4 4/16 = 0.2500
2 5 5/16 = 0.3125
3 1 1/16 = 0.0625
4 1 1/16 = 0.0625
Total = 16 1.0000
Step 3 Determine the relative frequencies.
The relative frequencies are determined by dividing each frequency by 16, as shown in the right-hand column above. Thus, just over 31% of those responding reported no transactions during December 2008.
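The three steps of Example 2-1 can be sketched as a small Python function. This is an illustrative sketch under stated assumptions, not the analysts' actual tool: the name `frequency_table` is hypothetical, and "possible values" is taken to mean the values observed in the data.

```python
from collections import Counter

# December 2008 transactions reported by the 16 agencies in Example 2-1.
transactions = [3, 0, 0, 1, 1, 2, 2, 0, 0, 2, 1, 0, 2, 1, 4, 2]


def frequency_table(data):
    """Return (value, frequency, relative frequency) rows ordered by value."""
    counts = Counter(data)  # Step 2: count occurrences of each value
    n = len(data)           # total number of observations
    # Step 1 (ordered list of observed values) and Step 3 (divide by n).
    return [(value, counts[value], counts[value] / n) for value in sorted(counts)]


table = frequency_table(transactions)
for value, freq, rel in table:
    print(f"{value}  {freq}  {rel:.4f}")
```

A value that is possible but never observed would not appear in this output; it would have to be added explicitly with a frequency of zero.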
TRY EXERCISE 2-1 (pg. 70)
HOW TO DO IT (Example 2-1)
Developing Frequency and Relative Frequency Distributions for Discrete Data
1. List all possible values of the variable. If the variable is ordinal level or higher, order the possible values from low to high.
2. Count the number of occurrences at each value of the variable and place this value in a column labeled “Frequency.”
To develop a relative frequency distribution, do the following:
3. Use Equation 2.1 and divide each frequency count by the total number of observations and place in a column headed “Relative Frequency.”
TABLE 2.5 TV Source Frequency Distribution
TV Source Frequency
DISH 80
DIRECTV 90
Cable 20
Other 10
Total = 200
Relative Frequency
$$\text{Relative frequency} = \frac{f_i}{n} \tag{2.1}$$
where:
$f_i$ = Frequency of the $i$th value of the discrete variable
$n = \sum_{i=1}^{k} f_i$ = Total number of observations
$k$ = The number of different values for the discrete variable
The frequency distributions shown in Table 2.2 and Table 2.3 were developed from quantitative data. That is, the variable of interest was numerical (number of product categories or number of years of college). However, a frequency distribution can also be developed when the data are qualitative data, or nonnumerical data. For instance, if a survey asked homeowners how they get their TV signal, the possible responses in this region are:
DISH DIRECTV Cable Other
Table 2.5 shows the frequency distribution from a survey of 200 homeowners.
EXAMPLE 2-2
Frequency Distribution for Qualitative Data
Automobile Accidents State Farm Insurance recently surveyed a sample of the records for 15 policyholders to determine the make of the vehicle driven by the eldest member in the household. The following data reflect the results for the 15 policyholders:
Ford Dodge Toyota Ford Buick
Chevy Toyota Nissan Ford Chevy
Ford Toyota Chevy BMW Honda
The frequency distribution for this qualitative variable is found as follows:
Step 1 List the possible values.
For these sample data, the possible values for the variable are BMW, Buick, Chevy, Dodge, Ford, Honda, Nissan, Toyota.
Step 2 Count the number of occurrences at each value.
The frequency distribution is
Car Company Frequency
BMW 1
Buick 1
Chevy 3
Dodge 1
Ford 4
Honda 1
Nissan 1
Toyota 3
Total = 15
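The same counting approach works for qualitative data, as this Python sketch of Example 2-2 suggests. The variable names `makes` and `make_counts` are assumptions for illustration.

```python
from collections import Counter

# Vehicle makes for the 15 policyholders in Example 2-2.
makes = [
    "Ford", "Dodge", "Toyota", "Ford", "Buick",
    "Chevy", "Toyota", "Nissan", "Ford", "Chevy",
    "Ford", "Toyota", "Chevy", "BMW", "Honda",
]

# Count how many times each make occurs.
make_counts = Counter(makes)

# The categories have no numerical order, so list them alphabetically,
# matching the layout of the table in the example.
for make in sorted(make_counts):
    print(f"{make:<7} {make_counts[make]}")
print(f"Total = {sum(make_counts.values())}")
```

Listing the categories alphabetically is only a presentation choice; ordering by frequency with `make_counts.most_common()` would be another reasonable option.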
TRY EXERCISE 2-7 (pg. 71)
BUSINESS APPLICATION
Frequency Distributions
Athletic Shoe Survey In recent years, a status symbol for many students has been the brand and style of athletic shoes they wear. Companies such as Nike and Adidas compete for the top position in the sport shoe market. A survey was conducted in which 100 college students at a southern state school were asked a number of questions, including how many pairs of Nike shoes they currently own. The data are in a file called SportsShoes.
The variable Number of Nike is a discrete quantitative variable. Figure 2.1 shows the frequency distribution (output from Excel) for the number of Nike shoes owned by those surveyed. The frequency distribution shows that, although a few people own more than six pairs of Nike shoes, most of those surveyed own two or fewer pairs.