Analyzing Data and Statistics
Chapter 8: Analyzing Data and Statistics
1,000 to 10,000 (that would be a long time to wander the showroom). One method that eliminates the need for so many numbers is to do groupings of values.
For example, you can make groups that consist of all the numbers from 1 to 5, 6 to 10, 11 to 15, and so on. The only stipulation is that you keep the groupings equal in size. A good rule of thumb is to plan on having about ten different groupings of numbers (give or take a few).
Groupings are a good way to present information such as:
The salary amounts of your office’s employees.
The number of light bulbs sold in your store on any given day.
The number of plates of spaghetti sold in an evening at your restaurant.
To determine how to divide a set of numbers into groups, look at the range of numbers or values that you have collected. What’s the range from the highest number to the lowest? If your listing of numbers goes from a low of 3 to a high of 97, your range is 95 numbers (97 – 3 + 1; you add the 1 so that both 3 and 97 are included in the count).
After you find your range, divide that number by 10 (which, as I mention earlier, is a good number of groupings to have). If you divide 95 by 10, you get 9.5. Rounding that up to a whole number, the best arrangement probably would be to have ten groupings of ten numbers.
So now you can let the first grouping be the numbers 1 through 10; the second grouping would go from 11 through 20. From there you can calculate the groupings all the way up to the last grouping going from 91 through 100. I know that the numbers you’ve collected don’t go down to 1 or up to 100, but this way the intervals are all the same size, and they cover all the numbers pretty symmetrically.
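The arithmetic behind choosing a group size can be sketched in a few lines of Python. This is only an illustration: the function name `group_width` is made up for this example, and rounding the raw width up to a whole number is an assumption that matches the "ten groupings of ten numbers" choice above.

```python
import math

def group_width(low, high, target_groups=10):
    """Return the inclusive range, the raw width, and a rounded-up width.

    Rounding up (math.ceil) is an assumption for this sketch; the text only
    says that ten groupings of ten numbers is "probably" the best arrangement.
    """
    span = high - low + 1              # add 1 so both endpoints are counted
    raw_width = span / target_groups   # 95 / 10 = 9.5 in the example
    return span, raw_width, math.ceil(raw_width)

span, raw_width, width = group_width(3, 97)
print(span, raw_width, width)  # 95 9.5 10
```

Starting the first group at 1 rather than at the lowest value (3) is a separate, cosmetic choice that keeps the interval boundaries on round numbers.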
Ready for some practice in setting up a frequency distribution with groupings?
Great! Try this one out: Say that you’ve been keeping track of the number of miles driven each week by your sales associates. You want to organize them in a frequency distribution. What does your distribution table look like?
To begin, take a look at Table 8-2, which shows the mileage records of your eight associates.
Table 8-2 The Mileage Records of the Company’s Sales Associates
          Associate 1  Associate 2  Associate 3  Associate 4  Associate 5  Associate 6  Associate 7  Associate 8
Week 1    123          444          574          893          938          722          422          349
Week 2    285          483          533          563          234          442          433          532
Week 3    439          842          576          488          574          267          55           466
Week 4    456          599          444          113          634          331          165          986
Week 5    898          456          522          464          533          222          453          886
Week 6    453          228          866          856          266          234          155          442
Week 7    435          868          456          456          235          956          762          544
Looking through the list, you can see that the numbers range from a low of 55 miles to a high of 986 miles. The range is 932 miles (986 – 55 + 1), so your best bet is to have 10 groupings starting with 0 to 99 and going up to 900 to 999.
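Table 8-3 shows you the tally created and the frequency in each grouping. In the tally column, each closed-up cluster (||||) stands for a bundle of five marks, and spaced-out marks are counted one at a time.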
Table 8-3 The Frequency Distribution of Sales-Associate Mileage
Range Tally Frequency
0 – 99 | 1
100 – 199 | | | | 4
200 – 299 |||| ||| 8
300 – 399 | | 2
400 – 499 |||| |||| |||| ||| 18
500 – 599 |||| |||| 10
600 – 699 | 1
700 – 799 | | 2
800 – 899 |||| || 7
900 – 999 | | | 3
Total 56
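The tally above can be reproduced with a short Python sketch. The 56 mileage values are transcribed from Table 8-2; the function name `frequency_distribution` and the fixed group width of 100 are choices made for this example, not something the text prescribes.

```python
from collections import Counter

# Weekly mileage figures transcribed from Table 8-2 (one row per week).
mileage = [
    123, 444, 574, 893, 938, 722, 422, 349,   # Week 1
    285, 483, 533, 563, 234, 442, 433, 532,   # Week 2
    439, 842, 576, 488, 574, 267, 55, 466,    # Week 3
    456, 599, 444, 113, 634, 331, 165, 986,   # Week 4
    898, 456, 522, 464, 533, 222, 453, 886,   # Week 5
    453, 228, 866, 856, 266, 234, 155, 442,   # Week 6
    435, 868, 456, 456, 235, 956, 762, 544,   # Week 7
]

def frequency_distribution(values, width=100):
    """Count how many values fall in each group of the given width."""
    # Integer division maps each value to the lower bound of its group.
    counts = Counter((v // width) * width for v in values)
    first = (min(values) // width) * width
    last = (max(values) // width) * width
    # Walk every group from first to last so empty groups show up as 0.
    return {(start, start + width - 1): counts[start]
            for start in range(first, last + width, width)}

table = frequency_distribution(mileage)
for (low, high), freq in table.items():
    print(f"{low:>3} - {high:>3}  {freq}")
print("Total", sum(table.values()))
```

Listing empty groups as zero keeps the intervals evenly spaced, which matters if a group ever has no entries.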
This frequency table is much more informative than the original listing of numbers in Table 8-2. In fact, this table is probably even more informative (and easier to set up) than listing all the numbers in order from smallest to largest or vice versa. Why? Well, for starters, the manager gets a feel for the expected or more frequent mileage numbers. Also, the highest and lowest numbers stand out more and may trigger some inquiries. Finally, the mileage information helps in planning the salesperson’s time and reimbursement projections.
Finding the Average
When you read about the average income across the country, you probably think in terms of a number that falls in the middle of the entire spectrum of incomes. In general, that’s a correct thought: the average usually is the middle or most used value. But, in fact, the average value can be determined one of three ways. In other words, the “middle” average is just one way. The average can be the mean, the median, or the mode.
The purpose of having three different ways of measuring the average is to give the best and most descriptive number for the average of a particular collection of numbers. Unfortunately, having the three choices can lead to some abuse of numbers and some misrepresentation of what’s in the collection. So, in the following sections, I show you all three methods of finding the average, and I let you decide what works best for describing the average value in your situation.
Adding and dividing to find the mean
The mean average is the number determined by adding all the numbers in a collection and dividing by the number of numbers that you’ve added together.
Want to see the actual formula? The mean average of the numbers \(a_1, a_2, a_3, \ldots, a_n\) is equal to the sum of the \(n\) numbers divided by \(n\):
\[ \frac{a_1 + a_2 + a_3 + \cdots + a_n}{n} \]
Find the mean average of these numbers: 8, 6, 4, 9, 2, 2, 5, 4, 7, 3, 9, 6.
To find the mean, add the numbers and divide by 12. You divide by 12 because you have 12 numbers in the list. Here’s what the math looks like:
\[ \frac{8 + 6 + 4 + 9 + 2 + 2 + 5 + 4 + 7 + 3 + 9 + 6}{12} = \frac{65}{12} = 5\tfrac{5}{12} \approx 5.417 \]
The mean average, 5.417, isn’t on the list of numbers, but it somewhat describes the middle of the collection.
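The same add-and-divide calculation is easy to check in Python. The helper name `mean_average` is made up for this sketch.

```python
def mean_average(numbers):
    """Add all the numbers and divide by how many there are."""
    return sum(numbers) / len(numbers)

values = [8, 6, 4, 9, 2, 2, 5, 4, 7, 3, 9, 6]
print(sum(values), len(values))          # 65 12
print(round(mean_average(values), 3))    # 5.417
```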
Suppose a not-so-ethical factory owner objects to the bad press that he has been receiving from his employees, who are complaining that his pay scale is much too low. When interviewed, he expresses incredulity that his employees would complain. After all, the average salary for everyone at the factory is $53,500. The nine employees who are complaining claim that they’re making only $15,000. How can this be?
If the nine employees are each making $15,000, and if the average salary of the nine employees and the owner is $53,500, you can easily find the owner’s salary by solving for x, like this:
\[
\begin{aligned}
\frac{9(15{,}000) + x}{10} &= 53{,}500 \\
\frac{135{,}000 + x}{10} &= 53{,}500 \\
135{,}000 + x &= 535{,}000 \\
x &= 535{,}000 - 135{,}000 \\
x &= 400{,}000
\end{aligned}
\]
If the owner is taking home $400,000, the mean average salary of the ten people is, in fact, $53,500. So the owner isn’t exactly lying, but he’s definitely using a method that doesn’t paint a true picture. This is a case of the mean average not really representing the true average or middle value. A better measure of the average salary in this case is the median or the mode, which I explain later in this chapter.
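To see how far apart the three averages land for this factory, here is a minimal Python sketch using the standard `statistics` module. The variable names are illustrative; the figures are the ones from the example (nine employees at $15,000 and a reported mean of $53,500 for ten people).

```python
import statistics

employee_salary = 15_000
employee_count = 9
reported_mean = 53_500
headcount = employee_count + 1   # the nine employees plus the owner

# Total payroll implied by the mean, minus what the employees earn.
owner_salary = reported_mean * headcount - employee_salary * employee_count
salaries = [employee_salary] * employee_count + [owner_salary]

print(f"owner:  {owner_salary:,.0f}")                   # owner:  400,000
print(f"mean:   {statistics.mean(salaries):,.0f}")      # mean:   53,500
print(f"median: {statistics.median(salaries):,.0f}")    # median: 15,000
print(f"mode:   {statistics.mode(salaries):,.0f}")      # mode:   15,000
```

Both the median and the mode come out at $15,000, which is what nine of the ten people actually earn.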
Locating the middle with the median
The median is the middle of a specifically ordered list of numbers (to remember this one, just think about a median that runs down the middle of the highway). The median as an average or middle of a set of numbers is a great representation of the average when you have one or more outliers. An outlier is a number that’s much smaller or much larger than all the other values.
For instance, you may remember in school when that one classmate always scored 100 or 99 even when the rest of you struggled for a 70. Those curve-breakers were so annoying. A curve-breaker is sort of like an outlier. The higher-than-usual score hiked up the average for everyone. The median average is designed to lessen the effect of that unusual score.
The median is the middle number in an ordered (highest to lowest or lowest to highest) list of numbers. If the list has an even number of entries (meaning there is no middle term), the median is the mean average of the middle two numbers. (Refer to the previous section if you need guidance on finding the mean average.)
Find the median of the following numbers: 8, 6, 4, 9, 2, 2, 5, 4, 7, 3, 9, 6.
First put the numbers in order from least to greatest: 2, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9. Because the list has 12 numbers, the 6th and 7th numbers in the list are in the middle. The two middle numbers are 5 and 6. So to find the median, you have to find the mean average of 5 and 6. You do so by adding the numbers together and dividing by 2. You get 5.5, so the median is 5.5.
As it turns out, the mean average of the entire list is 5.417 (see the example in the previous section), so the mean and median are pretty close in value in this situation. Having two averages of the same list that are close together in value gives you more assurance that you have a decent measure of the middle.
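The sort-then-pick-the-middle rule can be written out as a small Python function. This is a hand-rolled sketch for illustration (the function name `median` is chosen here); Python’s standard `statistics.median` does the same job.

```python
def median(numbers):
    """Return the middle value of the sorted list.

    With an even count there is no single middle entry, so the two
    middle values are averaged, as described in the text.
    """
    ordered = sorted(numbers)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

values = [8, 6, 4, 9, 2, 2, 5, 4, 7, 3, 9, 6]
print(sorted(values))   # [2, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9]
print(median(values))   # 5.5
```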
The ages of the 15 employees at Methuselah Manufacturing are 18, 25, 37, 28, 23, 29, 31, 87, 20, 24, 30, 21, 22, 93, 19. Find both the mean and median ages and determine which is a better representation of the average age.
The mean average is the sum of the ages divided by 15:
\[ \frac{18 + 25 + 37 + \cdots + 19}{15} = \frac{507}{15} = 33.8 \]
As you can see, the mean average age is 33.8, which is a higher number than all but three of the actual ages.
Now find the median by putting the ages in order from lowest to highest:
18, 19, 20, 21, 22, 23, 24, 25, 28, 29, 30, 31, 37, 87, 93
You have an odd number of values, so your middle age is the eighth number in the list: 25. An average age of 25 (rather than 33.8 with the mean average) better represents most of the actual ages. The two oldest employees are so much older than the others that their ages skew the middle or average age.
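One way to see how much the two oldest employees skew the mean is to recompute both averages with and without them. The sketch below uses Python’s `statistics` module; dropping the two largest ages is an experiment added here for illustration, not a step from the example itself.

```python
import statistics

ages = [18, 25, 37, 28, 23, 29, 31, 87, 20, 24, 30, 21, 22, 93, 19]

print(statistics.mean(ages))     # 33.8
print(statistics.median(ages))   # 25

# Illustrative experiment: remove the two oldest employees (87 and 93).
without_oldest = sorted(ages)[:-2]
print(round(statistics.mean(without_oldest), 2))   # 25.15
print(statistics.median(without_oldest))           # 24
```

The mean drops by more than eight years when the two outliers are removed, while the median moves by only one year.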
Understanding how frequency affects the mode
The mode of a set of numbers is the most frequently occurring number: it’s the one that’s listed most often. The mode is a good average value when the number occurs overwhelmingly frequently in the list. The following three sets of numbers all have a mode of 5:
A) 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 10
B) 1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18
C) 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 999, 999
The mode of 5 in the first and third lists seems to make the most sense as representing the average of the list. In the middle list, however, even though 5 is technically the mode, it doesn’t seem to represent the list of numbers as a whole. In this case, you might want to find the average another way.
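A quick way to judge whether a mode is representative is to look at what share of the list it accounts for. The Python sketch below uses `collections.Counter`; the helper name `mode_with_share` and the idea of reporting the share are additions for this illustration.

```python
from collections import Counter

def mode_with_share(numbers):
    """Return the most frequent value, its count, and its share of the list."""
    value, count = Counter(numbers).most_common(1)[0]
    return value, count, count / len(numbers)

list_a = [1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 10]
list_b = [1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
list_c = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 999, 999]

for name, data in (("A", list_a), ("B", list_b), ("C", list_c)):
    value, count, share = mode_with_share(data)
    print(f"List {name}: mode {value}, appears {count} times ({share:.0%} of the list)")
```

In list B the mode accounts for only two entries, which is why it says so little about the list.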
Suppose you want to show that your nonprofit foundation is kid-friendly, but you don’t want to reveal how many children are in specific families. Determine which average is the best representation of the number of children of the 43 families served by your foundation: 0, 5, 6, 0, 3, 4, 0, 4, 10, 14, 7, 0, 6, 2, 3, 0, 4, 5, 4, 8, 7, 3, 5, 3, 2, 6, 3, 3, 0, 2, 2, 6, 1, 2, 3, 2, 2, 4, 4, 4, 6, 1, 2.
First put the numbers in order. You must do this when finding the median, but it’s also helpful to have an ordered list when figuring the mean and mode.
In order, the numbers are 0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 8, 10, 14.
The mean is the sum of the numbers divided by 43. The sum is 158, and 158 ÷ 43 ≈ 3.674.
The median is the middle number, which in this case is the 22nd number. So, the median is 3.
The mode is the most frequent number, which after counting is 2 (there are eight 2s).
So, which outcome (3.674, 3, or 2) do you think is the best representation of the list? The two outliers (10 and 14) pull the mean average up a bit. I’d probably go with the middle (median) of the averages, but it’s really your call on this. You can correctly say that any of the values is the average.
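All three averages for the 43 families can be checked at once with Python’s standard `statistics` module. The list is copied from the example in its original, unsorted order, since the library functions sort internally where needed.

```python
import statistics

children = [0, 5, 6, 0, 3, 4, 0, 4, 10, 14, 7, 0, 6, 2, 3, 0, 4, 5, 4, 8, 7, 3,
            5, 3, 2, 6, 3, 3, 0, 2, 2, 6, 1, 2, 3, 2, 2, 4, 4, 4, 6, 1, 2]

print(len(children), sum(children))            # 43 158
print(round(statistics.mean(children), 3))     # 3.674
print(statistics.median(children))             # 3
print(statistics.mode(children))               # 2
print(children.count(2))                       # 8
```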
Factoring in Standard Deviation
The average of a listing of numbers tells you something about the numbers, but another important bit of information is the variance or standard deviation: how much the numbers in the listing deviate from the mean average (see the earlier section, “Adding and dividing to find the mean,” for more on mean averages). The standard deviation is a measure of variation and can be a decent comparison between two sets of numbers, as long as the numbers have some relation to one another. After all, as with other facets of statistics, you have to be careful not to misrepresent what’s going on just because you can.
Computing the standard deviation
Standard deviation is a measure of spread. This spread has to do with how far most of the numbers are from the average. For example, you may be interested not only in the average number of sales of your staff members, but also whether the sales cluster closely around that average or are much higher and much lower than the average. Is your staff pretty predictable and steady, or are they all over the place with their sales endeavors? Standard deviation is a number representing the deviation from the mean. So the greater the standard deviation, the more numbers you’ll find that are farther from the mean average.
To find the standard deviation of a list of numbers, use the following formula:
\[ s = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}} \]
where \(\sum x^2\) represents the sum of all the squares of the numbers in the list, \(n\) is the number of numbers, and \(\bar{x}\) represents the mean average of the numbers (which is squared in the formula before multiplying it by \(n\)).
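If you have seen standard deviation defined in terms of each number’s distance from the mean, the formula above is the same quantity in a form that is quicker to compute by hand. The short derivation below assumes the formula is the sample standard deviation (the one that divides by \(n - 1\)) and uses the fact that \(\sum x = n\bar{x}\):

```latex
\begin{aligned}
\sum (x - \bar{x})^2 &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
                     &= \sum x^2 - 2\bar{x}(n\bar{x}) + n\bar{x}^2 \\
                     &= \sum x^2 - n\bar{x}^2
\end{aligned}
\qquad\Longrightarrow\qquad
s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}}
```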
Each of the following three lists of numbers has a mean average of 5:
A) 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 7, 8, 9
B) 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9
C) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9
To find the standard deviation of list A, you square each number and then add up all the squares. Because the number 5 repeats 15 times, I show that as 15 times the square of 5:
\[ 1^2 + 2^2 + 3^2 + 4^2 + 15(5^2) + 6^2 + 7^2 + 8^2 + 9^2 = 635 \]
You know that the mean average of the list is 5, and you know that there are 23 numbers in the list. Now all you have to do is substitute the numbers into the formula:
\[ s = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}} = \sqrt{\frac{635 - 23(5)^2}{23 - 1}} = \sqrt{\frac{635 - 575}{22}} = \sqrt{\frac{60}{22}} \approx 1.65 \]
Using the same process on lists B and C, you get standard deviations of about 2.132 for list B and exactly 4 for list C. So the first list of numbers has the smallest deviation, and the last list of numbers has the greatest deviation.
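The three results can be verified with a short Python sketch that applies the formula directly. The function name `shortcut_stdev` is made up for this example; as a cross-check, the standard library’s `statistics.stdev` (which also divides by n − 1) gives the same values.

```python
import math
import statistics

def shortcut_stdev(numbers):
    """Standard deviation via sqrt((sum of squares - n * mean^2) / (n - 1))."""
    n = len(numbers)
    mean = sum(numbers) / n
    sum_of_squares = sum(x * x for x in numbers)
    return math.sqrt((sum_of_squares - n * mean ** 2) / (n - 1))

list_a = [1, 2, 3, 4] + [5] * 15 + [6, 7, 8, 9]
list_b = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9]
list_c = [1] * 11 + [5] + [9] * 11

for name, data in (("A", list_a), ("B", list_b), ("C", list_c)):
    s = shortcut_stdev(data)
    # Cross-check against the library's sample standard deviation.
    assert math.isclose(s, statistics.stdev(data))
    print(name, round(s, 3))
# A 1.651
# B 2.132
# C 4.0
```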
At a local ice cream factory, the hourly wages of the employees are as follows:
4 employees earn $8.00 per hour
7 employees earn $8.50 per hour
20 employees earn $9.25 per hour
25 employees earn $10.00 per hour
4 employees earn $12.00 per hour
What’s the (mean) average wage of the employees, and what’s the standard deviation?
This type of problem is solved more easily with a chart or table. So take a look at Table 8-4, which I created for this scenario.
Table 8-4 Hourly Wages of Employees
Hourly Rate   # of Employees   Rate × # of Employees   Square of Rate   Square × # of Employees
8.00          4                32                      64               256
8.50          7                59.50                   72.25            505.75
9.25          20               185                     85.5625          1,711.25
10.00         25               250                     100              2,500
12.00         4                48                      144              576
Totals        60               574.5                                    5,549
First, to compute the mean, you need to find the sum of the hourly rate of all 60 employees. Look at the sum of the third column, 574.5, where each rate has been multiplied by the number of employees earning that rate. Divide 574.5 by 60 (the number of employees) and you get 9.575. The (mean) average hourly rate is about $9.58.
To compute the standard deviation, you need the sum of the squares of all 60 wage rates. The last column has the square of each rate multiplied by the number of employees involved. The sum of the squares is 5,549. You also have to multiply the number of employees, 60, by the square of the mean, which is \(9.575^2 = 91.680625\). Now you’re all set to put the numbers into the formula for the standard deviation:
\[ s = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}} = \sqrt{\frac{5549 - 60(91.680625)}{60 - 1}} = \sqrt{\frac{5549 - 5500.8375}{59}} = \sqrt{\frac{48.1625}{59}} \approx \sqrt{0.816314} \approx 0.904 \]
The standard deviation is about $0.90, or 90 cents.
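The table-based calculation translates naturally into Python by pairing each hourly rate with the number of employees who earn it. This is a sketch under the same formula as above; the dictionary name `wage_counts` and the other variable names are illustrative.

```python
import math

# Hourly rate -> number of employees earning that rate (from the example).
wage_counts = {8.00: 4, 8.50: 7, 9.25: 20, 10.00: 25, 12.00: 4}

n = sum(wage_counts.values())                                   # total employees
total = sum(rate * count for rate, count in wage_counts.items())
sum_of_squares = sum(rate ** 2 * count for rate, count in wage_counts.items())

mean = total / n
s = math.sqrt((sum_of_squares - n * mean ** 2) / (n - 1))

print(n, total, sum_of_squares)      # 60 574.5 5549.0
print(round(mean, 3), round(s, 3))   # 9.575 0.904
```

Weighting each rate by its head count gives the same totals as the third and fifth columns of Table 8-4 without writing out all 60 wages.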