consider an area chart. An area chart is a good idea when the area between two lines on a line graph is meaningful. Here, the area between the average low and the average high for the month is the range of temperatures you can expect to feel on an average day. In Fig. 7-17 above, we have taken the data (kindly supplied by this same website) and created a simple area chart using Microsoft Excel.
Note that we could easily have added the record lows and highs for each month, since they would fall below and above the average lows and highs. We could probably even have used daily instead of monthly data, still labeling only the months along the x-axis. While the bar chart above looks crowded and cramped despite the little information it displays so poorly, the area chart is so uncluttered that we could probably add more information without either distracting or confusing our readers.
There are tons of examples of bad graphs. This particular firm is doing no worse a job than many, many others. We should appreciate the bad examples, and use them as cautions and as tools for learning to make good graphs.
Table 7.2 Graphical elements representing statistics.

Graphical element: Title
Statistic represented: The population or sample
Do
. be clear whether you are speaking of a population or a sample
Don’t
. leave out the title
. use a cute, but unclear name

Graphical element: Primary axis
Statistic represented: The frequency, count, or other value being focused on
Do
. have the scale clearly marked with proper units
. for a ratio scale, have the axes cross at the zero point of the primary axis
Don’t
. leave the zero point off the chart for a ratio scale
. make a break in the chart between the zero point and representative values along the primary axis. (A marked break is sometimes acceptable, but introduces biased perception.)

Graphical element: Primary axis label
Statistic represented: The variable shown on the primary axis
Do
. include it
. define units in parentheses, for example (in thousands)

Graphical element: Secondary axis
Statistic represented: Shows the various values of the variable, or the most significant variable
Do
. make it continuous or separated, whichever represents the actuality of the data
. make a conscious choice of order for nominal scales
. label the values, or show a scale, whichever is appropriate
. place ordinal scales in order
. show interval scales proportionally
. show the zero-point of a ratio scale, either crossing the primary axis, or clearly labeled and perhaps marked with a vertical line
. show appropriate intervals

Clear labeling
There are a couple of things we can do to make our graphs very clear for our readers. One is to choose labels in the language of our reader (business language or everyday usage) and avoid statistical or engineering terms. We can include a glossary in our appendix that maps presentation terms to statistical terms, to show how we translated the statistics into usable English. Another is to make sure that our labels are a good size and in a good position in relation to the items that they identify.
Table 7.2 Graphical elements representing statistics (Continued).

Graphical element: Secondary axis label
Statistic represented: The variable shown on the secondary axis
Do
. include it

Graphical element: Tertiary axis
Statistic represented: Shows a variable less important than the one on the secondary axis. May be shown as a third dimension, or as the variable shown by segments or clusters
Do
. on a 3-D graph, follow all the rules for a secondary axis
. on another type of graph, use a key to provide all that information

Graphical element: Tertiary axis label
Statistic represented: The variable shown on the tertiary axis
Do
. include it

Graphical element: Key
Statistic represented: Any information that cannot easily be shown by labels on the graph
Do
. use a key whenever it makes interpretation easier, more certain, or clearer
. proofread your key against the graph
. use consistent keys and color choices across related graphs whenever possible

Bias and nonlinear representation
Four manipulations of the Y-axis on graphs are quite common and should be avoided in most cases. They should be avoided because, even when clearly marked, they still introduce a visual bias that changes the reader’s perception of significance. In general, do not:
. Leave out the zero on a ratio scale. This makes variance or change seem more significant than it is.
. Have a break, marked or unmarked, on a ratio scale between the zero point and the values. This also makes a variance or change seem more significant than it is. When all of the values are far from zero, but close to each other, it may be necessary to have a break in the scale. If so, always mark the break clearly. Better a break than to leave out the zero. If you do this, also note in the text that the heights of the bars do not indicate relative values.
. Use nonlinear scales. Logarithmic and other nonlinear scales have their place in engineering, but not in business. The visual meaning of items on these scales is not what it appears to the untrained audience, and they can be used to deceive, even causing lines that should be parallel or divergent to converge.
. Use unfamiliar or undefined indices. An index is a ratio. In business graphs, we can use familiar ratios that our audience uses all the time.
We should avoid unfamiliar ratios. If we do include unfamiliar ratios, we must explain them carefully and define them consistently.
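To see how much a missing zero or a broken axis can exaggerate a difference, it may help to compare the ratio of the drawn bar heights with the ratio of the actual values. The following is a minimal sketch, not a method from the text; the function name `apparent_ratio`, the two values, and the axis starting point of 95 are all assumptions chosen for illustration.

```python
def apparent_ratio(small, large, axis_start=0.0):
    """Ratio of the drawn bar heights when the axis begins at axis_start.

    Hypothetical helper: a bar's drawn height is its value minus the
    point where the axis starts.
    """
    if axis_start >= small:
        raise ValueError("axis_start must be below the smaller value")
    return (large - axis_start) / (small - axis_start)


# Hypothetical values that are far from zero but close to each other.
small, large = 98.0, 100.0

true_ratio = apparent_ratio(small, large)        # axis includes zero
cut_ratio = apparent_ratio(small, large, 95.0)   # axis starts at 95

print(f"Ratio with zero on the axis:   {true_ratio:.3f}")
print(f"Ratio with axis starting at 95: {cut_ratio:.3f}")
```

With these assumed numbers, a difference of about 2% is drawn as if one bar were roughly two-thirds taller than the other, which is the visual bias the list above warns about.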
A more general form of statistical misrepresentation that often finds its way into graphs is the redefinition of the base of an index. We should avoid this scrupulously. Consider the following example. A company introduces a temporary 20% pay cut during difficult times. Your annual salary goes from $50,000 to $40,000. When things improve, the company gives everyone a 20% raise to restore their salaries. But 20% of what? A 20% raise on $40,000 is $8,000, and now you have your salary as $48,000, $2,000 below the original $50,000. Where did the $2,000 go? It was lost in the change of the base of the index. Twenty percent of $40,000 is $2,000 less than 20% of $50,000.
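The pay-cut example can be checked with a few lines of arithmetic. This is a small sketch using the figures from the example; the variable names are assumptions, and the last line (the raise that would actually restore the salary) is an added calculation that the text does not state.

```python
original = 50_000   # salary before the cut, from the example
cut_pct = 20        # temporary pay cut, percent of the original salary
raise_pct = 20      # later raise, percent of the reduced salary

# Integer arithmetic keeps the dollar amounts exact.
after_cut = original - original * cut_pct // 100
after_raise = after_cut + after_cut * raise_pct // 100
shortfall = original - after_raise

# Raise (as a percent of the reduced salary) needed to get back to the original.
needed_raise_pct = (original - after_cut) * 100 / after_cut

print(f"After the cut:   ${after_cut:,}")
print(f"After the raise: ${after_raise:,}")
print(f"Shortfall:       ${shortfall:,}")
print(f"Raise needed to restore the salary: {needed_raise_pct:.0f}%")
```

The two percentages look the same, but each is measured against a different base, so they do not cancel.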
Cute pictures are often used to replace bar charts. The problem is that, when we do this, we create 2-dimensional or 3-dimensional images representing linear changes, which misrepresents them, making the change or variance appear much more significant than it really is. For good examples of this and other crucial cautions, see How to Lie with Statistics.
Too clever by half: the whens and wheres of color, cartoons, and photographs
As you become more sophisticated in preparing graphs, you may want to jazz them up a bit. After all, if your audience falls asleep during your presentation, you are not supporting a business decision very well!
However, we recommend that you focus on being relevant and clear first, and flashy a good deal later. If you come to the point that you want to make your charts fancier, we suggest you work cautiously. It is very easy to take creativity too far when showing data in a graph or chart. The colors that highlight the important features of our data rapidly become distractions when there are too many or too much. The clever combination of chart types that allows us to compare two variables may rapidly become completely confusing when we try to compare three. Use color and fancy types of charts to support the clarity of communications.
The same thing applies to appeals to the emotions. Charming cartoon characters can help make our point more memorable, but they can also distract. In addition, a poorly designed icon or ideogram can actually mislead. It is a classic, and misleading, error in graphing to have pictures whose size varies with some value when, actually, the height illustrates the difference in value, but the difference in area or volume gives a false impression of a larger difference than is actually there.
We will look into this further in Chapter 10 “Reporting the Results,” where we discuss presenting statistical information in different business contexts, including decision support and advertising.
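The distortion caused by pictures whose size varies with a value can be made concrete. If a picture is scaled up uniformly so that its height matches the value, its area grows with the square of the scaling and the volume of a 3-D drawing grows with the cube. The sketch below assumes uniform scaling and a hypothetical value that doubles; the function name `apparent_growth` is illustrative only.

```python
def apparent_growth(value_ratio, dimensions):
    """Growth in apparent size when a picture is scaled uniformly.

    dimensions = 1 for a plain bar (height only),
    2 for a flat picture (area), 3 for a 3-D drawing (volume).
    """
    return value_ratio ** dimensions


value_ratio = 2.0  # hypothetical: the value being graphed doubles

bar = apparent_growth(value_ratio, 1)
picture = apparent_growth(value_ratio, 2)
solid = apparent_growth(value_ratio, 3)

print(f"Bar height grows by a factor of   {bar:.0f}")
print(f"Picture area grows by a factor of {picture:.0f}")
print(f"Solid volume grows by a factor of {solid:.0f}")
```

Under these assumptions, a value that doubles is shown by a picture with four times the area, or a solid with eight times the volume, which is why a plain bar is the safer choice.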
SOMETHING EXTRA
Technology: A Plus or a Minus?
Everything we have said about flashy effects, color, and emotional appeal applies even more to special effects we can add through computer graphics and new technology. Whenever a new technology becomes available, there is a tendency, called the “gee-whiz” effect, to overuse it. This is not a new thing; excessively gaudy print and photographs were used where a simple paragraph would have been clearer, all the way back in the 1800s. Snazzy changing color and moving pictures hide more than they reveal. They have no place in decision support, and their place in advertising based on statistics is questionable.
In addition, the more we rely on new technology, the more risk there is that things won’t come out the way we planned. Could you present your report if the bulb blew on your data projector? What happens to your color-coded bar charts when extra people come to the meeting, and you are quickly making extra copies—on a black-and-white printer or copier? When we KISS—Keep It Simple, Sam—we also Play It Safe, Susie. (The acronym is left for the reader to develop.)
Quiz
1. Charts and graphs...
(a) May be useful for examining the distribution of the data
(b) Communicate the results of a statistical study
(c) Both (a) and (b)
(d) Neither (a) nor (b)
2. A pie chart is useful for illustrating _______.
(a) Proportions
(b) Comparisons
(c) Change in frequency
(d) All of the above
3. The length (or height) of a bar in a bar chart indicates the _______ of each value of the variable.
(a) Probability
(b) Ratio
(c) Proportion
(d) Frequency
4. A Pareto chart is used when the data are on a _______ scale.
(a) Ratio
(b) Nominal
(c) Ordinal
(d) Interval
5. We can use a _______ chart by dividing the sample into sub-samples.
(a) Line graph
(b) Histogram
(c) Multiple bar
(d) All of the above
6. The line graph allows you to focus on the _______ from one value to the next.
(a) Distribution
(b) Frequency
(c) Ratio
(d) Change
7. A _______ shows each subject unit on the graph.
(a) Scatter plot
(b) Line graph
(c) Histogram
(d) Pie chart
8. Using templates or procedures when planning charts and graphs saves you...
(a) Time
(b) Cost
(c) Effort
(d) All of the above
9. The primary axis label on a graph should be...
(a) Included
(b) Defined in units
(c) Both (a) and (b)
(d) Neither (a) nor (b)
10. The use of color and flashy effects in graphs should only be used to...
(a) Add to the “gee-whiz” factor
(b) Provide clarity
(c) Distract from the results
(d) All of the above
CHAPTER
Common Statistical Measures
We have reached the point where we are ready to talk specifically about statistical measures, one by one. Calculating statistical measures, like any other sort of calculation, means following a specific procedure. In the world of mathematics, procedures are specified using the special language of equations. We can’t avoid equations in this chapter, but at least we can say why they are necessary. (If you are uncomfortable with equations or think you need a review of the basic rules, see Appendix A for a review of basic math.)
SURVIVAL STRATEGIES
The keys to getting the most out of equations are: First, remember that the symbol for the statistical measure being defined is always alone on the left of the equals sign. Second, everything on the right of the equals sign is just a shorthand for the rules for calculating the value of the statistical measure. The detailed version of those rules will be given in the adjoining text.
Copyright © 2004 by The McGraw-Hill Companies, Inc.
Recall that statistical measures are summaries. They are single numbers that describe some aspect or feature of a group of numbers. So, the procedures we will be using will be applied to many numbers and will end up producing one number. We will explain what feature of the group of numbers is described by each statistical measure as we go along.
Fundamental Measures
Most complex statistical measures are based on simpler ones. We calculate the simple statistics from the numerical data and then calculate the more complex statistics from the simpler ones. The two simplest and most basic statistics are the count and the ratio, which we saw earlier in Chapter 3 “What Is Probability?” These two statistics are so simple that most statistics books don’t even talk about them. However, they are the basis for many, many other statistics, so we will discuss them here.
COUNTING: N AND DEGREES OF FREEDOM: df
The count, symbolized by N, is such a simple statistic that it doesn’t even have an equation. We all learned to count before we knew any other sort of arithmetic. Counting is the most basic sort of procedure in mathematics and statistics. It is also the simplest sort of measurement. When we count something in the real world, we get a number. That sort of count is a measure, not a statistic.
Counting in statistics
But what happens when we count numbers instead of things? We start with a group of numbers and we end up with a single number that describes one feature of that group, namely, how big a group it is. When we count sheep, we get a number that is a measure of the flock. When we count numbers, like the number of the weights of all the sheep in the flock, we are calculating the statistic, N. (Of course, this is a distinction without a difference. The number of sheep will be the same as the number of numbers we get when we weigh them.)
The count of all the numbers in a group of numbers is the basis for almost every other statistic. When we look at the equations for almost every other statistic, we will see N on the right-hand side of the equals sign.
Usually, we use the symbol, N, to indicate the size of the entire sample. When we are talking about smaller groups that are part of the sample, we usually add a subscript to the N to show the difference. For example, if we wanted to talk about only our black sheep, we might use N = 12 as the equation for the entire flock and N_b = 5 as the equation for the count of just the black sheep.
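Counting the numbers in a group, and counting a subgroup, can be sketched in a few lines. The flock below is made-up data, chosen only so that the counts match the N = 12 and N_b = 5 of the example; the weights, the colors, and the variable names are all assumptions.

```python
# Hypothetical flock: one (weight in kg, color) pair per sheep.
flock = [
    (61.2, "white"), (58.4, "black"), (70.1, "white"), (66.0, "white"),
    (59.3, "black"), (72.5, "white"), (64.8, "black"), (68.9, "white"),
    (57.7, "black"), (63.4, "white"), (69.0, "white"), (60.5, "black"),
]

# The group of numbers we are counting: the weights of all the sheep.
weights = [weight for weight, _ in flock]

N = len(weights)  # count for the entire sample
N_b = sum(1 for _, color in flock if color == "black")  # count for the black sheep only

print(f"N = {N}")
print(f"N_b = {N_b}")
```

The count of weights and the count of sheep come out the same, which is the "distinction without a difference" noted above.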
HANDY HINTS
Note that, in these first equations, we are not specifying a procedure. When there is just a number to the right of the equals sign, the equation is being used to specify the value of the variable in that particular case. Mathematicians (and statisticians) use equations for a lot of different sorts of things, and don’t always warn us when they shift gears.
CRITICAL CAUTION
In statistical theory, when the population is finite, N is sometimes used to mean the size of the entire population. The symbol, n, is then used to describe the size of the sample. We won’t be doing much theory here in Business Statistics Demystified, but elsewhere, this can get a bit confusing.
What are degrees of freedom?
There is a very sophisticated notion in statistical theory that we will need later on when we talk about more complicated types of statistical techniques.
This notion is called the degrees of freedom. Even though it is a difficult idea, it is calculated from a very simple statistic, the sample size, n. Equation 8-1 shows that the degrees of freedom (df) is equal to the sample size.
df = n    (8-1)
Every time we observe the world, we gather information about it. In statistics, each observation contributes one unit to the sample. The sample size, n, is the number of units in the sample, and thus is a measure of how much information we have obtained about the world with that sample. When we use n as a measure of the amount of information in our sample, we call it the degrees of freedom.
Now, suppose we calculate a statistic from our sample. That gives us one piece of information about the world, taken from the sample. As it turns out, there is an important sense in which each number we calculate is worth the same as each number we collect. So, when we calculate a statistic from the sample, we have one less piece of information in the sample.
At first, this may seem odd. After all, we still have all N numbers from our sample. We still know what they are. What has been lost? The answer is that we may very well want to calculate more statistics from the same sample.
How many times can we use the same N numbers, our data, to calculate statistics and still be finding out about the world, instead of just spinning our wheels? The answer is that we can calculate N statistics from N numbers before we run out of information. As we examine various statistical measures throughout this chapter, we will take a few more looks at degrees of freedom.
Degrees of freedom will also be very important when we get to Part Three and talk about statistical techniques.
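One common way to picture why a calculated statistic "uses up" a piece of information is to notice that, once we know a statistic such as the average, one of the original numbers is no longer free: it can be worked out from the others. The sketch below illustrates that idea with a made-up sample. The data, the choice of the average as the statistic, and the variable names are assumptions for illustration, not a procedure given in the text.

```python
# Hypothetical sample of five observations.
data = [4, 8, 6, 5, 7]

n = len(data)
df = n  # Equation 8-1: before any statistic is calculated, df = n

# Calculate one statistic from the sample (here, the average).
mean = sum(data) / n
df_after_mean = df - 1  # one piece of information has been used

# Given the average and any n - 1 of the values, the last value is fixed.
known_values = data[:-1]
recovered_last = mean * n - sum(known_values)

print(f"df before calculating anything: {df}")
print(f"df after calculating the mean:  {df_after_mean}")
print(f"Last value recovered from the mean and the others: {recovered_last}")
```

Here only four of the five numbers remain free to vary once the average is known, which matches the idea that each calculated statistic leaves one fewer piece of information in the sample.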