Just as we must plan the type of physical storage and location, we must also plan for the logical form in which the data are stored. There are many ways to ensure the safety and accessibility of our data, and we need to use them.
Data should be encoded in a way that will allow either us or the computer to detect any errors that may occur during storage. The technology of error detection and error correction is far beyond the scope of this book, but we need to know that our system has such features and that they are turned on and working. If data need to be analyzed in a different form than that in which they were collected, both forms must be stored. When data are cleaned or edited to correct errors, the earlier “dirtier” versions must be archived in order to keep an audit trail. Redundancy is vital in data storage as well as in data collection. In addition, the audit trail should include an appropriate record of the processes used to clean up data, and it may be possible to link those process records to the data records in a well-designed database.
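As a rough illustration of detecting storage errors ourselves, here is a minimal Python sketch. It is not the error-detection technology built into storage systems, only a simple check we could run on our own archived files. The file contents and variable names are hypothetical; the idea is to save a digest alongside the archived data and recompute it whenever the data are read back.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical raw data file contents, as first recorded.
original = b"subject,q1,q2,q3\n1,4,2,5\n2,3,3,1\n"
stored_digest = checksum(original)  # saved alongside the archived file

# Later, recompute the digest on whatever is read back from storage.
read_back = b"subject,q1,q2,q3\n1,4,2,5\n2,3,8,1\n"  # one digit corrupted
is_intact = checksum(read_back) == stored_digest
print(is_intact)  # False: the stored copy no longer matches the original
```

A mismatch tells us only that something changed, not what changed, which is one more reason to keep redundant copies of the original data.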
The single most important aspect of quality database design is never to throw any data away. Data should be recorded as they are observed and/or collected. Never, ever summarize data before recording them in the database.
CASE STUDY
The Case of the Sinister Statistician
The first author worked for a while assisting graduate students in doing the statistics for their research. One Master’s student decided to give all her subjects a test that had ten or twenty questions. The score for each test was only three numbers, calculated from the questions. The student decided to score each test by hand and then record only the scores and not the individual answers to the individual questions. The first author practically got down on his knees and begged the student to enter all the answers and use the computer to calculate the scores.
The student explained her logic: She would have to score each test anyway. The scoring procedure was very easy to do by hand. By doing it this way, she only had to type in three numbers for each subject, rather than ten or twenty! (The study had fewer than 30 subjects.) She had already gotten her statistical analysis plan approved by her Master’s committee and was not required to do any analysis of the original numbers, only on the three scores.
The first author (still on his knees, now practically weeping) pointed out that:
• Scoring by hand tremendously increased the possibility of calculation error.
• There weren’t that many subjects.
• The author knew many tricks to make data entry easier and less prone to error.
• The author was willing to help.
• There would only be one copy of the original data (on the paper test forms) and it might get lost.
• What if the committee changed its mind?
The student, now entirely convinced of the author’s sinister intent, proceeded to score the tests by hand. After her preliminary oral examination, one of the faculty members on the committee decided that a more detailed statistical analysis was needed based upon the individual test questions. The student had to return to the lab and spend half the night entering data in order to graduate that semester. By her own estimate, she more than doubled her own workload by scoring the tests by hand.
The moral to the story is this: The authors of Business Statistics Demystified are not sinister. We are not trying to make things harder on anyone. We have learned from years of our own experience and our own mistakes how to lessen the likelihood of making extra work for ourselves. It all boils down to this: Planning pays off, and it is better to get it right the first time. We are just trying to pass that information along.
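The approach urged in the case study can be sketched in a few lines of Python: enter every individual answer, and let the computer calculate the scores. The story does not give the actual scoring procedure, so the scoring key below (three scores, each a sum of two questions) and the subject IDs are purely hypothetical.

```python
# Hypothetical raw answers: one entry per subject, one value per question.
raw_answers = {
    "S01": [3, 1, 4, 2, 5, 0],
    "S02": [2, 2, 1, 4, 3, 5],
}

# Hypothetical scoring key: which question positions feed each of three scores.
SCORING_KEY = {
    "score_a": [0, 1],
    "score_b": [2, 3],
    "score_c": [4, 5],
}

def score_subject(answers):
    """Compute the three summary scores from one subject's raw answers."""
    return {name: sum(answers[i] for i in items)
            for name, items in SCORING_KEY.items()}

# The scores are derived from the raw answers, which stay in the data set.
scores = {subject: score_subject(ans) for subject, ans in raw_answers.items()}
print(scores["S01"])  # {'score_a': 4, 'score_b': 6, 'score_c': 5}
```

Because the raw answers are kept, a later request for a question-by-question analysis requires no new data entry, and a mistake in the scoring key can be fixed by changing one table and rerunning the calculation.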
Experimental and Quasi-Experimental Data
The first and most important thing to know is that all of the rules that apply to collecting survey data (above) also apply to data derived from experiments or quasi-experiments. As we will see in Part Three, the biggest difference between collecting survey data and collecting experimental or quasi-experimental data is that, in the latter case, we collect data sampled from more than one population. So all the rules for collecting data from a single population still apply. In addition, it is critical to store the information that identifies the population each subject unit comes from in a way that is as safe and error-free as possible.
In addition, for many purposes, in experimental studies, the information that identifies the population must be kept secret until data collection is complete. (This is called a double-blind study and ensures that the psychological predispositions of the people being studied, the people conducting the study, and the people collecting the data do not create bias and error in the results.) Population information may also be kept secret for privacy reasons. Often, the information that identifies the population from where the subject comes, together with their other data, is enough to reveal their identity. In these two sorts of cases, the population information must not only be stored safely, it must also be encoded and stored separately from the rest of the data and there must be a way to reconnect the two parts of the data when necessary.
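One way to picture this separation is the following Python sketch, which rests on our own assumptions. The main data set carries only a subject ID, a separately stored key table maps each ID to an opaque code, and a separately stored code book maps each code to a population. All names, codes, and values here are hypothetical.

```python
# Hypothetical records as the data collectors see them: no population labels.
measurements = [
    {"subject_id": "S01", "outcome": 12.5},
    {"subject_id": "S02", "outcome": 9.8},
    {"subject_id": "S03", "outcome": 11.1},
]

# Stored separately: which opaque code each subject carries.
key_table = {"S01": "A", "S02": "B", "S03": "A"}

# Stored separately again: what each code means.
code_book = {"A": "treatment", "B": "control"}

def unblind(records, key_table, code_book):
    """Reconnect each record with its population once collection is complete."""
    return [dict(r, group=code_book[key_table[r["subject_id"]]])
            for r in records]

unblinded = unblind(measurements, key_table, code_book)
print(unblinded[0])  # {'subject_id': 'S01', 'outcome': 12.5, 'group': 'treatment'}
```

Neither the main data set nor the key table alone reveals who belongs to which population. The three pieces have to be brought together deliberately, and that step can be restricted to the end of data collection.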
In experimental studies, the methods used for the type of sampling may also be used to determine the population. For instance, in addition to random sampling, subjects may be randomly assigned to groups. (The group determines which population the subject is assumed to come from.) In this case, every step in the random sampling and assignment procedures must be stored as well, in order to ensure an audit trail that reflects how each subject came from its respective population.
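A minimal Python sketch of random assignment that keeps its own audit trail might look like the following. The function name, the log format, and the choice to deal subjects out to groups in turn after shuffling are all our assumptions, not a prescribed procedure. The point is that the seed and every step are recorded, so the assignment can be reproduced and checked later.

```python
import random

def assign_groups(subject_ids, groups, seed):
    """Randomly assign subjects to groups; return assignments and an audit log."""
    rng = random.Random(seed)  # private generator: the seed fully determines the result
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    audit_log = [
        {"step": "seed", "value": seed},
        {"step": "shuffled_order", "value": list(shuffled)},
    ]
    assignments = {}
    for position, sid in enumerate(shuffled):
        group = groups[position % len(groups)]  # deal subjects out in turn
        assignments[sid] = group
        audit_log.append({"step": "assign", "subject": sid, "group": group})
    return assignments, audit_log

subjects = ["S01", "S02", "S03", "S04", "S05", "S06"]
assignments, audit_log = assign_groups(subjects, ["treatment", "control"], seed=2004)
```

Rerunning `assign_groups` with the same subjects, groups, and seed yields the same assignments, which is what allows the stored log to be verified.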
In quasi-experiments, a variety of methods are used where a true control group, or truly random assignment of members, is not possible.
In some cases, there may still be multiple populations, and the proper procedures for managing data and statistics from multiple populations apply.
Quiz
1. What is a disadvantage of using someone else’s summary statistics?
(a) We cannot double-check the calculations
(b) We cannot calculate any new statistics
(c) Both (a) and (b) are disadvantages
(d) Neither (a) nor (b) are disadvantages

2. Free statistics...
(a) Is always of the highest quality
(b) Can be found everywhere
(c) Is always worth what we pay for them
(d) Always have the data available

3. _______ is the best insurance we can have of the quality of free data?
(a) Data collection documentation
(b) The assurance of the vendor
(c) The source of any bias
(d) Archived data

4. Non-archived data is data that is collected...
(a) For the purposes of statistical studies
(b) For reasons other than for statistical studies
(c) By a vendor for the purposes of selling you the data
(d) With the highest quality standards

5. What source(s) within our own company may be the source of data?
(a) Financial records
(b) Human Resource records
(c) Computer data logs
(d) All of the above

6. Dealing with missing data, erroneous records, and correcting recording errors in the data is...
(a) Cleaning data
(b) Coding data
(c) Falsifying data
(d) Analyzing data

7. The method for ensuring reliability in data entry is...
(a) Data collection
(b) Data redundancy
(c) Data validation
(d) Data calibration

8. The single most important aspect of database design is to...
(a) Summarize the data as much as possible
(b) Never throw any data away
(c) Make good decisions about what data to record and what to throw away
(d) Perform error correction

9. How does experimental and quasi-experimental data collection differ from survey data collection?
(a) Data is sampled from more than one population
(b) The rules of collecting survey data do not apply
(c) Both (a) and (b)
(d) There is no difference

10. A quasi-experiment occurs when...
(a) A true control group is not possible
(b) Random assignment is not possible
(c) Either random assignment or a true control group is not possible
(d) Both random assignment and a true control group are not possible
CHAPTER
Statistics Without Numbers: Graphs and Charts
There are two critical uses for graphics in statistics: first, in order to perform statistical analyses, the distribution of the data must be examined visually, not only to ensure that the required assumptions for that particular statistical technique are met, but also in order to note any unusual characteristics of the data that might affect our interpretation of the results of the analyses.
Second, graphics play a pivotal role in communicating the results in support of the business decision, either in the written report or oral presentation.
Some types of graphs are not particularly useful for the first purpose of examining a distribution, but all types of graphs have their role in communicating results to an audience. This chapter shows you the sorts of graphs most useful for business presentations, indicates some of the uses of each, and gives guidelines for using graphs and charts to make a statistical report for business.
Copyright © 2004 by The McGraw-Hill Companies, Inc.
TIPS ON TERMS
Is it a Chart or is it a Graph? The terms “chart” and “graph” tend to be used more or less interchangeably, without a clear distinction between the two. In Business Statistics Demystified, chart and graph are two words for the same thing. We follow common usage, so we talk about pie charts and line graphs, rather than pie graphs and line charts.
When to Use Pictures: Clarity and Precision
Before we choose what type of graph or chart to use, we need to decide whether or not to use a chart or graph or other graphical element at all.
There are really three ways we can tell our audience about quantitative (numerical) information. We can leave it as numbers (and put those numbers in a table), rephrase it as text, or convert it to a chart or graph.
Our goal is clarity, and graphics convey lots of information clearly. The decision as to how to present numerical information should be based, first and foremost, on how much information we need to present. If we can present all of the key information in a chart or graph, that should be our first choice. Even if there is additional information that is not as important as the key information, we should present the key information in a graph, and place the remainder of the less important information in a table, possibly stashing the table itself in an appendix.
KEY POINT Graphs and Charts Support Business Decisions
Use graphics to support the business decision. Then, in your report, if you need to explain no more than six or seven numbers, put them in your text. If you need to show more numbers than that, use a table. And if you need to show all of your data, put the tables in the appendix. Business Statistics Demystified, itself, is a very good model of how to present information in figures, tables, and appendices.
Often, we think of graphics as improving clarity at the cost of a loss of precision. This is not really true. If our graph provides us with all the
precision we need, then there is no effective loss of precision from changing from numbers to graphics. In addition, a well-designed graph can deliver a good deal more precision than we would imagine at the outset. Finally, even if the graph delivers less precision than we need, we can supplement the graph with a numerical table.
HANDY HINTS
A Picture and a Thousand Words
Graphs and words together make the most effective business presentation. And you don’t need a thousand words. One to three well-written paragraphs per chart is about right. We’ve filled this chapter with a variety of examples of explanations of business points illustrated by graphs. As you read, learn to explain graphs, as well as to create them.
It is essential to use the right type of graph or chart. The wrong type of chart can easily make things much less clear and confuse the reader. In this chapter, you will learn which charts to use for different types of data and different presentation purposes.
CRITICAL CAUTION
Take Time to Learn to Create Good Graphs
Please read through this chapter to learn about the uses of different types of graphs.
At the end, you will find a section called Do’s and Don’ts. Read it carefully, or you may end up creating confusing or misleading graphs. Then grab a pen and paper and do a quick sketch of what you want your graph to look like. At that point, you will be ready to use (or learn to use) Microsoft Excel or a statistical program that will generate graphs from numbers.
Parts is Parts: The Pie Chart
The pie chart is a dramatic way to show the proportions, that is, the ratios of several types that, together, constitute a whole. For example, the entire pie can represent our flock of sheep, with each pie slice representing one breed, as in Fig. 7-1. We could do another pie chart to show the proportions of groups of a different attribute, such as color. But it is essential that a pie chart divides
a whole into parts where each part is a different value of a single variable. So we would do one pie chart for breed, and another one for color. In business, a very common use of pie charts is for budgeting. The whole budget is the pie, and the slices can either represent different sources of revenue, or different allocations (but not both in the same chart).
HANDY HINTS Requirements for a Pie Chart
• There must be an identifiable whole whose parts need to be identified.
• Every single individual unit that is included in the whole must be uniquely identifiable as belonging to one and only one of the parts.
• There must be a single, reliably measured variable that measures the proportion of the whole for each part.
• The sum of the measures of the parts cannot exceed 100% of the whole.
• If the sum of the largest and most important parts is less than 100% of the whole, the remainder of the parts must be grouped together to form a special category of “other.” It must be possible to treat the “other” category as a part, in a sensible and meaningful fashion. The “other” part is sometimes called “miscellaneous,” and should be small. Some hold that it must be smaller than any other part.
As a general rule, the pie chart is best used for displaying counts of the different values of a single categorical variable as proportions of the whole.
Because, in a pie chart, there is no set order to the values, it is used more often for nominal than ordinal variables. If the order of the values is important, another type of chart (perhaps a bar chart) should be used. If the comparison of values to each other is more important than comparison of each value to the whole, a bar chart should also be used.
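The arithmetic behind a pie chart can be sketched in Python as follows. The breed names and counts are hypothetical (they are not the data behind Fig. 7-1), and the 5% cutoff for lumping small parts into “Other” is our own assumption rather than a fixed rule.

```python
from collections import Counter

def pie_shares(values, min_share=0.05):
    """Return {category: share of the whole}, lumping small parts into 'Other'."""
    counts = Counter(values)  # each unit is counted under exactly one category
    total = sum(counts.values())
    shares = {}
    other_count = 0
    for category, count in counts.most_common():
        if count / total < min_share:
            other_count += count  # too small to show as its own slice
        else:
            shares[category] = count / total
    if other_count:
        shares["Other"] = other_count / total
    return shares

# Hypothetical flock: each sheep has exactly one breed.
flock = (["Merino"] * 50 + ["Suffolk"] * 30 + ["Dorset"] * 17
         + ["Jacob"] * 2 + ["Texel"] * 1)
shares = pie_shares(flock)
for breed, share in shares.items():
    print(f"{breed}: {share:.0%}")
```

Because every sheep is counted once and only once, the shares always add up to the whole, and the “Other” slice here is smaller than any named slice.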
Fig. 7-1. Pie chart: breeds of sheep in our flock.
HANDY HINTS Serving up a Good Pie
When presenting a pie chart, be sure to define what the whole is. In statistics, be very clear whether the whole is the population or the sample. If presenting multiple pie charts, be sure to define what the whole is, and what the group of divisions are, for each chart.
Compare and Contrast: The Bar Chart
One of the most common types of chart is the bar chart. In a basic bar chart, there is one bar for each value of the variable being illustrated. The length (or height) of the bar indicates the count, called the frequency, of each value of the variable. The bar chart allows us to look at the sample distribution of a single variable, but it also has other uses. There are many variations on the bar chart. Some have their own special names, others do not. Those that have been named may have more than one name, each derived from different fields. The bar chart can also be combined with other types of charts. In this section, we will take a look at some of the more common and more useful types of bar charts, and try to keep track of their names.
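To make the idea of frequency concrete, here is a minimal Python sketch that counts how often each value of a single variable occurs and prints one text bar per value. The sample data are hypothetical, and a spreadsheet or statistical program would draw a proper chart from the same counts.

```python
from collections import Counter

# Hypothetical sample: one observed value of a single variable per unit.
colors = ["white", "black", "white", "brown", "white", "black"]

frequencies = Counter(colors)  # frequency of each distinct value

# One bar per value; the bar's length is that value's frequency.
for value, count in frequencies.most_common():
    print(f"{value:<6} {'#' * count} ({count})")
```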