BUSINESS AND ECONOMICS 11e
CHAPTER 1
1.8 Ethical Guidelines for Statistical Practice
Ethical behavior is something we should strive for in all that we do. Ethical issues arise in statistics because of the important role statistics plays in the collection, analysis, presentation, and interpretation of data. In a statistical study, unethical behavior can take a variety of forms including improper sampling, inappropriate analysis of the data, development of misleading graphs, use of inappropriate summary statistics, and/or a biased interpretation of the statistical results.
As you begin to do your own statistical work, we encourage you to be fair, thorough, objective, and neutral as you collect data, conduct analyses, make oral presentations, and present written reports containing the information you develop. As a consumer of statistics, you should also be aware of the possibility of unethical statistical behavior by others. When you see statistics in newspapers, on television, on the Internet, and so on, it is a good idea to view the information with some skepticism, always being aware of the source as well as the purpose and objectivity of the statistics provided.
The American Statistical Association, the nation’s leading professional organization for statistics and statisticians, developed the report “Ethical Guidelines for Statistical Practice”1 to help statistical practitioners make and communicate ethical decisions and assist students in learning how to perform statistical work responsibly. The report contains 67 guidelines organized into eight topic areas: Professionalism; Responsibilities to Funders, Clients, and Employers; Responsibilities in Publications and Testimony; Responsibilities to Research Subjects; Responsibilities to Research Team Colleagues; Responsibilities to Other Statisticians or Statistical Practitioners; Responsibilities Regarding Allegations of Misconduct; and Responsibilities of Employers Including Organizations, Individuals, Attorneys, or Other Clients Employing Statistical Practitioners.
1 American Statistical Association, “Ethical Guidelines for Statistical Practice,” 1999.
One of the ethical guidelines in the professionalism area addresses the issue of running multiple tests until a desired result is obtained. Let us consider an example. In Section 1.5 we discussed a statistical study conducted by Norris Electronics involving a sample of 200 high-intensity lightbulbs manufactured with a new filament. The average lifetime for the sample, 76 hours, provided an estimate of the average lifetime for all lightbulbs produced with the new filament. However, consider this. Because Norris selected a sample of bulbs, it is reasonable to assume that another sample would have provided a different average lifetime.
Suppose Norris’s management had hoped the sample results would enable them to claim that the average lifetime for the new lightbulbs was 80 hours or more. Suppose further that Norris’s management decides to continue the study by manufacturing and testing repeated samples of 200 lightbulbs with the new filament until a sample mean of 80 hours or more is obtained. If the study is repeated enough times, a sample may eventually be obtained, by chance alone, that would provide the desired result and enable Norris to make such a claim. In this case, consumers would be misled into thinking the new product is better than it actually is. Clearly, this type of behavior is unethical and represents a gross misuse of statistics in practice.
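The effect of repeating the study can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that the true average lifetime is 76 hours and that individual lifetimes have a standard deviation of 30 hours (the text gives no standard deviation), and it treats the sample mean as approximately normal. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_at_least(target, true_mean, sd, n):
    """Approximate P(sample mean >= target) for one sample of size n."""
    standard_error = sd / sqrt(n)
    return 1 - NormalDist(true_mean, standard_error).cdf(target)


def prob_at_least_one_success(p_single, repeats):
    """P(at least one of `repeats` independent samples reaches the target)."""
    return 1 - (1 - p_single) ** repeats


# Assumed values: true mean of 76 hours, standard deviation of 30 hours
# (the standard deviation is hypothetical, not taken from the Norris study).
p_single = prob_sample_mean_at_least(target=80, true_mean=76, sd=30, n=200)

for repeats in (1, 10, 50, 100):
    chance = prob_at_least_one_success(p_single, repeats)
    print(f"{repeats:>3} samples: P(at least one mean >= 80) = {chance:.3f}")
```

Under these assumptions a single sample reaches 80 hours only about 3% of the time, but the chance that at least one sample does so grows steadily with the number of repetitions. Reporting only the favorable sample therefore misrepresents the product.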
Several ethical guidelines in the Responsibilities in Publications and Testimony area deal with issues involving the handling of data. For instance, a statistician must account for all data considered in a study and explain the sample(s) actually used. In the Norris Electronics study the average lifetime for the 200 bulbs in the original sample is 76 hours; this is considerably less than the 80 hours or more that management hoped to obtain. Suppose now that after reviewing the results showing a 76-hour average lifetime, Norris discards all the observations with 70 or fewer hours until burnout, allegedly because these bulbs contain imperfections caused by startup problems in the manufacturing process. After discarding these lightbulbs, the average lifetime for the remaining lightbulbs in the sample turns out to be 82 hours. Would you be suspicious of Norris’s claim that the lifetime for their lightbulbs is 82 hours?
If the Norris lightbulbs showing 70 or fewer hours until burnout were discarded simply to provide an average lifetime of 82 hours, there is no question that discarding the lightbulbs with 70 or fewer hours until burnout is unethical. But even if the discarded lightbulbs contain imperfections due to startup problems in the manufacturing process and, as a result, should not have been included in the analysis, the statistician who conducted the study must account for all the data that were considered and explain how the sample actually used was obtained. To do otherwise is potentially misleading and would constitute unethical behavior on the part of both the company and the statistician.
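One way to account for all the data considered is to report the exclusions alongside the result. The following sketch uses a small set of made-up lifetimes (not the Norris data) and a hypothetical helper, summarize_with_exclusions, to show how dropping observations at or below a cutoff raises the average and how the report can disclose that.

```python
from statistics import mean


def summarize_with_exclusions(lifetimes, cutoff):
    """Report the mean before and after dropping observations <= cutoff."""
    kept = [x for x in lifetimes if x > cutoff]
    return {
        "n_collected": len(lifetimes),
        "n_discarded": len(lifetimes) - len(kept),
        "n_used": len(kept),
        "mean_all": mean(lifetimes),
        "mean_used": mean(kept),
    }


# Hypothetical lifetimes in hours, chosen only for illustration.
lifetimes = [60, 65, 70, 75, 80, 85, 90]
report = summarize_with_exclusions(lifetimes, cutoff=70)

for name, value in report.items():
    print(f"{name}: {value}")
```

Publishing every entry of such a report, together with the reason for the cutoff, lets readers judge for themselves whether the exclusions were justified.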
A guideline in the shared values section of the American Statistical Association report states that statistical practitioners should avoid any tendency to slant statistical work toward predetermined outcomes. This type of unethical practice is often observed when unrepresentative samples are used to make claims. For instance, in many areas of the country smoking is not permitted in restaurants. Suppose, however, a lobbyist for the tobacco industry interviews people in restaurants where smoking is permitted in order to estimate the percentage of people who are in favor of allowing smoking in restaurants. The sample results show that 90% of the people interviewed are in favor of allowing smoking in restaurants. Based upon these sample results, the lobbyist claims that 90% of all people who eat in restaurants are in favor of permitting smoking in restaurants. In this case we would argue that only sampling persons eating in restaurants that allow smoking has biased the results. If only the final results of such a study are reported, readers unfamiliar with the details of the study (i.e., that the sample was collected only in restaurants allowing smoking) can be misled.
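A short calculation shows how far such a sample can stray from the population. In the sketch below only the 90% figure comes from the example; the split of diners between the two kinds of restaurants and the 30% support among the other diners are assumptions made for illustration, and overall_support is a hypothetical helper.

```python
def overall_support(groups):
    """Weighted share in favor, given (population_share, share_in_favor) pairs."""
    total_share = sum(share for share, _ in groups)
    return sum(share * favor for share, favor in groups) / total_share


# Hypothetical population; only the 90% figure comes from the text.
smoking_allowed = (0.20, 0.90)  # assumed 20% of diners, 90% in favor
smoking_banned = (0.80, 0.30)   # assumed 80% of diners, assumed 30% in favor

biased_estimate = overall_support([smoking_allowed])
population_value = overall_support([smoking_allowed, smoking_banned])

print(f"Estimate from smoking-permitted restaurants only: {biased_estimate:.0%}")
print(f"Share in favor across all diners: {population_value:.0%}")
```

With these assumed numbers the lobbyist's sample suggests 90% support while the population figure is 42%, so the gap comes entirely from where the sample was taken.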
The scope of the American Statistical Association’s report is broad and includes ethical guidelines that are appropriate not only for a statistician, but also for consumers of statistical information. We encourage you to read the report to obtain a better perspective of ethical issues as you continue your study of statistics and to gain the background for determining how to ensure that ethical standards are met when you start to use statistics in practice.
Summary
Statistics is the art and science of collecting, analyzing, presenting, and interpreting data. Nearly every college student majoring in business or economics is required to take a course in statistics. We began the chapter by describing typical statistical applications for business and economics.
Data consist of the facts and figures that are collected and analyzed. The four scales of measurement used to obtain data on a particular variable are nominal, ordinal, interval, and ratio. The scale of measurement for a variable is nominal when the data are labels or names used to identify an attribute of an element. The scale is ordinal if the data demonstrate the properties of nominal data and the order or rank of the data is meaningful. The scale is interval if the data demonstrate the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Finally, the scale of measurement is ratio if the data show all the properties of interval data and the ratio of two values is meaningful.
For purposes of statistical analysis, data can be classified as categorical or quantitative. Categorical data use labels or names to identify an attribute of each element. Categorical data use either the nominal or ordinal scale of measurement and may be nonnumeric or numeric. Quantitative data are numeric values that indicate how much or how many. Quantitative data use either the interval or ratio scale of measurement. Ordinary arithmetic operations are meaningful only if the data are quantitative. Therefore, statistical computations used for quantitative data are not always appropriate for categorical data.
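The distinction matters most when categorical data are stored as numbers. The sketch below uses made-up values (they are not taken from any table in this chapter): a recommended-fuel variable coded 1, 2, 3, where the coding itself is an assumption, and a city miles-per-gallon variable. It contrasts an average of the codes, which has no interpretation, with a count per category and an average of the quantitative variable.

```python
from collections import Counter
from statistics import mean

# Hypothetical numeric codes for a categorical variable (recommended fuel).
FUEL_LABELS = {1: "diesel", 2: "premium", 3: "regular"}
fuel_codes = [3, 3, 2, 1, 3, 2, 3, 3, 2, 3]

# Hypothetical values for a quantitative variable (city miles per gallon).
city_mpg = [18, 21, 25, 30, 17, 22, 19, 24, 20, 23]

# The arithmetic runs on the codes, but the result has no interpretation:
# there is no fuel type "2.5".
meaningless_average = mean(fuel_codes)

# Counting observations per category is an appropriate summary instead.
fuel_counts = Counter(FUEL_LABELS[code] for code in fuel_codes)

# An average is meaningful for quantitative data.
average_city_mpg = mean(city_mpg)

print(f"Average of fuel codes (not meaningful): {meaningless_average}")
print(f"Counts by fuel type: {dict(fuel_counts)}")
print(f"Average city mpg: {average_city_mpg}")
```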
In Sections 1.4 and 1.5 we introduced the topics of descriptive statistics and statistical inference. Descriptive statistics are the tabular, graphical, and numerical methods used to summarize data. The process of statistical inference uses data obtained from a sample to make estimates or test hypotheses about the characteristics of a population. The last three sections of the chapter provide information on the role of computers in statistical analysis, an introduction to the relatively new field of data mining, and a summary of ethical guidelines for statistical practice.
Glossary
Statistics The art and science of collecting, analyzing, presenting, and interpreting data.
Data The facts and figures collected, analyzed, and summarized for presentation and interpretation.
Data set All the data collected in a particular study.
Elements The entities on which data are collected.
Variable A characteristic of interest for the elements.
Observation The set of measurements obtained for a particular element.
Nominal scale The scale of measurement for a variable when the data are labels or names used to identify an attribute of an element. Nominal data may be nonnumeric or numeric.
Ordinal scale The scale of measurement for a variable if the data exhibit the properties of nominal data and the order or rank of the data is meaningful. Ordinal data may be nonnumeric or numeric.
Interval scale The scale of measurement for a variable if the data demonstrate the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numeric.
Ratio scale The scale of measurement for a variable if the data demonstrate all the properties of interval data and the ratio of two values is meaningful. Ratio data are always numeric.
Categorical data Labels or names used to identify an attribute of each element. Categorical data use either the nominal or ordinal scale of measurement and may be nonnumeric or numeric.
Quantitative data Numeric values that indicate how much or how many of something. Quantitative data are obtained using either the interval or ratio scale of measurement.
Categorical variable A variable with categorical data.
Quantitative variable A variable with quantitative data.
Cross-sectional data Data collected at the same or approximately the same point in time.
Time series data Data collected over several time periods.
Descriptive statistics Tabular, graphical, and numerical summaries of data.
Population The set of all elements of interest in a particular study.
Sample A subset of the population.
Census A survey to collect data on the entire population.
Sample survey A survey to collect data on a sample.
Statistical inference The process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population.
Data mining The process of using procedures from statistics and computer science to extract useful information from extremely large databases.
Supplementary Exercises
1. Discuss the differences between statistics as numerical facts and statistics as a discipline or field of study.
2. The U.S. Department of Energy provides fuel economy information for a variety of motor vehicles. A sample of 10 automobiles is shown in Table 1.6 (Fuel Economy website, February 22, 2008). Data show the size of the automobile (compact, midsize, or large), the number of cylinders in the engine, the city driving miles per gallon, the highway driving miles per gallon, and the recommended fuel (diesel, premium, or regular).
a. How many elements are in this data set?
b. How many variables are in this data set?
c. Which variables are categorical and which variables are quantitative?
d. What type of measurement scale is used for each of the variables?
3. Refer to Table 1.6.
a. What is the average miles per gallon for city driving?
b. On average, how much higher is the miles per gallon for highway driving as compared to city driving?