Just like riding a bike, it’s easier to learn statistics if you are actively involved, learning by doing. One way is to practice by using software or calculators with data files or data summaries that we’ll provide. Another way is to perform activities that illustrate the ideas of statistics by using interactive web apps. We’ll use them throughout the text. You can find these apps online at the book’s website www.pearsonglobaleditions.com/agresti. Because they simply run in your browser (no installation necessary), we call them web apps. For example, you can use a web app to take samples from artificial populations and analyze them to discover properties of statistical methods applied to those samples.
This is a type of simulation—using a computer to mimic what would actually happen if you selected a sample and used statistics in real life. So, let’s get started with your active involvement.
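To see what such a simulation does behind the scenes, here is a minimal Python sketch (standard library only; it is not the book's web app). It builds a hypothetical artificial population of 10,000 subjects, half labeled "Success" and half "Failure" (a population proportion of 0.50), then repeatedly selects a random sample of 10 subjects and reports the sample proportion of successes. The population size, the labels, and the function name `proportion_in_sample` are illustrative assumptions.

```python
import random

# Hypothetical artificial population: 10,000 subjects, half "Success"
# and half "Failure", so the population proportion of successes is 0.50.
population = ["Success"] * 5000 + ["Failure"] * 5000

def proportion_in_sample(population, n, rng):
    """Randomly select n distinct subjects and return the proportion of successes."""
    sample = rng.sample(population, n)  # simple random sample, without replacement
    return sample.count("Success") / n

rng = random.Random(7)  # fixed seed so the run is repeatable
# Five samples of size 10; the sample proportions vary from sample to sample
print([proportion_in_sample(population, 10, rng) for _ in range(5)])
```

Rerunning with a different seed gives a different set of sample proportions, which is exactly the sample-to-sample variability the simulation is meant to reveal.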
48 Chapter 1 Statistics: The Art and Science of Learning from Data
1.20 Data file for friends Construct (by hand) a data file of the form of Figure 1.2, for two characteristics with a sample of four of your friends. One characteristic should take numerical values, and the other should take values that are categories.
1.21 Shopping sales data file Construct a data file describing the purchasing behavior of the five people, described below, who visit a shopping mall. Enter the amount each person spent on clothes, sporting goods, books, and music CDs as the data. Customer 1 spent $49 on clothes and $16 on music CDs, customer 4 spent $92 on books, and the other three customers did not buy anything.
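One possible layout, sketched in Python with the standard `csv` module, uses one row per customer and one column per purchase category. The column names, the choice of a comma-separated (CSV) format, and recording $0 for categories with no purchase are assumptions made for illustration; the software for your course may store data files differently.

```python
import csv
import io

# One column per characteristic: dollars spent in each category
columns = ["customer", "clothes", "sporting_goods", "books", "music_cds"]

# One row per subject (customer); 0 means the customer bought nothing there
rows = [
    [1, 49, 0, 0, 16],
    [2, 0, 0, 0, 0],
    [3, 0, 0, 0, 0],
    [4, 0, 0, 92, 0],
    [5, 0, 0, 0, 0],
]

# Write the data file to an in-memory text buffer instead of a real file
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)
writer.writerows(rows)
data_file_text = buffer.getvalue()
print(data_file_text)
```

To save it to disk instead, open a file with `open("shopping.csv", "w", newline="")` and pass it to `csv.writer`; the file name here is hypothetical.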
1.3 Practicing the Basics
1.22 Sample with caution According to a recent survey conducted by the Pew Research Center, 82% of all American internet users between the ages of 18 and 29 used Facebook in 2014. Why is it not safe to infer anything from this survey about:
a. the proportion of the general population of all American internet users who use Facebook?
b. the proportion of the general population of Americans who use Facebook?
1.23 Create a data file with software Your instructor will show you how to create data files by using the software for your course. Use it to create the data file you constructed by hand in Exercise 1.20 or 1.21.
Figure 1.3 Using the Sampling Distribution for the Sample Proportion Web App. This app, accessible from the book’s website, simulates taking samples from a population with population proportion 0.50 (set in the first menu, see highlighted portion).
The first plot shows that a proportion of 0.5 of the subjects in the entire population vote Republican (generically labeled as “Failure”) and 0.5 vote Democrat (“Success”). The second plot shows the result of taking a sample of size 10 (set in the second menu, see highlighted portion) from this population. Of the 10 subjects sampled, we see that 7 voted Republican and 3 voted Democrat, for a sample proportion of 0.3 voting Democrat (indicated by the blue triangle in the second plot). Other features of this app will be explained in Chapter 7.
Try Exercises 1.25 and 1.26.
1.24 Use a data file with software You may need to learn how to open a data file from the book’s website or download one from the Web for use with the software for your course. Do this for the “FL student survey” data file on the book’s website, from the survey mentioned following Figure 1.2.
1.25 Simulate with the Sampling Distribution for the Sample Proportion web app Refer to Activity 2 on page 47.
a. Repeat the activity using a population proportion of 0.60: Take at least five samples of size 10 each; observe how the sample proportions of successes vary around 0.60 and then do the same thing with at least five samples of size 1000 each.
b. In part a, what seems to be the effect of the sample size on the amount by which sample proportions tend to vary around the population proportion, 0.60?
c. What is the practical implication of the effect of the sample size summarized in part b with respect to making inferences about the population proportion when you collect data and observe only the sample proportion?
1.26 Margin of error Refer to the Sampling Distribution for the Sample Proportion web app used in Activity 2.
a. For a population proportion of 0.50, simulate a random sample of size 1000. What is the sample proportion of successes? Do this 10 times, keeping track of the 10 sample proportions.
b. Find the approximate margin of error for a sample proportion based on 1000 observations.
c. Using the margin of error found in part b and the 10 sample proportions found in part a, form 10 intervals of believable values for the true proportion. How many of these intervals captured the actual population proportion, 0.50?
d. Collect the 10 intervals from each member of the class. (If there are 20 students, 200 intervals will be collected.) What percentage of these intervals captured the actual population proportion, 0.50?
1.27 Ebola outbreaks Ebola virus disease outbreaks have a case fatality rate of 90% (meaning 90% of people who get it die). In a hospital that treated 20 patients with Ebola, 14 died. Is this result surprising? To reason out an answer, use the web app (or other software) as described in Activity 2 to conduct at least 10 simulations of taking samples of size 20 from a population with a proportion of 0.90. Note the sample proportions. Do you observe any sample proportions equal to or less than 14/20, or 0.70?
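For Exercise 1.27, the "other software" route can be sketched in a few lines of Python. This is a minimal sketch, assuming each of the 20 patients independently dies with probability 0.90; the function name `count_extreme_samples` and the choice of 10,000 simulations are illustrative. Comparing death counts (at most 14 of 20) rather than proportions avoids floating-point comparisons with 0.70.

```python
import random

def count_extreme_samples(p, n, max_deaths, num_sims, seed=0):
    """Simulate num_sims samples of n patients, each of whom dies with
    probability p, and count the samples with at most max_deaths deaths."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    count = 0
    for _ in range(num_sims):
        deaths = sum(rng.random() < p for _ in range(n))
        if deaths <= max_deaths:  # sample proportion <= 14/20 = 0.70
            count += 1
    return count

num_sims = 10_000
hits = count_extreme_samples(p=0.90, n=20, max_deaths=14, num_sims=num_sims)
print(f"{hits} of {num_sims} samples had a sample proportion of 0.70 or less")
```

If only a small fraction of the simulated samples fall at or below 0.70, the hospital's result would be unusual for a population with a 90% fatality rate.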
Chapter Summary
• Statistics consists of methods for conducting research studies and for analyzing and interpreting the data produced by those studies. Statistics is the art and science of learning from data.
• The first part of the statistical process for answering a statistical question involves design: planning an investigative study to obtain relevant data for answering the statistical question. The design often involves taking a sample from a population where the population contains all the subjects (usually, people) of interest. Summary measures of samples are called statistics; summary measures of populations are called parameters.
  After we’ve collected the data, there are two types of statistical analyses:
  - Descriptive statistics summarize the sample data with numbers and graphs.
  - Inferential statistics make decisions and predictions about the entire population, based on the information in the sample data.
• With random sampling, each subject in the population has the same chance of being in the sample. This is desirable because then the sample tends to be a good reflection of the population. Randomization is also important for good experimental design, for example, randomly assigning who gets the medicine and who gets the placebo in a medical study.
• The measurements we make of a characteristic vary from individual to individual. Likewise, results of descriptive and inferential statistics vary, depending on the sample chosen. We’ll see that the study of variability is a key part of statistics. Simulation investigations generate many samples randomly, often using an app. They provide a way of learning about the impact of randomness and variability from sample to sample.
• The margin of error is a measure of variability of a statistic from one random sample to the next. For proportions, the margin of error is approximated by $1/\sqrt{n} \times 100\%$, where $n$ is the sample size.
• Results of a study are considered statistically significant if they would rarely be observed with only ordinary random variation.
• The calculations for data analysis can use computer software. The data are organized in a data file. This file has a separate row of data for each subject and a separate column for each characteristic. However, you’ll need a good background in statistics to understand which statistical method to use and how to interpret and make valid conclusions from the computer output.
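The simulation and margin-of-error ideas in this summary can be checked with a short Python sketch (standard library only; it is not the book's web app, and the function names are illustrative). It draws repeated samples with population proportion 0.60 to compare how much sample proportions vary for n = 10 versus n = 1000, then forms intervals of sample proportion ± 1/√n and counts how many capture the population proportion 0.50, in the spirit of Exercises 1.25 and 1.26. It assumes each sampled subject is independently a success with the population proportion as its probability.

```python
import math
import random
import statistics

def sample_proportion(p, n, rng):
    """Proportion of successes in one simulated random sample of size n,
    where each subject is a success with probability p."""
    return sum(rng.random() < p for _ in range(n)) / n

def margin_of_error(n):
    """Rough margin of error for a sample proportion: 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def coverage_count(p, n, num_intervals, rng):
    """Form num_intervals intervals (sample proportion +/- margin of error)
    and count how many contain the population proportion p."""
    moe = margin_of_error(n)
    hits = 0
    for _ in range(num_intervals):
        p_hat = sample_proportion(p, n, rng)
        if p_hat - moe <= p <= p_hat + moe:
            hits += 1
    return hits

rng = random.Random(2024)  # fixed seed so results are repeatable

# Variability of sample proportions shrinks as the sample size grows
for n in (10, 1000):
    props = [sample_proportion(0.60, n, rng) for _ in range(500)]
    print(f"n={n}: SD of 500 sample proportions = {statistics.stdev(props):.3f}")

# Margin of error and interval coverage for samples of size 1000
print(f"margin of error for n=1000: {margin_of_error(1000):.3f}")  # about 3.2%
hits = coverage_count(0.50, 1000, 200, rng)
print(f"{hits} of 200 intervals captured 0.50")
```

The spread for n = 1000 should come out roughly one tenth of the spread for n = 10, matching the $1/\sqrt{n}$ pattern in the margin-of-error approximation.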
Chapter Problems
Practicing the Basics
1.28 UW Student survey In a University of Wisconsin (UW) study about alcohol abuse among students, 100 of the 40,858 members of the student body in Madison were sampled and asked to complete a questionnaire. One question asked was, “On how many days in the past week did you consume at least one alcoholic drink?”
a. Identify the population and the sample.
b. For the 40,858 students at UW, one characteristic of interest was the percentage who would respond “zero” to this question. For the 100 students sampled, suppose 29% gave this response. Does this mean that 29% of the entire population of UW students would make this response? Explain.
c. Is the numerical summary of 29% a sample statistic or a population parameter?
1.29 Euthanasia The General Social Survey asked, in 2012, whether you would commit suicide if you had an incurable disease. Of the 3112 people who had an opinion about this, 1862, or 59.8%, said they would commit suicide.
a. Describe the population of interest.
b. Explain how the sample data are summarized using descriptive statistics.
c. For what population parameter might we want to make an inference?
1.30 Mobile data costs A study is conducted by the Australian Communications and Media Authority. Based on a small sample of 19 mobile communications plans offered, the average cost per 1000 MB of free monthly mobile data allowance is found to be $5.40, with a margin of error of $2.16. Explain how this margin of error provides an inferential statistical analysis.
1.31 Breaking down Brown versus Whitman Example 2 of this chapter discusses an exit poll taken during the 2010 California gubernatorial election. The administrators of the poll also collected demographic data, which allow for further breakdown of the 3889 voters from whom information was collected. Of the 1633 voters registered as Democrats, 91% voted for Brown, with a margin of error of 1.4%. Of the 1206 voters registered as Republicans, 10% voted for Brown, with a margin of error of 1.7%. And of the 1050 Independent voters, 42% voted for Brown, with a margin of error of 3.0%.
a. Do these results summarize sample data or population data?
b. Identify a descriptive aspect of the results.
c. Identify an inferential aspect of the results.
1.32 Online learning Your university is interested in determining the proportion of students who would be interested in completing summer courses online, compared to on campus. A survey is taken of 100 students who intend to take summer courses.
a. Identify the sample and the population.
b. For the study, explain the purpose of using (i) descriptive statistics and (ii) inferential statistics.
1.33 Marketing study For the marketing study about sales in Example 5, identify the (a) sample and population and (b) descriptive and inferential aspects.
1.34 Support of labor unions The Gallup organization has asked opinions about support of labor unions since its first poll in 1936, when 72% of the American population approved of them. In its 2014 poll, it found that support of labor unions had fallen to 53% of Americans, based on a sample of 1,540 adults.
a. Calculate an estimated margin of error for these data.
b. What is the range of likely values for the percentage of Americans who supported labor unions in 2014?
c. This analysis is an example of
i. descriptive statistics
ii. inferential statistics
iii. a data file
iv. designing a study
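As a worked check on parts a and b, applying the chapter's approximation (margin of error ≈ $1/\sqrt{n} \times 100\%$) with $n = 1540$ and the sample percentage of 53% gives, after rounding:

```latex
\begin{aligned}
\text{margin of error} &\approx \frac{1}{\sqrt{1540}} \times 100\% \approx \frac{1}{39.24} \times 100\% \approx 2.5\% \\
53\% \pm 2.5\% &\;\Rightarrow\; \text{roughly } 50.5\% \text{ to } 55.5\%
\end{aligned}
```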
1.35 Multiple choice: Use of inferential statistics? Inferential statistics are used
a. to describe whether a sample has more females or males.
b. to reduce a data file to easily understood summaries.
c. to make predictions about populations by using sample data.
d. when we can’t use statistical software to analyze data.
e. to predict the sample data we will get when we know the population.
1.36 True or false? In a particular study, you could use descriptive statistics, or you could use inferential statistics, but you would rarely need to use both.
Concepts and Investigations
1.37 Statistics in the news Pick up a recent issue of a national newspaper, such as The New York Times or USA Today, or consult a news website, such as msnbc.com or cnn.com.
Identify an article that used statistical methods. Did it use descriptive statistics, inferential statistics, or both? Explain.
1.38 What is statistics? On a final exam that one of us recently gave, students were asked, “How would you define ‘statistics’ to someone who has never taken a statistics course?”
One student wrote, “You want to know the answer to some question. There’s no answer in the back of a book.
You collect some data. Statistics is the body of procedures that helps you analyze the data to figure out the answer and how sure you can be about it.” Pick a question that interests you and explain how you might be able to use statistics to investigate the answer.
1.39 Surprising suicide data? In Exercise 1.29, of 3112 people who responded, 59.8%, or 1862 people, said they would commit suicide if they had an incurable disease.
Suppose that 50% of the entire population shares this view about suicide. If the sample of 3112 people were a random sample, would this sample proportion of 0.598 be
a questionnaire like the one that follows. Alternatively, your instructor may ask you to use a data file of this type already prepared with a class of students at the University of Florida, the “FL student survey” data file on the book’s website. Using a spreadsheet program or the statistical software the instructor has chosen for your course, create a data file containing this information. What are some questions you might ask about these data? Homework exercises in each chapter will use these data.