6 Bryson, M. C. (1976), American Statistician, vol. 30, pp. 184–185.
196 Chapter 4 Gathering Data
Many people think that as long as the sample size is large, it doesn’t matter how the sample was selected. This is incorrect, as illustrated by the Literary Digest poll. A sample size of 2.3 million did not prevent poor results. The sample was not representative of the population, and the sample percentage of 57% who said they would vote for Landon was far from the actual population percentage of 36% who voted for him. As another example, consider trying to estimate the average grade point average (GPA) on your campus. If you select 10 students from the library, you can produce an estimate that is biased toward higher GPAs (assuming those students who go to the library are studying more than those who don’t). If you, instead, gather a larger sample of 30 students from the library, you still have an estimate that is biased toward higher GPAs. The larger sample size did not fix the problem of sampling in the library. Many Internet surveys have thousands of respondents, but a volunteer sample of thousands is not as good as a random sample, even if that random sample is much smaller. We’re almost always better off with a simple random sample of 100 people than with a volunteer sample of thousands of people.
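The library example can be explored with a small simulation. The sketch below is only an illustration under invented assumptions: a hypothetical campus of 20,000 students, 30% of whom are regular library users, with library users averaging a 3.3 GPA and everyone else a 2.8 GPA. None of these numbers come from real data. It compares library samples of increasing size with a simple random sample of 100 students.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run can be replicated


def make_student(rng):
    """Return (is_library_user, gpa) for one hypothetical student.

    Assumption for illustration only: library users average a 3.3 GPA,
    other students average 2.8, both with standard deviation 0.4.
    """
    in_library = rng.random() < 0.30
    center = 3.3 if in_library else 2.8
    gpa = min(4.0, max(0.0, rng.gauss(center, 0.4)))  # keep GPA in [0, 4]
    return in_library, gpa


# Hypothetical population of 20,000 students.
population = [make_student(rng) for _ in range(20_000)]
all_gpas = [gpa for _, gpa in population]
library_gpas = [gpa for in_lib, gpa in population if in_lib]
true_mean = statistics.mean(all_gpas)


def sample_mean(values, n):
    """Mean GPA of n students drawn at random (without replacement) from values."""
    return statistics.mean(rng.sample(values, n))


library_10 = sample_mean(library_gpas, 10)
library_30 = sample_mean(library_gpas, 30)
library_3000 = sample_mean(library_gpas, 3000)
srs_100 = sample_mean(all_gpas, 100)  # simple random sample of the whole campus

print(f"Population mean GPA:      {true_mean:.2f}")
print(f"Library sample, n = 10:   {library_10:.2f}")
print(f"Library sample, n = 30:   {library_30:.2f}")
print(f"Library sample, n = 3000: {library_3000:.2f}")
print(f"Random sample, n = 100:   {srs_100:.2f}")
```

Under these assumptions, enlarging the library sample makes its estimate more stable but does not move it toward the population mean, whereas the much smaller random sample is typically close to it.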
SUMMARY: Key Parts of a Sample Survey
• Identify the population of all the subjects of interest.
• Define a sampling frame, which attempts to list all the subjects in the population.
• Use a random sampling design, implemented using random numbers, to select n subjects from the sampling frame.
• Be cautious about sampling bias due to nonrandom samples (such as volunteer samples) and sample undercoverage, response bias from subjects not giving their true response or from poorly worded questions, and nonresponse bias from refusal of subjects to participate.
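The first three steps of the summary can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the sampling frame of 60 student labels, the function name simple_random_sample, and the seed value are all hypothetical.

```python
import random


def simple_random_sample(frame, n, seed):
    """Select n distinct subjects from the sampling frame.

    Every possible set of n subjects has the same chance of being chosen.
    The seed lets another person reproduce exactly the same sample.
    """
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame: a list of every subject in the population.
frame = [f"student_{i:03d}" for i in range(1, 61)]

sample = simple_random_sample(frame, n=10, seed=2024)
print(sample)
```

Calling the function again with the same frame, n, and seed returns the same 10 subjects; changing the seed gives a different, equally valid random sample.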
In Section 4.1 we learned that experimental studies are preferable to nonexperimental studies but are not always possible. Some types of nonexperimental studies have fewer potential pitfalls than others. For a sample survey with random sampling, we can make inferences about the population of interest. By contrast, with a study using a convenience sample, results apply only to those subjects actually observed. For this reason, some researchers use the term observational study to refer only to studies that use available subjects (such as a convenience sample) and not to sample surveys that randomly select their sample.
4.2 Practicing the Basics
4.15 Choosing officers A campus club consists of five officers: president (P), vice president (V), secretary (S), treasurer (T), and activity coordinator (A). The club can select two officers to travel to New Orleans for a conference; for fairness, they decide to make the selection at random. In essence, they are choosing a simple random sample of size n = 2.
a. What are the possible samples of two officers?
b. What is the chance that a particular sample of size 2 will be drawn?
c. What is the chance that the activity coordinator will be chosen?
4.16 Simple random sample of students In Example 4, a random drawing was held to select the winners of the football tickets. Organizers randomly chose numbers, using the computer to generate the sample randomly. Choose another sample by using either the Random Numbers web app from the book’s website, the website random.org, or statistical software such as StatCrunch, MINITAB, JMP, SPSS or others. (Note: In practice, a statistician would include a seed number, which allows the computer to replicate the same sample.)
4.17 Auditing accounts—app Use an app or computer program to select 10 of the 60 school district accounts described in Example 5. Explain how you did this and identify the accounts to be sampled.
4.18 Sampling from a directory A local telephone directory has 50,000 names, 100 per page for 500 pages. Explaining how you found and used random numbers, select 10 numbers to identify subjects for a simple random sample of 10 names.
Section 4.2 Good and Poor Ways to Sample 197
4.19 Bias due to interviewer gender A social scientist in a less developed country studied the effect of the gender of the interviewer in a survey administered to male respondents. The survey addressed the question of women’s participation in politics. Half the subjects were surveyed by males and the remaining half by females. Results showed that of the respondents surveyed by males, about 37% favored active participation of women in politics in their country, while of the respondents surveyed by females, 67% favored the active participation of women in politics. Which type of bias does this illustrate: sampling bias, nonresponse bias, or response bias? Explain.
4.20 Charity walk A nonprofit organization surveyed its members for their opinion regarding an upcoming charity walk. The survey asked, “We are considering a shift of our preannounced location to a new one this year. Would you be willing to walk from the new location if it meant that many more teenage suicides would be avoided?”
a. Explain why this is an example of a leading question.
b. Explain why a better way to ask this question would be, “Which of the two would you prefer as a starting point for the walk—the preannounced or the new location?”
4.21 Instructor ratings The website www.ratemyprofessors.com provides students an opportunity to view ratings for instructors at their universities. A group of students planning to register for a statistics course in the upcoming semester are trying to identify the instructors who receive the highest ratings on the site. One student decides to register for Professor Smith’s course because she has the best ratings of all statistics instructors. Another student comments:
a. The website ratings are unreliable because the ratings are from students who voluntarily visit the site to rate their instructors.
b. To obtain reliable information about Professor Smith, they would need to take a simple random sample of the 78 ratings left by students on the site and compile new overall ratings based on those in the random sample.
Which, if either, of the student’s comments are valid?
4.22 Job trends The 2013–2014 Recruiting Trends report, produced each year by Michigan State University, reports that hiring over the past year of people with a bachelor’s degree increased 7%, for people with a PhD increased 26%, and for people with an MBA decreased 25%. This was based on a voluntary poll of all employers who interacted with at least one of 300 career service centers on college campuses. The survey was answered by 6,500 employers.
a. What is the population for this survey?
b. We cannot calculate the nonresponse rate. Explain what other information is needed to calculate this rate.
c. Describe two potential sources of bias with this survey.
4.23 Gun control More than 75% of Americans answer yes when asked, “Do you favor cracking down against illegal gun sales?” but more than 75% say no when asked, “Would you favor a law giving police the power to decide who may own a firearm?”
a. Which statistic would someone who opposes gun control prefer to quote?
b. Explain what is wrong with the wording of each of these statements.
7 Study by Lynn Sanders, as reported by the Washington Post, June 26, 1995.
4.24 Physical fitness and academic performance In a study by Karen Rodenroth, “A study of the relationship between physical fitness and academic performance”, conducted among students of the fourth and fifth grade in a rural Northeast Georgia elementary school, it was found that students who are more involved in physical education class are more likely to have high grades.
a. What is the population of interest for this survey?
b. Describe why this is an observational study.
c. Identify a lurking variable in this study.
4.25 Fracking The journal Energy Policy (2014, 65: 57–67) presents a survey of opinions about fracking. Hydraulic fracturing, or fracking, is the process of drilling through rock and injecting a pressurized mixture of sand, water, and chemicals that fractures the rock and releases oil and gas. There has been much debate in the media about its impact on the environment, on land owners, and on the economy. The survey involved contacting a nationally representative sample of 1960 adults in 2012. Of the 1960 people contacted, 1061 adults responded to the survey. The study reported that those more familiar with fracking, women, and those holding egalitarian worldviews were more likely to oppose fracking.
a. Describe the population of interest for this study.
b. Explain why a census is not practical for this study. What advantages does sampling offer?
c. Explain how nonresponse bias might be an issue in this study.
4.26 Sexual harassment on the Internet In his statistics course project, a student stated “Millions of Internet users engaged in online activities of sexual harassment”. This conclusion was based on a survey administered through different social media networks, in which 2% of the respondents reported they had sexually harassed someone online. In such a study, explain how there could be
a. Sampling bias. (Hint: Are all Internet users equally likely to respond to the survey?)
b. Nonresponse bias, if some users refuse to participate.
c. Response bias, if some users who participate are not truthful.
4.27 Cheating spouses and bias In a survey conducted by vouchercloud.net and reported in the Palm Beach Post, it was found that of 2645 people who had participated in an extramarital affair, the average amount spent per month on the affair was $444. (Source: http://www.palmbeachpost.com/news/news/cheating-spouses-spend-444-month-affair-survey-fin/ngbgy/?__federated=1)
a. No information was given on how the surveyed individuals were selected. What type of bias could result if the 2645 individuals were not randomly selected? How would this type of bias potentially affect the responses and the sample mean amount spent per month on the affair?
b. The concern was raised in the article that the individuals’ answers could not be validated. What type of bias could result from untruthful answers? How would this type of bias affect the responses and the sample mean amount spent per month on the affair?
4.28 Drug use by athletes In 2015, the outgoing president of the International Association of Athletics Federations (IAAF) claimed “99% of athletes are clean” (www.irishexaminer.com). However, based on a survey administered online, a sports reporter concluded that only 30% of athletics fans agreed with the above statement. Based on his findings, he said that the IAAF president’s statement was misleading. Identify the potential bias in the sports reporter’s study that results from
a. Sampling bias due to undercoverage.
b. Sampling bias due to the sampling design.
c. Response bias.
4.29 Identify the bias A newspaper designs a survey to estimate the proportion of the population willing to invest in the stock market. It takes a list of the 1000 people who have subscribed to the paper the longest and sends each of them a questionnaire that asks, “Given the extremely volatile performance of the stock market as of late, are you willing to invest in stocks to save for retirement?” After analyzing results from the 50 people who reply, they report that only 10% of the local citizens are willing to invest in stocks for retirement. Identify the bias that results from the following:
a. Sampling bias due to undercoverage
b. Sampling bias due to the sampling design
c. Nonresponse bias
d. Response bias due to the way the question was asked
4.30 Types of bias Give an example of a survey that would suffer from
a. Sampling bias due to the sampling design
b. Sampling bias due to undercoverage
c. Response bias
d. Nonresponse bias
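As a companion to Exercise 4.18, one possible way to turn random numbers into directory entries is sketched below. It assumes the entries are numbered 1 through 50,000 in page order, which the exercise does not state explicitly; the function name locate and the seed value are hypothetical choices for illustration.

```python
import random

NAMES_PER_PAGE = 100
TOTAL_PAGES = 500
TOTAL_NAMES = NAMES_PER_PAGE * TOTAL_PAGES  # 50,000 entries in the directory


def locate(entry_number):
    """Map an entry number in 1..50,000 to (page, line), both counted from 1."""
    page_index, line_index = divmod(entry_number - 1, NAMES_PER_PAGE)
    return page_index + 1, line_index + 1


rng = random.Random(7)  # arbitrary seed, recorded so the sample can be replicated

# Draw 10 distinct entry numbers; every set of 10 entries is equally likely.
entry_numbers = sorted(rng.sample(range(1, TOTAL_NAMES + 1), 10))

for number in entry_numbers:
    page, line = locate(number)
    print(f"entry {number:>6}: page {page}, line {line}")
```

Sampling without replacement matters here: drawing the numbers one at a time with possible repeats could select the same name twice and leave fewer than 10 distinct subjects.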
4.3 Good and Poor Ways to Experiment
Just as there are good and poor ways to gather a sample in an observational survey, there are good and poor ways to conduct an experiment. First, let’s recall the definition of an experimental study from Section 4.1: We assign each subject to an experimental condition, called a treatment. We then observe the outcome on the response variable. The goal of the experiment is to investigate the association: how the treatment affects the response. An advantage of an experimental study over a nonexperimental study is that it provides stronger evidence for causation.
In an experiment, subjects are often referred to as experimental units. This name emphasizes that the objects measured need not be human beings. They could, for example, be schools, stores, mice, or computer chips.