
Populations, Samples, and Probability

8.4 TABLES OF RANDOM NUMBERS

8.5 RANDOM ASSIGNMENT OF SUBJECTS

8.6 SURVEYS OR EXPERIMENTS?

PROBABILITY

8.7 DEFINITION

8.8 ADDITION RULE

8.9 MULTIPLICATION RULE

8.10 PROBABILITY AND STATISTICS

Summary / Important Terms / Key Equations / Review Questions

Preview

In everyday life, we regularly generalize from limited sets of observations. One sip indicates that the batch of soup is too salty; dipping a toe in the swimming pool reassures us before taking the first plunge; a test drive triggers suspicions that the used car is not what it was advertised to be; and a casual encounter with a stranger stimulates fantasies about a deeper relationship. Valid generalizations in inferential statistics require either random sampling in the case of surveys or random assignment in the case of experiments. Introduced in this chapter, tables of random numbers can be used as aids to random sampling or random assignment.

Conclusions that we’ll encounter in inferential statistics, such as “95 percent confident” or “significant at the .05 level,” are statements based on probabilities.

We’ll define probability for a simple event and then discuss two rules for finding probabilities of more complex outcomes, including (in Review Question 8.18 on page 165) the probability of the catastrophic failure of the Challenger shuttle in 1986, which took the lives of seven astronauts.



POPULATIONS AND SAMPLES

Generalizations can backfire if a sample misrepresents the population. Faced with the possibility of erroneous generalizations, you might prefer to bypass the uncertainties of inferential statistics by surveying an entire population. This is often done if the size of the population is small. For instance, you calculate your GPA from all of your course grades, not just from a sample. If the size of the population is large, however, complete surveys are often prohibitively expensive and sometimes impossible. Under these circumstances, you might have to use samples and risk the possibility of erroneous generalizations. For instance, you might have to use a sample to estimate the mean annual income for parents of all students at a large university.

8.1 POPULATIONS

Any complete set of observations (or potential observations) may be characterized as a population. Accurate descriptions of populations specify the nature of the observations to be taken. For example, a population might be described as “attitudes toward abortion of currently enrolled students at Bucknell University” or as “SAT critical reading scores of currently enrolled students at Rutgers University.”

Real Populations

Pollsters, such as the Gallup Organization, deal with real populations. A real population is one in which all potential observations are accessible at the time of sampling.

Examples of real populations include the two described in the previous paragraph, as well as the ages of all visitors to Disneyland on a given day, the ethnic backgrounds of all current employees of the U.S. Postal Service, and presidential preferences of all currently registered voters in the United States. Incidentally, federal law requires that a complete survey be taken every 10 years of the real population of all U.S. households, at considerable expense and with thousands of data collectors, as a means of revising election districts for the House of Representatives. (An estimated undercount of millions of people, particularly minorities, in both the 2000 and 2010 censuses has revived a suggestion, long endorsed by statisticians, that the entire U.S. population could be estimated more accurately if a highly trained group of data collectors focused only on a random sample of households.)

Population

Any complete set of observations (or potential observations).

INTERNET SITE

Go to the website for this book (http://www.wiley.com/college/witte). Click on the Student Companion Site, then Internet Sites, and finally U.S. Census Bureau to view its website, including links to its many reports and to population clocks that show current population estimates for the United States and the world.

Hypothetical Populations

Insofar as research workers concern themselves with populations, they often invoke the notion of a hypothetical population. A hypothetical population is one in which not all potential observations are accessible at the time of sampling. In most experiments, subjects are selected from very small, uninspiring real populations: the lab rats housed in the local animal colony or student volunteers from general psychology classes.

Experimental subjects often are viewed, nevertheless, as a sample from a much larger hypothetical population, loosely described as “the scores of all similar animal subjects (or student volunteers) who could conceivably undergo the present experiment.”

According to the rules of inferential statistics, generalizations should be made only to real populations that, in fact, have been sampled. Generalizations to hypothetical populations should be viewed, therefore, as provisional conclusions based on the wisdom of the researcher rather than on any logical or statistical necessity. In effect, it’s an open question, often answered only by additional experimentation, whether or not a given experimental finding merits the generality assigned to it by the researcher.

8.2 SAMPLES

Any subset of observations from a population may be characterized as a sample. In typical applications of inferential statistics, the sample size is small relative to the population size. For example, less than 1 percent of all U.S. worksites are included in the Bureau of Labor Statistics’ monthly survey to estimate the rate of unemployment. And although only 1475 likely voters had been sampled in the final NBC News/Wall Street Journal poll for the 2012 presidential election, it correctly predicted that Obama would be the slim winner of the popular vote (http://www.wsj.com/election2012).

Optimal Sample Size

There is no simple rule of thumb for determining the best or optimal sample size for any particular situation. Often sample sizes are in the hundreds or even the thousands for surveys, but they are less than 100 for most experiments. Optimal sample size depends on the answers to a number of questions, including “What is the estimated variability among observations?” and “What is an acceptable amount of error in our conclusion?” Once these types of questions have been answered, with the aid of guidelines such as those discussed in Section 11.11, specific procedures can be followed to determine the optimal sample size for any situation.

Progress Check * 8.1 For each of the following pairs, indicate with a Yes or No whether the relationship between the first and second expressions could describe that between a sample and its population, respectively.

(a) students in the last row; students in class

(b) citizens of Wyoming; citizens of New York

(c) 20 lab rats in an experiment; all lab rats, similar to those used, that could undergo the same experiment

(d) all U.S. presidents; all registered Republicans

(e) two tosses of a coin; all possible tosses of a coin

Progress Check * 8.2 Identify all of the expressions from Progress Check 8.1 that involve a hypothetical population.

Answers on page 429.

Sample

Any subset of observations from a population.


8.3 RANDOM SAMPLING

The valid use of techniques from inferential statistics requires that samples be random.

Random sampling occurs if, at each stage of sampling, the selection process guarantees that all potential observations in the population have an equal chance of being included in the sample.

It’s important to note that randomness describes the selection process—that is, the conditions under which the sample is taken—and not the particular pattern of observations in the sample. Having established that sampling is random, you still can’t predict anything about the unique pattern of observations in that sample. The observations in the sample should be representative of those in the population, but there is no guarantee that they actually will be.

Casual or Haphazard, Not Random

A casual or haphazard sample doesn’t qualify as a random sample. Not every student at UC San Diego has an equal chance of being sampled if, for instance, a pollster casually selects only students who enter the student union. Obviously excluded from this sample are all those students (few, we hope) who never enter the student union. Even the final selection of students from among those who do enter the student union might reflect the pollster’s various biases, such as an unconscious preference for attractive students who are walking alone.

Progress Check * 8.3 Indicate whether each of the following statements is True or False.

A random selection of 10 playing cards from a deck of 52 cards implies that

(a) the random sample of 10 cards accurately represents the important features of the whole deck.

(b) each card in the deck has an equal chance of being selected.

(c) it is impossible to get 10 cards from the same suit (for example, 10 hearts).

(d) any outcome, however unlikely, is possible.

Answers on page 429.

8.4 TABLES OF RANDOM NUMBERS

Tables of random numbers can be used to obtain a random sample. These tables are generated by a computer designed to equalize the occurrence of any one of the 10 digits: 0, 1, 2, . . . , 8, 9. For convenience, many random number tables are spaced in columns of five-digit numbers. Table H in Appendix C shows a specimen page of random numbers from a book devoted entirely to random digits.

How Many Digits?

The size of the population determines whether you deal with numbers having one, two, three, or more digits. The only requirement is that you have at least as many different numbers as you have potential observations within the population. For example, if you were attempting to take a random sample from a population consisting of 679 students at some college, you could use the 1000 three-digit numbers ranging from 000 to 999. In this case, you could identify each of the potential observations, as represented by a particular student’s name, with a single number. For instance, if a student directory were available, the first person, Alice Aakins, might be assigned the three-digit number 001, and so on through to the last person in the directory, Zachary Ziegler, who might be assigned 679.

Random Sampling

A selection process that guarantees all potential observations in the population have an equal chance of being selected.

Using Tables

Enter the random number table at some arbitrarily determined place. Ordinarily this should be determined haphazardly. Open a book of random numbers to any page and begin with the number closest to a blind pencil stab. For illustrative purposes, however, let’s use the upper-left-hand corner of the specimen page (Table H, Appendix C) as our entry point. (Ignore the column of numbers that identify the various rows.) Read in a consistent direction—for instance, from left to right. Then as each row is used up, shift down to the start of the next row and repeat the entire process. As a given number between 001 and 679 is encountered, the person identified with that number is included in the random sample.

Since the first number on the specimen page in Table H is 100 (disregard the fourth and fifth digits in each five-digit number), the person identified with that number is included in the sample. The next three-digit number, 325, identifies the second person. Ignore the next number, 765, since none of the numbers between 680 and 999 is identified with any names in the student directory. Also, ignore repeat appearances of any number between 001 and 679. The next three-digit number, 135, identifies the third person. Continue this process until the specified sample size has been achieved.
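If you prefer to see the table-reading rules spelled out step by step, the following Python sketch mimics them for the directory of 679 students. It is only an illustration under stated assumptions: the function name sample_from_digit_groups is hypothetical, and the five-digit groups are stand-ins whose leading digits (100, 325, 765, 135) follow the values cited above; they are not claimed to reproduce Table H.

```python
def sample_from_digit_groups(groups, population_size, sample_size):
    """Pick IDs 1..population_size from the first three digits of each group."""
    chosen = []
    for group in groups:
        number = int(group[:3])      # disregard the fourth and fifth digits
        if not 1 <= number <= population_size:
            continue                 # 000 or 680-999: no student has this number
        if number in chosen:
            continue                 # ignore repeat appearances
        chosen.append(number)
        if len(chosen) == sample_size:
            break                    # specified sample size achieved
    return chosen

# Illustrative groups only (assumption): leading digits 100, 325, 765, 135
# follow the text; everything else is invented for the example.
groups = ["10097", "32533", "76520", "13586", "10011", "54876"]
sample = sample_from_digit_groups(groups, population_size=679, sample_size=4)
print(sample)  # -> [100, 325, 135, 548]
```

Here 765 is skipped because it exceeds 679, and the second appearance of 100 is skipped as a repeat, so the fourth person sampled is number 548.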

Efficient Use of Tables

The inefficiency of the previous procedure becomes apparent when a random sample must be obtained from a large population, such as that defined by a city telephone directory. It would be most laborious to assign a different number to each name in the directory prior to consulting the table of random numbers. Instead, most investigators refer directly to the random number table, using each random number as a guide to a particular name in the directory. For example, a six-digit random number, such as 239421, identifies the name on page 239 (the first three digits) and line 421 (the last three digits). This process is repeated for a series of six-digit random numbers until the required number of names has been sampled.
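The page-and-line shortcut can also be sketched in a few lines of Python. This is a rough illustration, not a prescribed procedure: the function name locate_entry and the directory dimensions (500 pages, 450 lines per page) are assumptions chosen only so that the example number 239421 falls inside the directory.

```python
def locate_entry(six_digit, pages, lines_per_page):
    """Split a six-digit random number into (page, line); None if out of range."""
    page = int(six_digit[:3])   # first three digits give the page
    line = int(six_digit[3:])   # last three digits give the line
    if 1 <= page <= pages and 1 <= line <= lines_per_page:
        return page, line
    return None                 # no such page or line: skip this random number

# Hypothetical directory size (assumption): 500 pages of 450 lines each.
entry = locate_entry("239421", pages=500, lines_per_page=450)
print(entry)  # -> (239, 421)
```

A number such as 239987 would return None under these assumed dimensions, because line 987 does not exist; as with the student directory, such numbers are simply ignored.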

Progress Check * 8.4 Describe how you would use the table of random numbers to take

(a) a random sample of five statistics students in a classroom where each of nine rows consists of nine seats.

(b) a random sample of size 40 from a large directory consisting of 3041 pages, with 480 lines per page.

Answers on page 429.

A Complication: No Population Directory

Lacking the convenience of an existing population directory, investigators resort to variations on the previous procedure. For instance, the Gallup Organization makes a separate presidential survey in each of the four geographical areas of the United States: Northeast, South, Midwest, and West. Within each of these areas, a series of random selections culminates in the identification of particular election precincts: small geographical districts with a single polling place. Once household directories have been obtained for each of these precincts, households are randomly selected and predesignated household members are interviewed.


Many pollsters use random digit dialing in an effort to give each telephone number—whether landline or wireless—in the United States an equal chance of being called for an interview. Essentially, the first six digits of a 10-digit phone number, including the area code, are randomly selected from tens of thousands of telephone exchanges, while the final four digits are taken directly from random numbers.
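The two-part construction of a telephone number can be sketched as follows. Treat this strictly as an illustration: the function name random_digit_dial and the three six-digit prefixes are hypothetical, and an actual survey would draw from a list of tens of thousands of working exchanges.

```python
import random

def random_digit_dial(exchanges, rng):
    """Build a 10-digit number: a sampled six-digit prefix plus four random digits."""
    prefix = rng.choice(exchanges)                              # area code + exchange
    suffix = "".join(str(rng.randrange(10)) for _ in range(4))  # final four digits
    return prefix + suffix

# Hypothetical six-digit prefixes (assumption), used only for illustration.
exchanges = ["212555", "307555", "619555"]
rng = random.Random(0)          # seeded so the sketch is repeatable
number = random_digit_dial(exchanges, rng)
print(number)
```

Because the final four digits come straight from random numbers, a listed number and an unlisted number within the same exchange are equally likely to be generated.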

Although random digit dialing ensures that unlisted telephone numbers can be sampled, it has lost some of its appeal recently because of a federal prohibition against its use to contact wireless numbers and also because of excessively high nonresponse rates, often as high as 91 percent. In an effort to approximate a more representative sample, pollsters have been exploring other techniques, such as online polling* (http://www.stat.columbia.edu/~gelman/research/published/forecasting-with-nonrepresentative-polls.pdf).

Hypothetical Populations

As has been noted, the researcher, unlike the pollster, usually deals with hypothetical populations. Unfortunately, it is impossible to take random samples from hypothetical populations. Not all potential observations can have an equal chance of being included in the sample if, in fact, some observations are not accessible at the time of sampling. It is a common practice, nonetheless, for researchers to treat samples from hypothetical populations as if they were random samples and to analyze sample results with techniques from inferential statistics. Our adoption of this practice (to provide a common basis for discussing both surveys and experiments) is less troublesome than you might think inasmuch as random assignment replaces random sampling in well-designed experiments.

8.5 RANDOM ASSIGNMENT OF SUBJECTS

Typically, experiments evaluate an independent variable by focusing on a treatment group and a control group. Although subjects in experiments can’t be selected randomly from any real population, they can be assigned randomly, that is, with equal likelihood, to these two groups. This procedure has a number of desirable consequences:

■ Since random assignment or chance determines the membership for each group, all possible configurations of subjects are equally likely. This provides a basis for calculating the chances of observing any specific difference between groups and ultimately deciding whether, for instance, the one observed mean difference between groups is real or merely transitory.

■ Random assignment generates groups of subjects that, except for random differences, are similar with respect to any uncontrolled variables at the outset of the experiment.

For instance, to determine whether a study-skill workshop improves academic performance, volunteer subjects should be assigned randomly either to the treatment group (attendance at the workshop) or to the control group. This ensures that, except for random differences, both groups are similar initially with respect to any uncontrolled variables, such as academic preparation, motivation, IQ, etc. At the conclusion of such an experiment, therefore, any observed differences in academic performance between these two groups, not attributable to random differences, would provide the most clear-cut evidence of a cause-effect relationship between the independent variable (attendance at the workshop) and the dependent variable (academic performance).

*See introductory comments in http://dx.doi.org/10.1016/j.ijforcast.2014.06.001.

Random Assignment

A procedure designed to ensure that each subject has an equal chance of being assigned to any group in an experiment.

How to Assign Subjects

The random assignment of subjects can be accomplished in a number of ways. For instance, as each new subject arrives to participate in the experiment, a flip of a coin can decide whether that subject should be assigned to the treatment group (if heads turns up) or the control group (if tails turns up). An even better procedure, because it eliminates any biases of a live coin tosser, relies on tables of random numbers. Once the tables have been entered at some arbitrary point, they can be consulted, much like a string of coin tosses, to determine whether each new subject should be assigned to the treatment group (if, for instance, the random number is odd) or to the control group (if the random number is even).
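The odd-even rule lends itself to a short Python sketch. The function name assign_by_parity and the string of digits are assumptions made for illustration; any run of random digits read from an arbitrary entry point would serve.

```python
def assign_by_parity(digits):
    """Odd digit -> treatment, even digit -> control, one digit per arriving subject."""
    return ["treatment" if int(d) % 2 == 1 else "control" for d in digits]

# Illustrative digits (assumption), read left to right, one per subject.
groups = assign_by_parity("10097325")
print(groups.count("treatment"), groups.count("control"))  # -> 5 3
```

With these eight digits, five subjects land in the treatment group and three in the control group, so nothing in the simple rule forces the two groups to be the same size.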

Creating Equal Groups

Equal numbers of subjects should be assigned to the treatment and control groups for a variety of reasons, including the increased likelihood of detecting any difference between the two groups. To achieve this goal, the random assignment should involve pairs of subjects. If the table of random numbers assigns the first volunteer to the treatment group, the second volunteer should be assigned automatically to the control group. If the random numbers assign the third volunteer to the control group, the fourth volunteer should be assigned automatically to the treatment group, and so forth. This procedure guarantees that at any stage of the random assignment, equal numbers of subjects will be assigned to the two groups.
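The pairing procedure might be sketched as follows. This is one possible rendering under stated assumptions: the function name assign_in_pairs is hypothetical, odd digits are taken to mean treatment and even digits control, the number of subjects is assumed to be even, and the caller is assumed to supply at least one digit per pair.

```python
def assign_in_pairs(digits, n_subjects):
    """A random digit places the first member of each pair; the second gets the other group."""
    assignments = []
    digit_iter = iter(digits)
    while len(assignments) < n_subjects:
        first = "treatment" if int(next(digit_iter)) % 2 == 1 else "control"
        second = "control" if first == "treatment" else "treatment"
        assignments.extend([first, second])   # pair is always split across groups
    return assignments[:n_subjects]

# Illustrative digits (assumption): one digit is consumed per pair of subjects.
pairs = assign_in_pairs("1009", n_subjects=8)
print(pairs.count("treatment"), pairs.count("control"))  # -> 4 4
```

Only the first member of each pair is assigned by chance, yet after every completed pair the two groups contain the same number of subjects.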

More Extensive Sets of Random Numbers

Incidentally, the page of random numbers in Table H, Appendix C, serves only as a specimen. For serious applications, refer to a more extensive collection of random numbers, such as that in the book by the Rand Corporation cited on page 470 of Appendix C. If you have access to a computer, you might refer to the list of random numbers that can be generated, almost effortlessly, by computers.
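As a rough illustration of how a computer can produce such a list, the Python sketch below prints a small page of five-digit random numbers laid out in columns. The function name random_number_page and the page dimensions are assumptions. Note also that Python's standard random module produces pseudorandom digits from a seed, which is adequate for classroom sampling and assignment exercises but is not the physical process behind a published table.

```python
import random

def random_number_page(rows, cols, seed=None):
    """Return rows x cols five-digit strings, each digit drawn uniformly from 0-9."""
    rng = random.Random(seed)
    return [
        ["".join(str(rng.randrange(10)) for _ in range(5)) for _ in range(cols)]
        for _ in range(rows)
    ]

# Hypothetical layout (assumption): 3 rows of 10 five-digit numbers.
page = random_number_page(rows=3, cols=10, seed=42)
for row in page:
    print(" ".join(row))
```

Supplying the same seed reproduces the same page, which makes it possible to document exactly which random numbers were used in a study.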

Progress Check * 8.5 Assume that 12 subjects arrive, one at a time, to participate in an experiment. Use random numbers to assign these subjects in equal numbers to group A and group B. Specifically, random numbers should be used to identify the first subject as either A or B, the second subject as either A or B, and so forth, until all subjects have been identified. There should be six subjects identified with A and six with B.

(a) Formulate an acceptable rule for single-digit random numbers. Incorporate into this rule a procedure that will ensure equal numbers of subjects in the two groups. Check your answer in Appendix B before proceeding.

(b) Reading from left to right in the top row of the random number page (Table H, Appendix C), use the random digits of each random number in conjunction with your assignment rule to determine whether the first subject is A or B, and so forth. List the assignment for each subject.

Answers on pages 429 and 430.

8.6 SURVEYS OR EXPERIMENTS?

When using random numbers, it’s important to have a general perspective. Are you engaged in a survey (because subjects have been sampled from a real population) or in an experiment (because subjects have been assigned to various groups)? In the case of surveys, the object is to obtain a random sample from some real population.

Reminder:

Random sampling occurs in well-designed surveys, and random assignment occurs in well-designed experiments.
