be to select two random samples of drivers in a given state. The first group would be required to have their automobiles inspected within 6 months (experimental group); the second group would be left alone (control group). Data would be collected regarding the past driving records of the drivers in both groups, and after 2 years, data would be collected again to encompass the period of the study.
If in this period the rate of accidents decreased in the experimental (inspection) group relative to the rate in the control group, then the researcher would have some evidence for inferring that automobile inspections reduce traffic accidents.
Conversely, if the data failed to show this pattern, the proposed causal relationship would be rejected.
Quasi-experimental designs of research have been given this appellation because they fail to incorporate one or more of the features of experimental designs.
In particular, in many research designs it is difficult to control exposure to the experimental stimulus or independent variable. Consider a television information program funded by the federal government and intended to promote energy conservation. To examine the effectiveness of this program, the researcher would ideally want to study the behavior of two random samples of people: one that viewed the program and another that did not. In this situation, it would be relatively easy to determine whether the program led to energy conservation. In actuality, however, people interested in conservation will tend to watch the program, and those not interested in conservation will tend to seek other diversion—and there is little reason to assume that these two groups are random or matched in any sense. (For one thing, the groups differ dramatically in interest in energy conservation.) Thus, although the researcher may find that those who viewed the program were subsequently more likely to conserve energy than those who did not, it is not clear whether the program or the viewers’ initial attitude was responsible for the impact.
A second aspect of the classical experimental research design that is often lacking in quasi-experimental designs is repeated measurement. To evaluate whether the independent variable produces changes in the dependent variable, it is very helpful to know respondents’ scores on the variables prior to a change in the independent variable—that is, their scores on the pretest. Then one can determine whether this change is accompanied by a change in the dependent variable. In the experiment, this function is served by measurement before and after the experimental treatment, which is intended to affect or change the level of the independent variable in the experimental group (for instance, auto inspections are intended to increase the safety of automobiles).
Unfortunately, repeated measurements are not always available or feasible.
For example, a researcher may be interested in the determinants of the effectiveness of an agency, but data on effectiveness may not have been collected until very recently, or relevant data from the past may not be comparable to current information. Or a researcher may be interested in the determinants of public opinion toward a nonprofit agency, but funding may limit the study to a single survey of respondents, so that repeated measurement is not possible. In both situations, the researcher is likely to obtain only one set of data for one point in time. Although it may be possible with such data to observe covariations between variables (such as between the funding of agency divisions and their performance), establishing time order between variables is necessarily problematic.
In the following sections, we discuss experimental and quasi-experimental designs of research in greater detail. Our primary objective is to assess the internal and external validity of these two major families of design. Within the category of the quasi-experiment, some texts further distinguish research designs, such as “descriptive” and “preexperimental” designs. Unfortunately, different authors and texts define these terms differently, and the terms are not standard.
To avoid confusion, we do not use them here but simply refer to them as mem- bers of the family of quasi-experimental designs.
Experimental Designs of Research
Internal Validity
Experimental research designs offer the strongest model of proof of causality.
The basic components of the classical experimental design can be summarized briefly.
Step 1: Assign subjects to two or more groups, with at least one “experimental” and one “control,” so that the groups are as comparable as possible.
The best way to assemble comparable groups is through random assignment of subjects to groups. Random means that there is no bias in the assignment so that, within the limits of statistical probability, the groups should not differ.
Step 2: Measure all subjects on relevant variables. Although a preexperiment measurement, or pretest, is usually administered, some experimental designs do not require a pretest. We present some of these designs below.
Step 3: Expose the experimental group(s) to the treatment or stimulus, the independent variable. Ensure that the control group(s) is(are) not exposed. Exposure to the treatment should constitute the only difference between the groups.
Step 4: Measure the groups again on the requisite variables in a postexperiment measurement, or posttest.
Step 5: Compare the measurements of the groups. If the independent variable does lead to changes in the dependent variable, this result should be evident in pretest-posttest comparisons between the experimental and control groups. Or, if the groups are large and known to be equivalent through random assignment, the analyst can simply compare posttest scores between the two groups. If the causal inference is valid, these comparisons should bear out predicted differences between the experimental and control groups.
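Under the assumption of a simple numeric outcome, the five steps above can be sketched as a short simulation. The group sizes, the 2-point background drift, and the 5-point treatment effect are illustrative assumptions, not values from the text:

```python
import random
import statistics

random.seed(42)

# Step 1: random assignment of 200 subjects to two groups of 100
subjects = list(range(200))
random.shuffle(subjects)
experimental, control = subjects[:100], subjects[100:]
exp_set = set(experimental)

# Step 2: pretest -- measure every subject on the dependent variable
pretest = {s: random.gauss(50, 10) for s in subjects}

# Steps 3-4: treatment and posttest. Both groups drift by about 2 points
# (maturation, history, and the like); only the experimental group also
# receives the treatment, assumed here to add about 5 points.
TREATMENT_EFFECT = 5.0
posttest = {
    s: pretest[s] + 2.0 + random.gauss(0, 2)
       + (TREATMENT_EFFECT if s in exp_set else 0.0)
    for s in subjects
}

# Step 5: compare pretest-posttest change across the groups
change_e = statistics.mean(posttest[s] - pretest[s] for s in experimental)
change_c = statistics.mean(posttest[s] - pretest[s] for s in control)
print(f"experimental change: {change_e:.2f}")
print(f"control change:      {change_c:.2f}")
print(f"estimated treatment effect: {change_e - change_c:.2f}")
```

Because both groups share the same background drift, subtracting the control group’s change from the experimental group’s change isolates the treatment effect.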
The classical experimental design is outlined in Table 3.2. In the table, O stands for observation or measurement, X for administration of the experimental treatment, R for random assignment, c for control group, e for experimental group, time subscript 1 for pretest, and time subscript 2 for posttest.
Intuitively, we know that the strength of experimental research designs with respect to internal validity arises from the fact that when these experiments are conducted properly, the experimental and control groups are identical except for a single factor—exposure to the experimental treatment. Thus, if, at the conclusion of the experiment, the former group is significantly different from the latter with respect to the dependent variable, the cause must be the treatment, or the independent variable. After all, this factor was the only one that distinguished the two groups.
The strength of the experimental design in internal validity can be shown more formally in connection with the elements of causality discussed before.
First, consider the criterion of time order. Because the researcher controls administration of the experimental stimulus, and measurements are obtained prior to and after its introduction, the time order of variables is clear. Through exposure to the stimulus, the level of the independent variable is first altered (for subjects in the experimental group), and then, through pretest versus posttest comparison, any resulting changes in the dependent variable are readily observed.
Second, consider the criterion of covariation. If the independent variable is the cause of the dependent variable, then subjects exposed to higher levels of the former should manifest greater change in the latter. Operationally, this means that the experimental group should show greater change in the dependent variable than does the control group—or, equivalently, that exposure to the experimental treatment covaries with change in the dependent variable. The analyst can establish covariation by comparing pretest and posttest scores (if both are available) or by comparing the posttest scores alone, provided that members of the experimental group and control group were assigned to the groups at random.
Third, consider the criterion of nonspuriousness. As noted earlier, if the experimental and control groups are identical, with the exception of exposure to the treatment, then observed differences between the groups with respect to changes in the dependent variable can reliably be attributed to the independent variable rather than to other potential causes. After all, if the experimental design is implemented correctly, the independent variable is the only difference between the groups.
Table 3.2 Classical Experimental Design

Group          Random Assignment   Observation 1   Treatment   Observation 2   Comparison
Experimental   Re                  Oe1             X           Oe2             Oe2 − Oe1
Control        Rc                  Oc1                         Oc2             Oc2 − Oc1
You may ask why the control group should manifest any changes in the dependent variable because these subjects are denied exposure to the experimental treatment. It is in this area especially that the advantages of the control group become apparent. With the passage of time, subjects in both the experimental and control groups may show changes in the dependent variable for reasons quite unrelated to the independent variable. For example, subjects may develop biologically and emotionally; they may react to the fact that they are being observed or measured; they may learn of dramatic events that transpire outside the experimental setting that may affect their scores on the dependent variable. The general name for developments such as these that could jeopardize the causal inference is threats to internal validity. The three threats just delineated are the threats of “maturation,” “reactivity,” and “history,” respectively. (This listing is illustrative; refer to research design texts for a more complete inventory.)
The impact of these threats may be felt in any experiment, but assuming that the experimental and control groups are equated to begin with (through a procedure such as random assignment of subjects to groups), there is no reason to suspect that the threats should affect the groups differently. Hence, when the measurements of the two groups are compared, these effects should cancel one another. However, if the independent variable is a cause of the dependent variable, the effect of the treatment should be manifested in an additional increment of change in the dependent variable—but only in the experimental group. Thus, although both groups may exhibit change, the experimental group should demonstrate greater change. In this manner, the control group serves as an essential baseline for interpreting and evaluating change in the experimental group.
Care must be taken in assembling experimental and control groups that are as comparable as possible. In particular, assigning subjects to the experimental or control groups arbitrarily, or segregating them according to scores on a criterion (low achievers versus high achievers, regular voters versus occasional voters), or allowing them to volunteer for either group creates obvious selection biases that result in a priori differences between the groups. Consequently, if the experiment reveals a difference in the dependent variable between the experimental group and the control group in the end, the researcher cannot rule out the possibility that it was the initial differences between the groups—rather than the experimental stimulus (independent variable)—that led to this result. Because the causal inference is thereby weakened, these (biased) selection procedures must be avoided.
The technique of choice in constructing equivalent experimental and control groups is random assignment of subjects to those groups. Random assignment removes any systematic difference between the groups. Randomization is an extremely powerful technique because it controls for factors both known and unknown to the researcher. For good reason, then, random assignment is the foremost method for equating experimental and control groups. Equating is fundamental to establishing the validity of the causal inference.
The manager needs to understand that random assignment does not mean haphazard or arbitrary assignment. It has a precise statistical meaning: Each subject or case has an equal chance of being assigned to the experimental group or to the control group. As a result, the groups will have very similar (if not equivalent) composition and characteristics, within the limits of statistical probability.
It is this quality that leads to the comparability of the experimental and control groups. If you ever need to draw a random sample for research or other purposes, consult a table of random numbers (contained in the appendices of most statistics texts) or an expert in the field of sampling.
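As an illustration of this “equal chance” property, a short sketch shows how shuffling a pool and splitting it tends to balance even a characteristic the researcher never measured. The subject pool and the unmeasured trait here are hypothetical:

```python
import random
import statistics

random.seed(1)

# Hypothetical pool of 1,000 subjects, each with a trait the researcher
# never measures (say, prior interest in the program, on a 0-100 scale)
pool = [{"id": i, "hidden_trait": random.gauss(50, 15)} for i in range(1000)]

# Random assignment: shuffle the pool, then split it in half, so every
# subject has an equal chance of landing in either group
random.shuffle(pool)
experimental, control = pool[:500], pool[500:]

mean_e = statistics.mean(s["hidden_trait"] for s in experimental)
mean_c = statistics.mean(s["hidden_trait"] for s in control)

# Within the limits of statistical probability, the groups match on the
# trait even though it played no role in the assignment
print(f"experimental mean: {mean_e:.1f}")
print(f"control mean:      {mean_c:.1f}")
```

This is why randomization controls for factors both known and unknown: nothing about any subject, measured or not, influences which group that subject joins.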
The final criterion of causality is theory. Unfortunately, no research design—and no statistical technique—can establish a causal relationship as substantively meaningful, credible, or important. This evaluation must be made on other grounds, such as logic, experience, and previous research.
The criterion of theory reinforces the adage that statistics are no substitute for substantive knowledge of a field.
External Validity
The preceding discussion supports the conclusion that experimental designs of research are relatively strong with respect to internal validity—the causal inference based on the experimental setting and the selected sample of subjects.
However, they are not as strong with respect to external validity—the ability to generalize the results of a study to other settings, other times, and other populations. Two factors primarily limit the external validity of experimental designs.
The first is the context of these designs. To isolate subjects from extraneous variables, experimenters frequently place them in a laboratory setting. Although the laboratory works admirably in sealing off subjects from possibly confounding factors, it also removes them from a real-life setting, to which the researcher usually seeks to generalize results. Thus, one can question how closely the situation simulated in the laboratory resembles the processes of everyday life. For example, as part of the experimental treatment, the researcher may systematically expose subjects to new ideas, problems, information, or people. However, in the normal course of events, people exercise a great deal more personal choice and control regarding their exposure to and handling of these influences. Consequently, the results obtained in the experiment may hold under the experimental conditions, but their application to less artificial (more realistic) situations may be problematic.
Closely related to this type of difficulty is the argument that because premeasurement (pretesting) may sensitize subjects to the experimental treatment, results may apply only to pretested populations. This problem, however, is more tractable than the first. If the experimental and control groups have been randomly assigned, then the researcher can eliminate the pretest procedure because random assignment should remove any initial differences between the groups; a posttest is then sufficient to assess the effect of the experimental treatment.
This research design is called the posttest-only control group design. Or, if time and resources allow, an additional set of experimental and control groups that are not pretested can be incorporated into the classical experimental design. With this addition, it is possible to determine not only whether the experimental stimulus has an effect on non-pretested samples but also the magnitude of any reactive effects of premeasurement. This design is known as the Solomon four-group design. Letting O stand for observation or measurement, X for administration of the experimental treatment, R for random assignment of subjects to groups, c for control group, e for experimental group, and p for pretest, these two experimental designs can be outlined as shown in Tables 3.3 and 3.4.
The second factor that threatens the external validity of experimental designs of research is the sample of subjects on which findings are based. Because of ethical and financial considerations, experiments conducted on a random sample of individuals drawn from a well-defined population (such as U.S. citizens) have been rare. Institutional review boards (IRBs) are in place at most universities and research centers to evaluate ethical issues in research. They weigh the advantages and disadvantages of research carefully and probe the use of deception, which may be employed to make an experimental setting seem more realistic to participants. Sometimes the potential knowledge to be gained by the experiment is judged so vital that these objections are put aside (as with medical research on life-threatening diseases). In addition, the cost and practical problems of conducting an experiment on a random sample of subjects can be prohibitive. As a result, experiments traditionally have been conducted on “captive” populations—prison inmates, hospital patients, and especially students.
The correspondence between these groups and more heterogeneous, “natural” populations is necessarily problematic, thus threatening the external validity of many experiments.
It is important to note, however, that this situation is beginning to change.
As social science researchers have grown more familiar with the use and advantages of experimental designs, they have become attuned to naturally occurring experiments—such as changing traffic laws and their enforcement, implementing
Table 3.3 Posttest-Only Control Group Design

Group          Randomization   Treatment   Observation 1
Experimental   Re              X           Oe1
Control        Rc                          Oc1
Table 3.4 Solomon Four-Group Design

Group          Randomization   Observation 1   Treatment   Observation 2
Experimental   Rep             Oep1            X           Oep2
Control        Rcp             Ocp1                        Ocp2
Experimental   Re                              X           Oe2
Control        Rc                                          Oc2
new government programs, turning over a part of nonprofit agency operations to volunteers, contracting with a fund-raising firm to increase donations to an agency, and so on. With adequate foreknowledge of such developments, the severity of threats to the external validity of experimental designs posed by problems of sampling and context can be attenuated significantly. In the future, social science researchers will probably become increasingly adept at warding off threats to the external validity of experiments.