THREATS TO RESEARCH DESIGN VALIDITY - Handbook of Research Methods in Public Administration

Criterion validity: There are two types of criterion validity, concurrent and predictive. Concurrent validity is used to question the validity of a subset of questions that have already been verified by content validity. This subset may be created to save time during the actual questioning in a survey. Consider, for example, a survey given to motorists at a bridge toll booth. The motorist can take the survey home and return it on the next pass over the bridge. However, the decision makers would like a faster, more immediate response to the survey instrument. They decide to have the bridge police set up a safe area past the toll booths and before the entrance to the Interstate. Police will assist surveyors in detaining random cars so that motorists can be asked the survey questions. Any anxiety caused by the police stop is relieved as soon as the motorist finds that he is only being detained to answer a few questions. To secure their cooperation, the motorists are told that their names will be entered in a raffle for a free dinner-for-two at a local restaurant. Before this plan can be initiated, the survey planners realize that the motorists cannot be detained long enough to answer the current questionnaire. Doing so would delay traffic, slow down the process, limit the number of motorists who can be questioned, and possibly incur the wrath of the detained motorists. The survey planners decide to create a significantly shorter survey instrument from the original questionnaire that will still satisfy face and content validity and give them the information they need to meet the criteria of the survey.

Predictive validity: This validity asks the question: Does the test being administered have a predictive relationship with some future event that can be related back to the test? In the fire station experiment, which measures alarm response time to newly developed areas of a township, we can determine that fire stations within a certain radius of housing developments decrease response time to alarms, whereas fire stations outside this radius increase response time. In this instance, the fire station experiment has predictive validity if we use its results as a predictor for future fire station placement in the community. The future placement of fire stations relates the result of the experiment back to the test, and the test can be related to the placement of the fire stations.

Construct validity: This validity relates back to general theories being tested; aptitude tests should relate to general theories of aptitude, intelligence tests should relate to general theories of intelligence, etc. For example, in the bridge repair experiment, the county engineers realize that certain heavy equipment must be utilized by mechanics hired by the county. They want to give aptitude tests for potential hires to reduce their liability during construction. The assumption is made that the engineers, or those creating the aptitude test for using heavy equipment, understand what constitutes aptitude for using heavy equipment during bridge construction. The test to measure aptitude—the construct validity—must relate back to general theories of aptitude, to measure the individual’s capacity to operate heavy equipment and not general theories of heavy equipment.

When events occur that fall outside the boundaries of the experiment that could affect the dependent variable, internal validity has been threatened by history. History is a potential problem when studies are conducted in natural settings (O’Sullivan and Rassel, 1995). History is impossible for the experimenter to control for; rather, threats to the experiment’s validity due to history need to be explained when discussing causality. History threatens validity when we can ascertain that an event, other than the independent variable, may be associated with the dependent variable. The following example illustrates threats to validity from history.

In a study of the adequacy of existing fire stations conducted over the course of one year, we may find that the existing fire stations were not adequate, as evidenced by the number of multiple alarm fires (those requiring more than one group of responders to extinguish). In this case, the relationship we are looking for is that the number of multiple alarm fires is negatively related to the number of fire stations in a district. However, during the year in which the data were collected, the summer was exceptionally hot and there was a drought. The drought lasted approximately five weeks, during which the temperature was also higher than normal. Because the district encompassed large expanses of rural and undeveloped land, numerous brush fires occurred. Because the brush was so dry, the fires spread rapidly and soon required a second or third fire station to respond.

In the above example, the results were affected by the extraneous variable of history. It is impossible to control the extraneous variable, in this case the weather, or the effect that the weather had on the drying of the brush and the spread of fires. The study’s validity is threatened, but the study is not totally invalid. If one explains the effect of history, and that the threat to validity is actually a contingency that districts should be prepared for, the study still has merit.

Maturation—the processes within the respondents operating as a function of the passage of time per se (not specific to the particular events), including growing older, growing hungrier, growing more tired, and the like.

When changes occur naturally over time in the groups being studied, the threat to validity is called maturation. Commonly, studying children, or any group that may go through rapid physical and social changes, affects the validity of the experiment. Studies of the education of a cohort typically run over a period of years. For example, suppose a study of the reading skills of children in the primary grades is undertaken. Students will be tested over a period of six years, from kindergarten through grade five. They will be tested five months into the kindergarten school year and again at the end of kindergarten. Subsequently, reading skills will be tested every year at the end of the school year.

In this example, maturation is expected to occur. The question that maturation compels us to ask is whether, without the educators teaching reading skills (the independent variable), we would see similar improvement in reading scores (the dependent variable). Children grow rapidly both socially and physically, and this maturation in both physical and social contexts may have an effect on the experiment.
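The maturation question can be made concrete with a small simulation. The sketch below is only illustrative: the function name, the score gains, and the untaught comparison group are all assumptions introduced here, not part of the study described above. It shows how a pretest-to-posttest gain credits instruction with improvement that would have come from maturation alone, and how subtracting the gain of a comparable untaught group removes that share.

```python
import random
import statistics


def simulate_gains(n, instruction_effect, maturation_gain=10.0, noise_sd=3.0, seed=0):
    """Return hypothetical pretest-to-posttest reading gains for n pupils.

    Every pupil gains `maturation_gain` points simply by growing older;
    `instruction_effect` is the extra gain attributable to teaching.
    All numbers are invented for illustration.
    """
    rng = random.Random(seed)
    return [
        maturation_gain + instruction_effect + rng.gauss(0, noise_sd)
        for _ in range(n)
    ]


# Assumed scenario: instruction truly adds 4 points on top of maturation.
taught = simulate_gains(500, instruction_effect=4.0, seed=1)
untaught = simulate_gains(500, instruction_effect=0.0, seed=2)

# Without a comparison group, the whole gain is credited to instruction.
naive_estimate = statistics.mean(taught)

# With a comparison group, the maturation share is subtracted out.
adjusted_estimate = statistics.mean(taught) - statistics.mean(untaught)

print(f"gain with no comparison group: {naive_estimate:.1f}")
print(f"gain net of maturation:        {adjusted_estimate:.1f}")
```

In this sketch the naive figure mixes maturation with instruction, while the adjusted figure lands near the assumed instruction effect. A real study would rarely have an untaught group available, which is why maturation often has to be argued rather than measured.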

Testing—the effects of taking a test upon the scores of a second testing.

In an experiment where a group is given a pretest before the introduction of the independent variable, the pretest sensitizes the group to the experiment, and its response may be attributable to the pretest rather than to the independent variable. The posttest, which shows the effect of the independent variable, must therefore be reviewed in the following context: Did the pretest itself produce the change in the dependent variable? In short, could the pretest group connect questions from the pretest to the purpose of the experiment and, consciously or unconsciously, steer the results toward that end? Or could the group, because of what it remembers from the pretest or because the experimental or control group are “good test takers,” do better on the posttest as a result of test-taking ability rather than the effect of the independent variable?

A more interesting way of illustrating the effects of testing is what has become commonly known as the Hawthorne Effect. An experiment that was begun to test workplace efficiency at the Hawthorne Electrical Plant soon became the basis of the organizational development theories of administration.

The essence of Hawthorne was that the employees who were being tested performed better, and were more efficient whether workplace conditions were made more or less favorable, because they knew that they were being tested. Researchers who have tried to duplicate this experiment have been unsuccessful and have repudiated the validity of Hawthorne. To the extent that other researchers have disavowed the Hawthorne experiments on grounds of validity, there is merit; however, to the extent that they reject Hawthorne as a lesson for organizational development, they are mistaken.

Notwithstanding Hawthorne, the following public administration example shows the effect of testing threats to validity in terms of pretest and posttest knowledge.

A city may be looking for ways to streamline trash collection and at the same time reduce personnel costs. Other cities, such as New York, found that the introduction of a “two-man truck” (as opposed to a three-man truck) reduced costs and was an effective means of collecting trash. At a city council meeting, the mayor proposes the New York model as one that might work in their city. The city council, at its public meeting, decides to conduct a study in one of the city’s sectors. However, there is concern that the increased cost of, and maintenance required on, the new trucks may not be offset by the savings in personnel costs. The council decides to take efficiency and cost measurements under the current system while awaiting an order for two two-man trucks. The local newspaper reporter covering the council meeting reports its results in the next day’s edition.

Within two weeks, efficiency experts are dispatched with the sector’s two trash teams. Aware that they are being tested, and conscious of the purpose of the study, the men outdo themselves collecting the trash. When the new trucks arrive and a posttest is administered, production and efficiency do not improve, which the council had anticipated, and the savings in personnel costs from one less man on each two-man truck do not offset the cost of the new trucks and the anticipated maintenance on the vehicles.

Obviously, the fact that subjects became aware that they were to be studied and the concomitant realization that their livelihood may be threatened affected the results of the experiment. In this case, the pretest, as well as information that the groups would be tested, threatened the validity of the experiment and skewed the results.

Instrumentation—in which changes in the calibration of a measuring instrument or changes in the observers or scorers used may produce changes in the obtained measurements.

When changes occur in the interpretation of the dependent or independent variable, or the methodology changes during the interval from the pretest to the posttest, these changes are interpreted as threats to validity from instrumentation. It is not unusual that during the course of a social science experiment threats to the validity from instrumentation occur.

For example, during a meeting of the local school district, a principal raised the concern that shortly after lunch students seemed to participate less in activities designed to foster participation. The principal’s theory was that the lunch provided by the district was not healthy and that the amount of fats and empty calories in the diet was the major factor in this lack of participation. To illustrate his point, the principal brought with him a nutritionist who attested that the essence of the school lunch program was “junk food.” The board decided that a study should be commissioned to determine whether there was a relationship between school lunches and the level of participation in school activities after lunch. The study was to encompass the school term from September to June. The school district in the next county, which served more nutritionally sound meals, was contacted to act as a control group. After the study had been in effect for three months, the same nutritionist presented her case to the state senate. Shortly after her presentation, a bill was introduced and passed by the state legislature, appropriate vendors were found, and nutritionally sound meals were mandated statewide in all school districts. However, the commissioned study continued in the school district in question, and the final result of the study was that there was little correlation between the lunch meal and the level of participation.

Between the beginning of this study and the end, a change occurred—the state’s mandate that nutritionally sound meals be served—which may have affected the validity of the experiment.

Instrumentation threats to validity are common when studies examine issues that can be affected extraneously by changes in laws or the court’s interpretation of existing laws.

Statistical Regression—operating where groups have been selected on the basis of their extreme scores.

Statistical regression threatens validity when the study chooses to include in the experiment an outlier score—a score higher or lower than expected—at the time of the pretest. The expectation of such a score is that if the subject is evaluated again, the score on the next test will be substantially lower or higher than the previous test, i.e., their scores will regress toward the mean. However, if choices of subjects for the study were based on pretest outlier scores, and one does not consider statistical regression, validity is threatened. Notwithstanding the essence of validity—the test measures what we want it to measure, where those with high abilities score high, and those with low abilities score low—it would not be unusual to make errors in experimentation by not considering statistical regression.

In this example, the commissioner of human services wanted a breakdown of the department’s community mental health partial treatment centers so that a decision could be reached on closing some of the least-utilized facilities and privatizing the rest. For the last quarter, due to incidental reasons, the Fairfax Community Mental Health Center showed a decrease in admissions, substantially lower than their previous trends. The Fairfax Center had been operating for approximately ten years, and always maintained a high new-patient census. However, this decrease in new admissions was assumed to be the result of population shifts and better utilization of new treatment modalities.

Based on the low admission rate for new patients and the recent increase in new drug utilization, a decision to close Fairfax was reached. Fairfax community leaders were not in any hurry to temper the Department of Human Services’ decision, as the community mental health center was a continual cause of discontent within the community. Shortly after Fairfax closed, the community witnessed an increase in the homeless population, crime, and the suicide rate.

The above is a typical example of not considering statistical regression as a threat to validity.

Fairfax Community Mental Health Center was experiencing some type of “blip” in its admission rate. The low admission rate represented an outlier score that was too low. Had the Fairfax admission rate been observed over the ensuing three months, it would most likely have reverted, or regressed, to Fairfax’s historical mean admission rate.
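Regression toward the mean of this kind is easy to reproduce with simulated data. The following sketch assumes a hypothetical center whose true demand never changes, so that quarterly admissions vary only by chance; the mean of 120, the spread of 15, and the 1.5 standard deviation cutoff for an "outlier" quarter are invented for illustration and do not come from the Fairfax case.

```python
import random
import statistics


def simulate_admissions(n_quarters=400, mean=120.0, sd=15.0, seed=0):
    """Hypothetical quarterly admissions for a center with stable demand."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n_quarters)]


def quarters_after_low(series, cutoff):
    """Pair each unusually low quarter with the quarter that follows it."""
    return [
        (series[i], series[i + 1])
        for i in range(len(series) - 1)
        if series[i] < cutoff
    ]


admissions = simulate_admissions()
long_run_mean = statistics.mean(admissions)

# Assumed definition of an outlier: more than 1.5 standard deviations low.
cutoff = long_run_mean - 1.5 * statistics.pstdev(admissions)

pairs = quarters_after_low(admissions, cutoff)
low_mean = statistics.mean(low for low, _ in pairs)
following_mean = statistics.mean(nxt for _, nxt in pairs)

print(f"long-run mean admissions:       {long_run_mean:.1f}")
print(f"mean of outlier-low quarters:   {low_mean:.1f}")
print(f"mean of the following quarters: {following_mean:.1f}")
```

Because nothing about the simulated center changes, the quarters that follow an unusually low one sit close to the long-run mean rather than near the low value. A decision made on the low quarter alone would mistake chance variation for a trend.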

Biases—resulting in differential selection of respondents for the comparison groups.

Bias or selection is a threat to internal validity when the subjects, cases, scores, etc., are not chosen randomly. On the face of it, biases are something that we inherently avoid so as not to appear prejudiced. However, the threats to validity from bias and all other threats to internal validity can occur with or without the researcher being aware of these threats. Biases occur when we choose comparison groups that are uniformly different from each other. Our results then become affected by our biases so that the results obtained would not have been obtained if the differences between the experimental and control group were less extreme. The following example of an invalid study is one where the researcher purposely biased the experiment.

Baby formula companies have often lobbied for state infant nutrition programs to bolster product sales. In one such state, the American Academy of Pediatrics Council on Nutrition lobbied state legislators, saying that such programs limit the use of breast milk as the primary choice of nutrition for babies. Because this pressure from the pediatric community might limit the power base of the agency by eliminating the program, there was a need to show program success. Analysts in the Department of Health who favored continuing the Infant Nutrition Program decided to conduct a study investigating the relationship between infant morbidity and participation in the program. As the control group, they used infants who were not participants in the Infant Nutrition Program and who had been identified by the health department as born to crack-addicted and HIV-positive mothers.

In the above case, the difference between the experimental and control groups is so stark and systematic that the selection process has replaced the independent variable (participation in the Infant Nutrition Program) with the threat to validity (bias), which would be the factor that had the ultimate effect on the dependent variable (infant morbidity).
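A brief simulation illustrates how a biased control group can manufacture an apparent program effect. In this sketch the program is assumed to have no effect at all, and the baseline illness risks (10 percent for ordinary infants, 40 percent for the high-risk group) are invented numbers chosen only to show the mechanism; none of them comes from the case above.

```python
import random


def morbidity_rate(n, baseline_risk, program_effect, rng):
    """Share of n simulated infants who fall ill.

    Each infant falls ill with probability baseline_risk - program_effect,
    floored at zero. All probabilities here are hypothetical.
    """
    risk = max(0.0, baseline_risk - program_effect)
    return sum(rng.random() < risk for _ in range(n)) / n


rng = random.Random(42)

# Assumption for the sketch: the program changes nothing.
program_effect = 0.0

participants = morbidity_rate(2000, 0.10, program_effect, rng)

# Control group hand-picked from a much higher-risk population.
biased_controls = morbidity_rate(2000, 0.40, 0.0, rng)

# Control group drawn from the same population as the participants.
random_controls = morbidity_rate(2000, 0.10, 0.0, rng)

print(f"participants:           {participants:.3f}")
print(f"biased control group:   {biased_controls:.3f}")
print(f"comparable control grp: {random_controls:.3f}")
```

Against the hand-picked controls the program appears to cut morbidity sharply, yet against comparable controls the difference all but disappears. The apparent success is produced entirely by how the control group was selected.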

Experimental Mortality—or differential loss of respondents from the comparison groups.

The threat to internal validity that makes the researcher more concerned with those experimental subjects who leave or drop out of the study rather than remain in the study until completion is called experimental mortality. Further, experimental mortality includes those subjects who are misplaced in the control group, i.e., those subjects who at one time before the experiment or during the course of the experiment were exposed to part of the experimental treatment, a stage or effect of the independent variable, and then incorrectly assigned to the control group. Whether a dropout or a misplaced, exposed member of the control group, the experimenter must ask if the experiment would have been any different if those who dropped out had remained, or if those who were incorrectly assigned to the control group were assigned to the experimental group. Regarding the dropouts, the researcher must not only inquire how his results would have been different, but also if there is an effect of the independent variable treatment that caused the subject to drop out. There are obvious examples of both dropouts and incorrect assignment that can be applied to any pharmaceutical test on a new drug. Dropouts can be described by a pharmaceutical experiment where the effects of the drug during the course of the experiment caused the subject to leave. In this case, the researcher must determine if an unfavorable treatment reaction affected the dropout; if that subject had stayed to the end of the experiment, how would it affect the experiment’s results; could the person have been exposed to some earlier derivative of the drug, its natural or chemical form that would have sensitized the subject to the drug?

Selection Maturation Interaction—which in certain of the multi-group quasi experimental designs . . . is confounded with . . . the effect of the experimental variable.

Selection maturation interaction is what can be described as “design contamination” (O’Sullivan and Rassel, 1995), “diffusion or imitation of treatments” (Jones, 1985), and other sobriquets. At the least, it is contamination of either the control or the experimental group that negates the effect of the experiment, unless one is doing research on the effects of contamination.

Benignly, selection maturation, or contamination, is related to the threat to validity from “testing.” This occurs when the experimental groups guess the purpose of the experiment and gravitate toward that end. Malignantly, contamination occurs when one group tells another group what they are experiencing or what they believe the experiment is about, and this cross-contamination places the experiment in the validity danger zone.

For example, a long-time problem in education is the use of a testing model to evaluate teaching performance through a testing instrument given to teachers’ students. Recently, education researchers developed a testing instrument that would eliminate 65 percent of the variance. School districts throughout the country are excited about this development. Politicians who feel that teachers are overpaid and not productive enough are eager to see the results of the experiment. The teachers’ union feels that a test of this type is just another ploy to adversely affect contract negotiations for its constituency. Before the test is administered, a thorough description of the examination is picked up by the press, which, considering the hot political issue the test has become, publishes the story and various follow-up pieces. Teachers, unions, and families discussing the test with students, constituents, and children have sensitized the students to the issue. In many schools, teachers who believe they have
