
Developing Observational Checklists


to a more appropriate section on the survey. If no such section exists, the item should be deleted.

In some cases, it may be more appropriate to use a panel of experts to establish content validity, because members of the pilot group may not have the expertise to critically examine the items. For example, if a researcher is surveying middle-level math teachers about high-stakes testing and state math standards, it might be more beneficial to have a panel of math experts examine the survey in addition to the pilot group.

• Organize the checklist so that related behaviors are grouped together. This will make it easier to record your observations.

• Decide how observations will be recorded (e.g., count of frequencies, ratings of intensity, duration of behavior). Provide a space to check off behaviors, record time length, or give a rating scale for the behavior. (A minimal recording sketch follows this list.)

• Define the time periods in which observations will occur (e.g., continuous observation for a specified time period, recording for specific time intervals each day or hour, random sampling of time periods).

• Train observers to use the observational measure, or if using it yourself, practice until you can record your observations quickly and accurately.
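To make these design steps concrete, the sketch below shows one way a checklist and its recording scheme might be represented in Python. The behavior labels, the five-minute interval, and the class name are illustrative assumptions rather than part of any prescribed procedure, and the sketch records only frequency counts; duration or intensity ratings would need additional fields.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class ObservationSession:
    """One timed observation using a behavior checklist.

    The behavior labels and the 5-minute interval are hypothetical;
    substitute whatever behaviors your checklist defines.
    """
    behaviors: tuple = ("off task", "calls out", "asks question")
    interval_minutes: int = 5                     # fixed time-sampling window
    tallies: Counter = field(default_factory=Counter)

    def record(self, behavior: str) -> None:
        """Check off one occurrence of a listed behavior."""
        if behavior not in self.behaviors:
            raise ValueError(f"{behavior!r} is not on the checklist")
        self.tallies[behavior] += 1

# Usage: tally behaviors during one observation interval.
session = ObservationSession()
session.record("off task")
session.record("off task")
session.record("asks question")
print(dict(session.tallies))   # {'off task': 2, 'asks question': 1}
```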

The researcher would use the checklist while observing the class and simply place a check next to the particular behavior every time one of the students in the class performed one of those actions. Exhibit 5.3 presents an example of an observational checklist. A teacher who is conducting an action research study might use this checklist to monitor changes in a student's behavior during the time that an intervention is taking place.

To demonstrate that the newly developed observational checklist is a good measure, one would next need to gather evidence that the checklist is reliable and valid. For an observational measure, the most important type of reliability is consistency between raters or scorers, known as interrater reliability. Interrater reliability is the level of consistency or accuracy when two or more people are observing and recording data on the same observed scenario. For example, researcher A observes the math class and counts 33 off-task behaviors, whereas researcher B observes the exact same class at the exact same time and comes up with only 4 off-task behaviors. The problem with such a wide discrepancy is that one cannot determine which score truly reflects what the classroom looks like.

EXHIBIT 5.3 OBSERVATIONAL CHECKLIST FOR SELF-ESTEEM.

___ Child is afraid to try new things.

___ Child seems to be hopeful about the future.

___ Child gets discouraged easily.

___ Child is self-directed and initiates activities on own.

___ Child is comfortable making eye contact with others.

___ Child thinks s/he is not important or is unattractive.

For the study to have rigor or credibility, the checklist must have an acceptable level of interrater reliability.

Researchers who create an observational checklist work to establish interrater reliability before they go out and collect data for their study. The researcher would recruit persons to serve as observers, train them in the use of the checklist, and then have them observe several classrooms using the checklist. The results from the different observers would be compared after each session. Discrepancies between scorers would be worked out through an in-depth discussion between the scorers following analysis of the piloting scores. Following these discussions, modifications to improve the interrater reliability would be made before the checklist is used to collect data from the study's sample.
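During piloting, the observers' records can also be compared statistically. The sketch below illustrates two common indices, simple percent agreement and Cohen's kappa, for two raters who coded the same set of observation intervals; the interval codes are invented for illustration, and the choice of index is an assumption rather than a procedure the text prescribes.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of intervals on which the two raters agree."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: sum over categories of the product of marginal rates.
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n)
              for c in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical on-task/off-task codes for ten observation intervals.
a = ["on", "off", "off", "on", "on", "off", "on", "on", "off", "on"]
b = ["on", "off", "on",  "on", "on", "off", "on", "off", "off", "on"]
print(percent_agreement(a, b))   # 0.8
print(cohens_kappa(a, b))        # ~0.58
```

Kappa is typically preferred over raw agreement because two raters who guess at random will still agree some of the time by chance.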

In some situations, it is necessary for the researcher to show that a checklist is consistent each time the same scorer uses the instrument. This type of reliability is referred to as intrarater reliability. To assess this type of reliability, the same rater would use the instrument to score the same set of behaviors more than once. This might be done through the use of a videotape of behaviors, or, if the instrument is used to rate written materials, the rater could score the materials twice at different points in time. As in test-retest reliability, you would allow some time to pass between ratings to ensure that the rater is not simply remembering previous ratings. Intrarater reliability is measured by correlating the scores for the two different ratings.
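The correlation just described is ordinarily Pearson's r. The sketch below computes it from first principles for two invented sets of ratings produced by the same rater at two points in time; the scores and the 1-to-5 scale are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two sets of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical: the same rater scores the same videotaped behaviors twice,
# a few weeks apart, on a 1-to-5 scale.
first_rating  = [4, 3, 5, 2, 4, 3, 1, 5]
second_rating = [4, 2, 5, 3, 4, 3, 2, 5]
print(round(pearson_r(first_rating, second_rating), 2))  # ~0.89, a consistent rater
```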

Validity of an observational checklist involves consideration of both its content and its use. The items on an observational checklist might be examined by a group of experts in the areas addressed by the instrument to assess content validity. For example, the experts might examine the items on the observational scale of self-esteem in Exhibit 5.3 to see if these items are consistent with current theories and research on self-esteem. Do all of the items clearly indicate a child's level of self-esteem? Are there other possible interpretations of what a given behavior means? For example, could avoidance of eye contact indicate a cultural practice, such as respect for adults? The validity of a newly developed observational checklist might also be examined in research studies seeking evidence to confirm its validity. Construct validity might be examined by determining if there are correlations between scores obtained using this instrument and other measures thought to be related. The self-esteem scores might be correlated with teacher ratings of the students' confidence or peer ratings of their popularity. (After a series of such studies are published, this self-developed instrument might take on the status of a preestablished instrument!)
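A construct-validity check of this kind might look like the sketch below, which correlates checklist totals with teacher confidence ratings. Both sets of numbers are fabricated purely for illustration, and statistics.correlation requires Python 3.10 or later.

```python
from statistics import correlation   # Pearson's r; Python 3.10+

# Hypothetical pilot data: total self-esteem checklist scores for ten
# children, alongside teacher ratings of each child's confidence (1-10).
checklist_scores   = [12, 18, 9, 15, 20, 7, 14, 16, 11, 19]
teacher_confidence = [5, 8, 3, 6, 9, 2, 6, 7, 4, 8]

# A strong positive correlation is evidence consistent with construct
# validity; a weak one suggests the checklist measures something else.
print(round(correlation(checklist_scores, teacher_confidence), 2))  # ~0.99 here
```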

All observational measures are susceptible to certain problems that might undermine their validity. Most of these problems arise because the observer who is making a judgment and recording a score can distort the accuracy of what is observed.


Observer bias occurs when the observer's background, expectations, or personal perceptions influence the observation, making it inaccurate. If you are observing students asking questions in a seventh-grade science classroom, your own beliefs about the interests and abilities of males and females in science might influence what you record. You might overlook or categorize questions asked by girls differently from those asked by boys. A similar problem with accuracy in observations is contamination, which occurs when the observer's knowledge of the study affects his or her observations. If the observer knows that the researcher expects boys to ask higher order questions, then she or he might be more likely to notice and record this type of question for boys. A well-developed checklist should help avoid errors by clearly describing the different types of questions to be recorded.

Other ways to control observer bias include training observers in the use of the checklist before the study (or practicing the use of it yourself). Your accuracy in recording could be checked by videotaping the observations and asking a second observer to rate the behaviors as well. Contamination is often controlled by keeping observers "blind" (or in the dark) about the expected outcomes of the study.

A final problem that can occur with observational measures is known as the halo effect. The halo effect occurs when an initial impression influences all subsequent observations, making them less accurate. If in your first set of observations, Eager Edgar impresses you with his incisive questions and Cautious Caren only makes some simple and tentative comments, your later observations might reflect these first impressions. There might be a tendency to record more high-level questions for Edgar and fewer for Caren, even if Edgar's question asking declines and Caren's increases in sophistication.

Developing Measurement Procedures
