
Developing Measurement Tools for Action Research


Action researchers develop and use a wide variety of quantitative and qualitative methods to collect data. All of these methods share the goal of making data collection more systematic. However, because of the intense demands of taking on two roles, researcher and practitioner, methods of data collection in action research must be simple and must fit into the normal processes of the school or classroom. Most action researchers will not follow the complex procedures involved in formally establishing reliability and validity, although we advise students to consider how they might demonstrate consistency and accuracy in their quantitative measures. Instead, action researchers typically collect data using several brief measures (rating scales, observational checklists, interviews). The data from these different measures are then compared for consistency and accuracy. Below we discuss some of the measures found most frequently in action research studies.

Checklists and Rating Scales

Action researchers frequently use checklists and rating scales as part of their teaching and their research. Teachers might design a checklist or rating scale for the strategies or skills that they hope to see in their students' writing and record information on the skills that they see, along with the dates or assignments on which the skills were shown. Exhibit 5.6 presents part of a checklist used at one school to record writing strategies of students.

Checklists might also be designed for students to record their own skills and activities. For example, Exhibit 5.7 displays a checklist that students might use to report on a cooperative learning activity. Students might be asked to report individually on their experiences, or the group might report on all of its members using the checklist.

Simple rating scales might be designed to assess student reactions to assignments. For young children, these rating scales might even use nonverbal responses such as the one in Figure 5.1.

Checklists for action research might also be open-ended, with categories identified and space for recording information about student strengths and weaknesses.

Grading Rubrics

As teachers, many of you may be familiar with grading rubrics. Rubrics are another type of framework for standardizing an observational process. In the case of rubrics, the data being collected are often qualitative in nature (e.g., portfolios, papers, drawings, projects), but a quantitative judgment needs to be made regarding the data.


EXHIBIT 5.6 RATING SCALE FOR ENGLISH OR LITERACY SKILLS.

Raider Open Door Academy English/Literacy Skills Checklist, Grades 5-8

Each competency is rated in one of four columns: Introduced (0), Progressing (1), Proficient (2), or Mastery (3).

Grade Level    Writing Competency
5,6,7,8        Write to reflect personal ideas
7,8            Write to reflect multicultural ideas
7,8            Write to reflect universal ideas
5,6            Construct simple outlines
7,8            Use story clusters
7,8            Use webs
5,6,7,8        Choose appropriate prewriting strategies
5,6,7,8        Apply appropriate prewriting strategies
5,6,7,8        Develop a first draft that focuses on a central idea
5,6,7,8        Revise writing based on student-teacher collaboration
5,6,7,8        Edit writing using resources to correct spelling, punctuation, grammar, and usage
5,6,7,8        Write with developmentally appropriate sentence structure (parts of speech, fragments, run-ons, etc.)
5,6,7,8        Use a dictionary
5,6,7,8        Use a thesaurus
5,6            Write complete simple sentences
5,6            Write complete compound sentences
5,6            Write complete complex sentences

Source: Adapted from Raider Open Door Academy Web site. Retrieved May 12, 2005, from http://nettleton.crsc.k12.ar.us/~ncovey/Literacy%20Skills%20Checklist.htm.

This judgment may be a numerical rating or a letter grade. The rubric is essentially a set of goals or standardized criteria listed along with the various levels of attainment the student or portfolio might achieve: for example, high, medium, or low, or achieved, developing, and beginning. Presented in Exhibit 5.8 is an example of a rubric that a faculty member might use to assess a student's final portfolio.

In this case, let us say that the faculty member graded the portfolio using the rubric and gave it a rating of outstanding. Notice that the rubric includes an explanation of the criteria used to obtain the rating. If the instructor were to grade the exact same portfolio again using the rubric a week later and obtain the same rating, or close to the same rating, the consistency in obtaining the same score time after time would show that intrarater reliability is present. Keep in mind that an instructor would not keep scoring portfolios over and over again after determining that intrarater reliability existed. However, if intrarater reliability has not been established for an instrument, the researcher would want to establish it before collecting any data. To do this, the researcher would select a random sample of portfolios and score them using the rubric. Then, after some time has passed, the researcher would read and score the same portfolios again and compare the two sets of scores for each portfolio to determine consistency.
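For practitioners who keep their rubric scores in a spreadsheet or a short script, this two-round comparison is easy to quantify. The Python sketch below is only an illustration of the idea: the scores are invented, and the coding of the rubric levels as the numbers 0 through 3 is our assumption rather than part of the rubric.

```python
# Hypothetical illustration: checking intrarater consistency for a rubric.
# Rubric levels coded numerically (an assumption for this example):
# unacceptable = 0, needs revisions = 1, acceptable = 2, outstanding = 3.

round_1 = [3, 2, 2, 1, 3, 0, 2, 1, 3, 2]   # first scoring of 10 portfolios
round_2 = [3, 2, 1, 1, 3, 0, 2, 2, 3, 2]   # same portfolios scored a week later

# Percent exact agreement: how often the two rounds gave the same rating.
agreement = sum(a == b for a, b in zip(round_1, round_2)) / len(round_1)

# Pearson correlation between the two rounds, computed by hand to stay
# dependency-free; values near 1.0 suggest consistent scoring over time.
n = len(round_1)
mean1, mean2 = sum(round_1) / n, sum(round_2) / n
cov = sum((a - mean1) * (b - mean2) for a, b in zip(round_1, round_2))
var1 = sum((a - mean1) ** 2 for a in round_1)
var2 = sum((b - mean2) ** 2 for b in round_2)
correlation = cov / (var1 ** 0.5 * var2 ** 0.5)

print(f"Exact agreement: {agreement:.0%}")
print(f"Test-retest correlation: {correlation:.2f}")
```

High exact agreement and a correlation near 1.0 would support a claim of intrarater reliability; low values would suggest the rubric criteria need clarification before data collection begins.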


EXHIBIT 5.7 CHECKLIST FOR COOPERATIVE LEARNING ACTIVITY.

Name   Listened to others   Used quiet voice   Cooperated with others   Followed directions   Contributed ideas   Reached consensus

Source: Adapted from On-Line Learning Center (2004). Retrieved May 12, 2005, from http://olc.spsd.sk.ca/DE/PD/instr/strats/coop/lesson.pdf.

FIGURE 5.1 SIMPLE RATING SCALE FOR ACTION RESEARCH STUDY.

[Figure shows the prompt "How did you feel about the book assignment?" followed by a row of nonverbal response options.]

EXHIBIT 5.8 RUBRIC FOR GRADING A PORTFOLIO.

Evaluation        Criteria                                                          Comments

Outstanding Evidence of depth and breadth of content is clear.

Activities demonstrate confidence and security in knowledge of subject matter.

Integration of content knowledge across disciplines is evident.

Activities reflect high degree of academic rigor as appropriate to grade or developmental level(s).

Content is connected in meaningful ways to student’s life experiences.

Originality and creativity in assimilation of content are displayed.

Content (concepts, skills, processes, and generalizations) is clearly articulated and meaningful.

Knowledge of NYSED Learning Standards is evident.

The ability to teach content from multiple perspectives is clear.

Acceptable Evidence of content and knowledge of subject is evident, but depth and breadth may need further research.

Relationships of content tied to other disciplines are logical and adequately communicated.

Areas beyond the domain of individual content mastery are identified, and clear evidence of processes and procedures for strengthening mastery is indicated.

Knowledge of NYSED Learning Standards is evident.

Attempts to include multiple perspectives are evident.

Needs revisions Evidence of mastery exists, but it is presented in an unclear manner.

The relationship of content mastery to activities and assessment measures is unclear.

Weak areas of content mastery have been identified, and evidence is provided that these are being addressed.


Along with reliability, validity for the rubric should also be established to ensure that what the rubric purports to measure is truly what it is measuring. For example, if a rubric is used to grade a research proposal made up of eight different sections or components, and the rubric assesses only five of these, then the rubric's sampling validity may be an issue. To address this, the researcher would add the three remaining sections to the rubric. For item validity, the researcher would examine the eight sections of the rubric individually to ensure that each section was specifically measuring what it intended to measure. For example, let us say that the fifth component of the rubric assessed the proposal's sampling section. To establish item validity, the researcher would want to examine carefully how the rubric discriminated between those proposals for which the section scored at the highest level and those that scored at the medium and lower levels for that section. In examining this, the researcher would also want to ensure that the criteria used to judge a section as exemplary all relate to the sampling section and not to another section of the proposal, such as the procedure or instrument section. In some cases, the researcher might use a panel of experts in the content area to examine or validate each section of the rubric to ensure that its criteria are truly those that the experts believe the work should be judged against. As in the example above, a panel of faculty who teach educational research would be appropriate to review such a rubric.
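To make the item-validity check concrete, here is a minimal Python sketch of the discrimination comparison just described. The data and the 0-3 section scoring are invented for illustration; the actual rubric in Exhibit 5.8 uses qualitative levels rather than these numbers.

```python
# Hypothetical sketch of the discrimination check: does the "sampling" section
# of the rubric separate proposals judged strong overall from those judged
# weak overall? Scores are invented for illustration.

proposals = [
    # (overall rating, score on the sampling section, 0-3 scale assumed)
    ("high", 3), ("high", 3), ("high", 2), ("high", 3),
    ("medium", 2), ("medium", 2), ("medium", 1),
    ("low", 1), ("low", 0), ("low", 1),
]

def mean_for(group):
    """Mean sampling-section score for proposals with a given overall rating."""
    scores = [score for rating, score in proposals if rating == group]
    return sum(scores) / len(scores)

# If the section discriminates well, the group means should be clearly ordered.
for group in ("high", "medium", "low"):
    print(f"{group:>6}: mean sampling-section score = {mean_for(group):.2f}")
```

If the group means are not clearly ordered from high to low, the criteria for that section may be capturing something other than the quality of the sampling section.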


EXHIBIT 5.8 RUBRIC FOR GRADING A PORTFOLIO, Cont’d.

Evaluation        Criteria                                                          Comments

Some facts/content are clearly indicated, but confusion or lack of clarity is evident as activities progress to higher levels of thinking.

No evidence of awareness of NYSED Learning Standards is shown.

Content is consistently presented from a single perspective.

Unacceptable Weakness in content mastery is evident.

Lack of clearly specified learning concepts is evident.

Facts or concepts are inappropriate or inaccurate.

No evidence is seen of awareness of NYSED Learning Standards.

Note: NYSED = New York State Education Department. The Learning Standards are state standards or benchmarks for student learning.


Journals

Teachers, researchers, or participants often use journals to record thoughts, feelings, and ideas about the research activities. Journals may be structured by presenting questions for participants to address, or participants may be allowed to enter any thoughts or feelings that they like. Teachers often use journals to reflect on their feelings, perceptions, and interpretations of what they see happening in their classrooms. Journals might also include drawings or diagrams as well as written comments. Some researchers use journals to record notes from their observations. Journals may be kept in actual notebooks or recorded as computer files.

Records and Documents

Educational settings are deluged with paper and computer files! As part of this paper and electronic onslaught, records and documents can reveal much about the inner workings of a school. Minutes of meetings, report cards, attendance records, discipline records, teachers' lesson plans, letters from parents, and written evaluations of teachers are all part of the massive amounts of data available in records and documents. As noted in Chapter Four, much of this archival data is also used by professional researchers. Action researchers are likely to use the records and documents produced within their immediate settings. As such, records and documents can be a valuable way to corroborate information from other sources. If a teacher reports in observational notes that student motivation increased as a result of a new instructional strategy, attendance records could be used to back up that claim.

Maps, Photographs, and Artifacts

Maps depicting the layout of classrooms and schools are useful ways to communicate aspects of an action research setting that are hard to capture in words. A map can show at a glance how a classroom is laid out to facilitate discussion or how certain groups are isolated within a school. Maps may also assist a practitioner in thinking about the interactions, traffic flow, or organization of resources in a school or classroom and might help in visualizing alternative arrangements. In addition, maps convey a lot about the context of a classroom, which is important information in this largely qualitative type of research.

Photographs are another type of visual data that can capture rich detail about the people, activities, and context of an educational setting. Researchers might take photographs themselves or might ask participants to take photos using disposable cameras. This provides a useful window into the participants' experiences and perspectives and enables the researcher to share the products of the research (that is, the photos) with collaborating participants. Videotapes and audiotapes are more obtrusive and run the risk of changing participants' behaviors as students "mug for the camera." However, the advantage of videotapes and audiotapes is that they allow the researcher to record actions and interactions more fully and to revisit the events at a later time. This is a major advantage when the practitioner is trying to juggle the roles of researcher and practitioner. In addition, with the current emphasis on outcome-based performance assessment, videotapes and audiotapes allow the researcher-practitioner to document student knowledge, skills, interests, and motivation. For example, Figure 5.2 depicts two youths and an instructor (K.H.V.) admiring a robot constructed by the youths in a technology-based summer camp.

Notice how, even in the still photograph, you can see the youths' interest and engagement and the complexity of their mutual creation. Photographs like this allow both the youths and the instructor to reflect on what they have learned.

Schools and classrooms are also full of "stuff" that researchers call artifacts.

Artifacts are objects used in the process of teaching and learning or products that result from the process of teaching and learning. Artifacts in educational settings might include desks, sample lesson plans, portfolios, textbooks, bulletin boards, a laser pointer, a board used for paddling students, manipulatives, athletic equipment, or anything else found or produced in schools. Artifacts are often examined and described in writing as part of one's observations. Some artifacts might be more systematically analyzed to see if they convey underlying assumptions about the nature of education. For example, textbooks might be examined to see how they portray gender roles, age groups, or ethnic groups. The types of notices posted on bulletin boards might be examined to reveal which groups and events receive the most promotion at a school.

Evaluating the Quality of Measures Used in Action Research

Like archival data, the measures used in action research are not typically subjected to the same level of scrutiny regarding reliability and validity as preestablished measures. However, one can follow the same general principles to provide evidence that the measures are valid and reliable. Grading rubrics provide a way of supporting the reliability of practitioner-made measures.


The content of assignments and tests can also be examined by other teachers or related to a state's learning standards as evidence of validity. In addition, the repeated use of practitioner-made measures and their refinement over time to fit a particular school or classroom provide evidence to support the reliability and validity of measures used in action research.

Videotaping and audiotaping are tools that can also be helpful in establishing reliability for action research measures. If a rubric or observational checklist is being used to gather data from a classroom activity, the situation can also be videotaped. The recordings can then be viewed and scored by multiple people at a later date, and the data from the live checklist and the videotape scorings can be compared. Videotaping can also be valuable in training members of a research team who will collect data. Such training is necessary for consistency to be maintained across scorers.
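A hypothetical sketch of that comparison, again in Python: two raters independently score the same videotaped activity with the cooperative learning checklist from Exhibit 5.7, and their item-by-item agreement is computed. The ratings are invented, and the 1/0 coding of observed versus not observed is our assumption.

```python
# Hypothetical sketch: two raters score the same videotaped activity with the
# cooperative learning checklist (1 = behavior observed, 0 = not observed).
# The ratings are invented; the point is the comparison, not the numbers.

items = ["listened", "quiet voice", "cooperated", "followed directions",
         "contributed ideas", "reached consensus"]

rater_a = {"listened": 1, "quiet voice": 1, "cooperated": 1,
           "followed directions": 0, "contributed ideas": 1, "reached consensus": 0}
rater_b = {"listened": 1, "quiet voice": 0, "cooperated": 1,
           "followed directions": 0, "contributed ideas": 1, "reached consensus": 0}

# Interrater agreement: proportion of checklist items on which the raters match.
matches = sum(rater_a[item] == rater_b[item] for item in items)
print(f"Interrater agreement: {matches}/{len(items)} items "
      f"({matches / len(items):.0%})")
```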

FIGURE 5.2 EXAMPLE OF A PHOTOGRAPH USED IN ACTION RESEARCH STUDY.

Chapter Summary

In most cases, researchers conducting survey, single-subject, and action research use self-developed instruments to gather data. Survey researchers have the option of using several different methods or scales to collect data from participants. The Likert scale is the most common scale used in survey research. This scale provides a statement and asks participants to respond using a series of quantified responses (e.g., 1 for strongly disagree to 6 for strongly agree). Semantic scales are similar to Likert scales, but here participants judge words or phrases describing persons, events, activities, or materials by checking a point along a line anchored by two polar opposites (e.g., good and evil).

As with preestablished instruments, reliability and validity must be established for self-developed instruments. Reliability and validity for a survey are established through the use of a pilot test. A pilot test is a "dress rehearsal" in which the researcher administers the survey to a representative group from the sample, called the pilot group. By administering the survey to the pilot group at two different points in time, stability or test-retest reliability can be established for the survey.

Reverse items embedded within the survey help to establish internal consistency. Reverse items are pairs of survey items worded in opposite directions; for participants' responses to be consistent, their answers to the two items must be opposite. Face validity for a survey can also be established by having the pilot group examine the survey for an appropriate title, clear directions, and overall language and reading level. Content validity is established by having pilot group members examine the survey for both sampling and item validity. Sampling validity would be established by having participants examine all the items on the survey to ensure that all the issues that make up the topic are touched on. Item validity is established by making sure that similar items are grouped together. In some cases, a panel of experts other than members of the pilot group needs to be used.
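As a concrete illustration of the reverse-item idea, the following Python sketch reverse-scores the negatively worded member of a hypothetical item pair on a 6-point scale and flags pilot-group respondents whose two answers disagree. The items, responses, and flagging threshold are invented for the example.

```python
# Hypothetical sketch of the reverse-item check, assuming a 6-point Likert
# scale (1 = strongly disagree ... 6 = strongly agree) and invented responses.

# Item A: "I enjoy group work."          (positively worded)
# Item B: "I dislike working in groups." (reverse-worded version of item A)
item_a = [6, 5, 2, 4, 6, 1, 5]
item_b = [1, 2, 5, 3, 1, 6, 6]

# Reverse-score item B so that, for a consistent respondent, it should match item A.
item_b_reversed = [7 - score for score in item_b]

# Flag respondents whose two answers differ by more than one scale point
# (threshold chosen arbitrarily for the example).
for i, (a, b) in enumerate(zip(item_a, item_b_reversed)):
    if abs(a - b) > 1:
        print(f"Respondent {i}: possible inconsistency (item A={a}, reversed item B={b})")
```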

Observational checklists are another type of self-developed instrument used to collect quantitative data from observations of participants in natural settings.

Checklists function as a list of the criteria or behaviors that the researcher is looking for. Checklists add to the reliability and validity of the data-collection process by helping to standardize the observations. Training of multiple observers is also required when using checklists; it increases consistency, whether interrater reliability between two raters or intrarater reliability when one judge rates the same observation (e.g., a teacher's videotaped lesson) or the same materials two or more times. Researchers should be aware of three main threats to validity when using an observational checklist. One is observer bias, which occurs when the observer's background or perceptions influence the observation. Contamination is another threat and occurs when the observer's knowledge of the study affects his or her observations. The last is called the halo effect, which occurs when the participants being observed act in a certain way (usually better than usual), thus distorting the observational data being collected.



Observation is one method used to collect qualitative data. There are several approaches to collecting data through observation: complete participant, participant as observer, observer as participant, and complete observer. Qualitative researchers are careful to note the characteristics of both the physical setting in which the observation takes place and the individuals being observed. Interviews are another key method for data collection in qualitative research. Interviews can be conducted with participants individually or in small groups. Group interviews, also referred to as focus groups, give the researcher an opportunity to collect data from multiple participants and to record the interactions and group dynamics as they unfold. A good interviewer should start out by reintroducing himself or herself, restating the purpose of the study, and reminding the participant about the confidentiality of responses. Next, general demographic information should be obtained from the participant, and the interviewer should use protocol questions as a starting point and probes to gather more in-depth responses where needed. Documents and other artifacts can also be used as sources of qualitative data.

Action researchers also develop their own instruments. These tools, and the procedures for administering them, are usually less rigorous than most qualitative and quantitative methods used in applied research. Teachers hoping to measure a change in students' skills and competencies frequently use both checklists and rating scales. Checklists might also be designed for students to measure the development of their own skills. Grading rubrics are a common type of framework used by teachers and students to document and measure various aspects of learning. Reliability and validity can also be established for rubrics. Journals, maps, photographs, and artifacts are other forms of data collection that can be created by students and used to measure outcomes and growth.

Interview protocols are a series of open-ended questions that serve as a framework when conducting an interview. Because most qualitative interviews are unstructured, a protocol may serve as a starting point for the interview. When completed, the actual interview should read more like a conversation than an interview with set questions and responses.

Key Concepts

self-developed instruments
pilot test
