
Developing Scales for Use in Surveys


Descriptive-survey research has some of the most well-developed procedures for creating instruments of any research area. Much of the process of developing and refining surveys is discussed in Chapter Seven. Here we discuss the process of creating scales for the survey items.

Likert Scales

Developed by Rensis Likert for his doctoral thesis, the Likert scale is the most widely used scale in survey research and certainly the one that has found its way into popular culture. Many people actually use the term “Likert scale” without knowing the scale’s complexities and uses. The classic use of the Likert scale was to pose questions or items to participants and have them respond using an agreement scale by selecting a number that best represented their response.

Presented in Exhibit 5.1 are some items taken from a survey of preservice teacher candidates. The purpose of this section of the survey was to document candidates’ expectations of their field placement experience and the role of technology.

Notice that the Likert scale used in Exhibit 5.1 is what is referred to as a 6-point scale, meaning that there are six possible choices for participants. Depending on the purpose of the study, the researcher may decide to use a 6- or a 5-point scale. The range for a 5-point scale would be strongly disagree, disagree, slightly agree, agree, and strongly agree. Sometimes a researcher chooses to have a neutral response in the middle. In that case, the scale might look something like the one presented below.

Strongly Disagree (1)    Disagree (2)    Neutral (3)    Agree (4)    Strongly Agree (5)


EXHIBIT 5.1 SAMPLE SECTION OF SELF-DEVELOPED SURVEY.

Host Teacher (rated from SD to SA)

1. My host teacher will help me find the technology that is available in my school.  1 2 3 4 5 6
2. I expect to spend some of my planning time working with my host teacher to integrate technology into the classroom.  1 2 3 4 5 6
3. I believe that I will learn a lot about integrating technology into instruction from my host teacher.  1 2 3 4 5 6
4. I expect that I will be able to provide expertise to my host teacher in integrating technology.  1 2 3 4 5 6
5. I expect that my host teacher will provide feedback to me regarding my use of technology in my lessons.  1 2 3 4 5 6

Note: 1 = strongly disagree (SD); 2 = disagree; 3 = slightly disagree; 4 = slightly agree; 5 = agree; 6 = strongly agree (SA).
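For researchers who tabulate pilot responses with software, the sketch below shows one way the 6-point codes from a section like Exhibit 5.1 might be entered and summarized. It is only an illustration: the response values are hypothetical, and the simple item means shown are just one of many ways to summarize Likert data.

```python
# Illustrative only: summarizing hypothetical 6-point Likert responses
# (1 = strongly disagree ... 6 = strongly agree) for five items like those
# in Exhibit 5.1. Each inner list is one pilot participant's answers.

responses = [
    [5, 4, 6, 3, 5],
    [4, 4, 5, 2, 6],
    [6, 5, 5, 4, 5],
    [3, 2, 4, 3, 4],
]

num_items = len(responses[0])

# Mean rating per item gives a quick picture of where expectations cluster.
for item in range(num_items):
    ratings = [respondent[item] for respondent in responses]
    mean = sum(ratings) / len(ratings)
    print(f"Item {item + 1}: mean = {mean:.2f} (n = {len(ratings)})")
```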

Caution should be taken when including a neutral response in your scale. When the Likert scale was first developed, it did not include a neutral response because Likert did not believe that there were truly “neutral” people walking around: even if you are not passionate about an issue, you will at least feel a little something one way or the other. In certain circumstances, using a neutral response is perfectly acceptable and appropriate; however, in cases where a decision may be made based on the data, it is advised not to include the neutral response. For example, if one is constructing a survey to gather teachers’ attitudes about an issue in a school building, depending on the situation, many teachers may indicate “neutral” for many reasons (they may fear that their job is on the line, they may not know what is in store for them if the school goes ahead with the project, etc.). However, the administrator has to make a decision based on the information that the survey provides. If results come back with 100% of the staff neutral regarding the project, that is not going to help the administrator. If the neutral response is removed and the staff is “forced” to make a decision one way or another, data showing that 70% of staff are in “slight agreement” with the project helps shed some light on the situation and assists the administrator with the task of making a decision.

Likert scales are often called agreement scales because participants are asked whether they agree or disagree with the statements presented. Self-developed measures may include other types of scales in which statements are presented and participants are asked to make different types of judgments, such as the frequency of use, quality, importance, or perceived effectiveness of educational practices or persons in educational settings.

Semantic Differential Scales

Another type of rating scale is the semantic differential scale. Unlike the Likert scale, which uses sentence statements, in a semantic differential scale, participants are asked to make judgments regarding words or phrases describing persons, events, activities, or materials. The ratings are made by checking a point along a line indicating a continuum between two polar opposites. The scale is usually set up so that positive and negative responses occur at each end of the scale with equal frequency. See Exhibit 5.2 for an example of a semantic differential scale that might be used as a quick evaluation of a workshop. One advantage to using semantic differential scales is that responses can be made quickly because little reading is required. However, the information produced is superficial.
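Because the favorable pole sits on the left for some word pairs and on the right for others, semantic differential responses are usually recoded before scoring so that a higher number always marks the more favorable judgment. The sketch below illustrates that recoding step; the item names loosely follow Exhibit 5.2, and the single respondent’s ratings are hypothetical.

```python
# Illustrative only: reverse-coding semantic differential items so that a
# higher score always means a more favorable judgment. Ratings run 1-6 from
# the left pole to the right pole; all values below are hypothetical.

ratings = {
    "knowledgeable_vs_not": 2,        # left pole is positive, so 2 is favorable
    "unresponsive_vs_responsive": 5,  # left pole is negative, so 5 is favorable
    "helpful_vs_not": 1,
    "useful_vs_not": 2,
    "poorly_vs_well_organized": 6,
    "appropriate_vs_not": 1,
}

# Items whose *left* pole is the positive adjective are reverse-coded (7 - x)
# so that 6 always marks the favorable end of the continuum.
positive_left = {"knowledgeable_vs_not", "helpful_vs_not",
                 "useful_vs_not", "appropriate_vs_not"}

scored = {item: (7 - value if item in positive_left else value)
          for item, value in ratings.items()}

overall = sum(scored.values()) / len(scored)
print(scored)
print(f"Mean favorability: {overall:.2f} (1 = least favorable, 6 = most favorable)")
```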

Establishing Reliability and Validity for Surveys

After a survey has been developed, it should undergo a pilot test. The pilot group would need to be selected and pilot participants instructed as to their responsibilities. It is through the use of a pilot test that the researcher is able to establish reliability and validity for the self-developed survey.

Stability or Test-Retest Reliability. As discussed in Chapter Four, stability or test-retest reliability consists simply of giving the same measure to the same group of individuals at two different points in time. In other words, a high school science teacher who responds “strongly agree” to an item asking about the use of student portfolios for assessment purposes would hopefully still respond “strongly agree” a few weeks later when filling out the same survey. As part of the piloting process, the researcher would analyze pilot participants’ before and after responses to determine the degree of consistency and report such findings using a correlation coefficient.
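As a rough illustration of that last step, the sketch below computes a correlation coefficient (here, a Pearson correlation) between two administrations of a single item for a small pilot group. The response values are hypothetical, and a real analysis would typically rely on a statistics package rather than hand-coded formulas.

```python
# Illustrative only: test-retest reliability for one survey item as a Pearson
# correlation between two administrations a few weeks apart. Data are
# hypothetical; each position is the same pilot participant on both lists.

import math

time_1 = [6, 5, 4, 6, 3, 5, 2, 4]  # first administration (1-6 Likert codes)
time_2 = [6, 4, 4, 5, 3, 5, 2, 5]  # same participants, second administration

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

print(f"Test-retest correlation: {pearson(time_1, time_2):.2f}")
```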

From time to time, a researcher may find that participants lack consistency on a particular item or series of items. For example, a group of participants might answer “strongly agree” on the first survey administration but answer “strongly disagree” the second time a few weeks later. In cases such as this, the researcher will want to do some further investigating before coming to the conclusion that the item is unstable. To do so, the researcher would contact the individuals and conduct either focus group or one-on-one interviews. For example, let us say that there was inconsistency among a group of teachers regarding the item that asks about technology use in the classroom (item 3 in Exhibit 5.1). On the first round, the group of teachers indicated that they “strongly agree” that they all had positive experiences using technology in their classrooms. However, on the second delivery of the survey, this group indicated that they “strongly disagree.” This would certainly constitute an inconsistency for this particular item. However, let us say that in interviewing these teachers, it is discovered that in the time between the two survey administrations, these teachers had a bad experience using technology in their classroom. For example, a computer crashed, and the teachers lost all their work, or a lesson with technology in it did not go as planned. In light of this new information, is this item still considered unstable? The answer is no. The item was perfectly stable for those in the pilot who did not have a bad experience with technology during this period, and it also consistently measured the changes in attitudes for those teachers who unfortunately did have a bad experience during that time period. Items where the researcher is unable to establish a reason for the dramatic shift in participant responses should be refined or deleted from the final survey. Most descriptive-survey researchers who are sending a survey to a sample on a “one-shot” basis typically do not go through the effort to establish stability for the instrument. However, if the overall purpose of the study is to show a change in participants’ attitudes or beliefs, it is necessary to document that the survey has test-retest reliability. By doing so, the researcher is assured that participant responses are indeed changing and that such change in attitude is not merely the result of an inconsistent instrument.

EXHIBIT 5.2 SEMANTIC DIFFERENTIAL SCALE.

Workshop Presenters
Knowledgeable     1  2  3  4  5  6   Not knowledgeable
Unresponsive      1  2  3  4  5  6   Responsive
Helpful           1  2  3  4  5  6   Not helpful

Workshop Materials
Useful            1  2  3  4  5  6   Not useful
Poorly organized  1  2  3  4  5  6   Well organized
Appropriate       1  2  3  4  5  6   Inappropriate

Internal Consistency for Surveys. In Chapter Four, internal consistency was discussed in relation to split-half reliability. Internal consistency can also be established for self-developed surveys, with a slightly different twist. A reverse item is a technique used by researchers to establish internal consistency by creating two items that are opposite in meaning. To be consistent, respondents must select opposite responses when answering the items. For example, if a teacher responds “strongly agree” to the item “Technology provides multiple methods in delivering instruction to all student learners,” then the teacher must select “strongly disagree” on the item that appears later in the survey, “Technology does not allow for multiple methods in delivering instruction to all student learners.”

Although reverse items are a useful technique, too many of them in a survey may aggravate participants and result in a low response rate. The purpose of reverse items is not to trick participants into making a mistake but to ensure that participants are consistent in their own thinking and that such consistency is reflected in their responses. So what happens to participants whose reverse items are not consistent? Depending on the sample size and the overall purpose of the study, a researcher may decide to “junk” or remove a participant’s survey from the study altogether, especially where multiple reverse items have not been scored consistently.
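One way such a consistency check might be carried out is sketched below, under the assumption of a 6-point scale with no neutral point: a perfectly consistent respondent’s answers to an item and its reverse-worded partner sum to 7 (for example, 6 paired with 1), so pairs that drift far from that sum can be flagged for follow-up or removal. The item names, responses, and tolerance value are all hypothetical.

```python
# Illustrative only: checking reverse-item pairs for respondent-level
# consistency on a 6-point scale (1 = strongly disagree ... 6 = strongly
# agree). A perfectly consistent pair sums to 7 (e.g., 6 with 1, 5 with 2).

reverse_pairs = [("tech_multiple_methods", "tech_not_multiple_methods")]

respondents = {
    "p01": {"tech_multiple_methods": 6, "tech_not_multiple_methods": 1},  # consistent
    "p02": {"tech_multiple_methods": 5, "tech_not_multiple_methods": 3},  # within tolerance
    "p03": {"tech_multiple_methods": 6, "tech_not_multiple_methods": 6},  # inconsistent
}

TOLERANCE = 1  # how far a pair may drift from the ideal sum of 7 before being flagged

for pid, answers in respondents.items():
    flagged = [
        (item, rev)
        for item, rev in reverse_pairs
        if abs(answers[item] + answers[rev] - 7) > TOLERANCE
    ]
    if flagged:
        print(f"{pid}: inconsistent reverse-item pair(s) {flagged} -- consider removal")
    else:
        print(f"{pid}: reverse items answered consistently")
```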

Developing Surveys from Established Forms. As a result of conducting a review of the literature, a survey researcher might come across a survey that has already been developed and used in a previous study. If the researcher uses the survey without making modifications, the reliability and validity information that has been established can be reported in the new study’s instrument section. However, because survey research is interested in current issues, the chances of finding an already-established survey that addresses the exact same issue for the exact same population are slim. Faced with this dilemma, many survey researchers find themselves borrowing both items and scales from already-developed surveys. (A scale is a set of items that are thought to measure some trait, attitude, or behavior. Educational measures often can be broken down into separate scales, each containing items that address different aspects of the construct being measured.) When modifications are made to an already-established instrument, the researcher cannot report its reliability or validity information and must reestablish reliability and validity for the survey. Although it is not uncommon for a researcher to borrow from several established instruments, it is important for the researcher to give credit where credit is due and to cite all authors whose work was adopted in creating the new survey.

Face Validity for Surveys. In addition to establishing reliability, validity also needs to be established during the piloting of a survey. Validity, the ability of an instrument to measure what it intends to measure, applies in much the same way for self-developed instruments as it does for preestablished instruments. However, as in establishing reliability, researchers using self-developed instruments will have to conduct the necessary steps to ensure that their instruments have validity. Presented below are several types of validity that were discussed earlier and how they apply to survey research. These procedures would take place during the piloting process and would occur simultaneously alongside the work to establish sound reliability for the instrument.

Although face validity is often considered a low level of validity and is not emphasized in preestablished measures, it does play an important role in survey design and development. The definition of face validity is that the instrument “appears to be measuring” what it intends to measure. This means that, on the surface, the questions seem to fit the described purpose of the survey.

Because the researcher is likely to tell participants the exact purpose of the study in descriptive-survey studies, high face validity promotes trust and, hopefully, a better response rate as well. Face validity can be established by having pilot participants ask the following questions as they review the survey:

• Is the title of the survey aligned with the purpose of the study? For example, if you are gathering high school science teachers’ perceptions about curriculum, does the title of the survey reflect that?

• Are the directions clear, and do they accurately convey the intent with which the participants should be answering the questions? For example, if the researcher wants the participants to “reflect back on an experience,” is this wording specifically stated in the directions?

• Do the overall language and reading level of the survey reflect the ability of the group to which the survey will be given?

During the pilot, the researcher would want to make sure that any suggestions made by participants would be incorporated into the final survey to increase its face validity.

Content Validity. The purpose of establishing content validity for a survey is to ensure that the survey is measuring the breadth and depth of the issue that it is intended to measure. As discussed earlier, content validity consists of two parts: sampling validity and item validity. During the piloting process, both of these can be addressed simultaneously. Sampling validity can be established by having participants examine the survey to ensure that there are no additional sections needed to sufficiently address the issue that is under investigation. Because the pilot participants represent the sample that is being surveyed, they are much more aware of the issues of that group than perhaps the researcher, who after all is an outsider. In such cases, the pilot participants might suggest that another section be added to the survey so that the survey covers the issue more completely. Sometimes they may suggest only that additional items be added to the survey.

Where sampling validity examines the breadth of items being asked, item validity focuses on the depth of the items themselves. To ensure item validity, the researcher would also want to instruct pilot participants to review each item carefully and to determine whether the items that make up the sections of the survey in fact belong in those sections. For example, say a researcher is surveying parents about an after-school program and develops a survey with one of the sections being a checklist of various activities that students might engage in during the program. Among items such as “chess club” and “indoor soccer,” the researcher has accidentally placed a checklist item of “school uniforms.” Because school uniforms are obviously not a type of activity, this item would lower the item validity of the survey. In this situation, the item should be moved to a more appropriate section of the survey. If no such section exists, the item should be deleted.

In some cases, it may be more appropriate to use a panel of experts to establish content validity because members of the pilot may not have the expertise to critically examine the items. For example, if a researcher is doing a survey of middle-level math teachers regarding high-stakes testing and state math standards, it might be more beneficial to have a panel of math experts examine the survey, in addition to the pilot group.
