One person or several people can write items for a scale. If more than one person is involved in item writing, all the item writers should understand the variable and agree on its meaning. Ideally, all the item writers have involved themselves in the concept analysis and the generation of the theoretical definition and attributes for the concept and the statement of the variable. Following the creation of the definition, the group writes several sample items. They review the sample items for consistency with the conceptual definition. After the members of the group agree on the final forms of these sample items, they assign each member to write items according to the matrix. After the items are complete, the group convenes to discuss and modify them until members agree that the items adequately measure the variable, and they express satisfaction with the phrasing of the items. During the testing of the scale, researchers identify and delete weak items, leaving only those that have functioned well during the testing process.
Write Rules for Scale Scoring and Administration
The final task in scale development is to write rules for scoring items and administering the scale to respondents. Rules for scoring scale items correspond to the type of scale used. In Chapter Two, we presented a description of five types of scales: VAS, Thurstone, Guttman, semantic differential, and Likert. Because the Likert scale (also called a summated rating scale) is the most common type of scale used in health behavior research, we will describe the scoring rules for the Likert scale in the section that follows. Though scoring rules depend on the type of scale, rules for administration depend on the preferences of the scale developer and, to some extent, correspond to the medium used to present the scale and the data collection procedures. For example, a seven-point scale might be used for scale items that participants complete using paper and pencil. In contrast, the developer might consider using three to five response options for scale items presented to participants over the telephone. Scale developers may have preferences about the order of the items, the type of response options, the use of adjectives, adverbs, or numbers on the response scale, and the format for presentation. These preferences are included in the description of the scale.
than a statement of fact and suggested using items with the word should as well as items that are “clear, concise, and straightforward” (Likert, 1932, p. 45). Likert’s original scale consisted of statements and the following five response options for each statement: strongly approve, approve, undecided, disapprove, and strongly disapprove.
Today, response options are designed to represent graduated increments of agreement, frequency, or evaluation (Spector, 1992). Using a scale that assesses agreement, respondents indicate how much they agree with a particular statement; for frequency, respondents indicate how often they perform a particular behavior; and for the evaluative scale, respondents rate items according to how positively or negatively they feel about what is described. The choices for response options include agree/disagree; never/always; never/very often; never true/always true; and not at all sure/very sure. The researcher should select the type of scale (agreement, frequency, evaluative) that corresponds to the statement of the variable. Ideally, the wording of the response options is selected before writing the items. However, because of the dynamic nature of the item-writing process, a researcher might want to make the final decision on the response options after a draft of some or most of the items has been written. If researchers delay the decision, they must review the items to make sure respondents can answer them on the selected response scale (Exhibit 7.1).
EXHIBIT 7.1. EXAMPLES OF RESPONSE OPTIONS.

Agreement
Strongly disagree / Disagree / Neither agree nor disagree / Agree / Strongly agree
Not at all true / Somewhat untrue / Somewhat true / Very true

Evaluation
Not at all important / Unimportant / Somewhat unimportant / Somewhat important / Important / Very important
Most unpleasant / Unpleasant / Neutral / Pleasant / Most pleasant
Not at all sure / Somewhat sure / Very sure

Frequency
Never / Almost never / Sometimes / Most of the time / Always
Not at all / Occasionally / Frequently / Regularly
If words are being used as response options, the researcher needs to decide on the terminology for rating each item. As noted above, the researcher has several choices, including agree/disagree, never/always, never true/always true, and not at all sure/very sure. Regardless of the response options, it is important to read each item to make certain the items are consistent with the response choices. In the first example below, the question cannot reasonably be answered with the response options given. The item makes more sense using the second set of response options.
How often do you wear a helmet when you ride a motorcycle?
Strongly agree / Agree / Neither / Disagree / Strongly disagree

How often do you wear a helmet when you ride a motorcycle?
Every time / Most of the time / Sometimes / Hardly ever / Never
Number of Response Options
Likert (1932) originally proposed five response choices for attitudinal items, ranging from strongly disapprove to strongly approve. However, today the number of options can vary from three to ten. Although theoretically we can provide more than ten choices, most people find it difficult to discriminate among more than seven options (Streiner & Norman, 1995). Occasionally a researcher uses a scale labeled from 0 to 100. In this case, the scale is usually calibrated in units of 10, so that in reality, no more than eleven choices are available. For example:
Rate your level of confidence with the following statement:
How sure are you that you can always exercise three times per week for thirty minutes each time?
0 10 20 30 40 50 60 70 80 90 100
Not at all sure Very sure
We could also show the scale in this way:
0 1 2 3 4 5 6 7 8 9 10
Not at all sure Very sure
The number of response options depends on several considerations, including the age of the respondent, the experience of the respondent with this type of rating scale, the ability of the respondent to discriminate between options, the preference of the researcher, and the content of the scale. Respondents who are very young, are very old, or have low literacy have difficulty discriminating among more than two or three
options. Thus, limiting response options to two choices, such as yes/no or agree/disagree, makes it easier for members of these populations to answer. The same is true for individuals who have no experience in answering items using Likert-type scales. The novelty of the scale is likely to interfere with their responding accurately to the items.
As noted above, the ability to discriminate between options is an important consideration. Items that include only yes and no as possible responses fail to allow room for discrimination. Thus, individuals who might agree that motorcycle helmets should be worn under certain conditions (such as while riding on the highway) might have difficulty responding to the following item:
People who ride motorcycles should wear helmets.
Expanding the number of response options to five, with anchors of strongly agree and strongly disagree, allows respondents more choices. Expanding the response options further, to seven, may offset the tendency of some participants to avoid selecting the extreme (or end) options (Streiner & Norman, 1995). In addition, research has shown that scales using few response options have lower internal-consistency reliability than those with more. The difference is due to the lack of precision and the loss of information that occur when respondents are given an insufficient number of response choices (Spector, 1992). Based on research studies and experience, the most common recommendation is that scales include between five and seven response choices.
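The internal-consistency reliability mentioned above is most often quantified with Cronbach's alpha. As a rough illustration of what that coefficient computes, here is a minimal Python sketch; the function name and the sample data are our own and are not drawn from any published scale.

```python
# Minimal sketch of Cronbach's alpha, a common index of internal-consistency
# reliability. Data and function are illustrative only.
def cronbach_alpha(items):
    """items: one list of responses per item, all lists of equal length."""
    k = len(items)                        # number of items
    n = len(items[0])                     # number of respondents

    def variance(xs):                     # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    sum_item_vars = sum(variance(item) for item in items)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))

# Two perfectly correlated items yield an alpha of 1.0.
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))  # → 1.0
```

In practice, researchers compute alpha with statistical software rather than by hand, but the formula makes visible why adding well-functioning items, or response options that capture more information, tends to raise the coefficient.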
Odd or Even Number of Categories
Another issue raised repeatedly among researchers is whether to use an odd or even number of response options. For agreement and evaluative rating scales, the middle category suggests a neutral point or a don’t know response. For example:
People who ride motorcycles should wear helmets.
Strongly agree / Agree / Neither agree nor disagree / Disagree / Strongly disagree

A person who does not have strong opinions might choose the middle category, as would a person who is unfamiliar with the value of motorcycle helmets. Researchers differ in their opinions about how to analyze and interpret this middle category. Some suggest deleting all respondents who choose the middle category or including them in the analyses as a separate group. Others consider the middle category to be the midpoint of a continuum between agree and disagree.
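The analysis strategies for the middle category can be sketched in a few lines of Python; the response values below are hypothetical ratings on a five-point agreement scale.

```python
# Three ways of handling the middle ("neither agree nor disagree") category.
# Hypothetical responses on a five-point scale
# (1 = strongly disagree ... 5 = strongly agree).
responses = [1, 3, 5, 2, 3, 4, 3, 5, 1, 4]
MIDPOINT = 3

# Option 1: delete respondents who chose the middle category.
without_neutral = [r for r in responses if r != MIDPOINT]

# Option 2: analyze middle-category respondents as a separate group.
neutral_group = [r for r in responses if r == MIDPOINT]

# Option 3: treat the middle category as the midpoint of the continuum
# and keep everyone in a single analysis.
mean_all = sum(responses) / len(responses)

print(len(without_neutral), len(neutral_group), mean_all)  # → 7 3 3.1
```

Whichever option is chosen, the decision should be made before data collection so that the scoring rules can state it explicitly.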
The use of an even number of categories, as shown below, forces respondents to choose an agree or disagree option and avoids the need to make decisions about how data will be treated for analysis and interpretation.
People who ride motorcycles should wear helmets.
Strongly agree / Agree / Disagree / Strongly disagree
Meanings of Response Option Adjectives or Adverbs
Some research examines the ways in which people interpret adjectives or adverbs that label options on the response scale (Ostrom & Gannon, 1996; Schaeffer, 1991). A frequency scale, for example, might include the following adverbs: never, rarely, sometimes, often, most of the time, and always. Research has shown that people interpret these adverbs differently. For some people, the terms sometimes and often are closer in their interpretations than the terms often and most of the time. The researcher might make the assumption that these adverbs represent equal intervals, as shown here:
Never        Rarely        Sometimes        Often        Most of the time
Other respondents might interpret the adverbs as representing unequal intervals, in this way:
Never     Rarely          Sometimes  Often                    Most of the time
In a related issue, respondents may disagree on the interpretation of the words themselves due to the vagueness of the terms. So, when responding to an item, people will respond with their own interpretations—what the item means to them—at the time. Obviously, the researchers will not be aware of these interpretations when they analyze the data and will make assumptions about how people interpreted the items. In Chapter Eleven, we will discuss cognitive assessment as a method to evaluate items to determine how people are interpreting them. Information from this type of assessment can help the researcher improve the statement of items before completing the final scale.
Use of Adjectives, Adverbs, or Numbers, or Combinations Thereof
In developing a scale, a researcher must decide whether to use words or numbers or both as response options. We know that adjective and adverb descriptors are generally vague and that respondents interpret the words differently. Using numbers may help people decide on the degree to which they endorse a particular item. In U.S. society, we commonly use the 1-to-10 verbal scale. (“On a 1-to-10 scale, how would you rate . . . ?”) In this instance, adding numbers or using numbers with adjectives or adverbs might help people with the task of determining their levels of agreement or their opinions. Schwarz and colleagues (1991), however, found that the values of the numbers associated with adjectives can make a difference in the ways people respond. In their study, participants were asked to complete the same eleven-point scale with the extremes labeled not at all successful and extremely successful. Half the participants received scales whose response options were labeled with numbers from -5 to +5; for the other half, the response options were labeled from 0 to 10. Only 13 percent of participants responding to items using the -5 to +5 scale selected responses at the lower end of the scale (-5 to 0), whereas 34 percent of participants in the other group selected responses at the lower end (0 to 5) of the 0-to-10 scale. This study shows that people do use numbers in addition to interpretations of adjectives to make decisions about their responses (Streiner & Norman, 1994).
Positively and Negatively Worded Items
To encourage attention to the task and reduce the tendency for response sets such as yea-saying, Likert (1932) suggested that about half the items be presented in a positive way and the other half in a negative way. The typical Likert scale, such as the Rosenberg Self-Esteem Scale, follows this convention (see Table 2.1 in Chapter Two). And though it is a good rule to follow, the negative statements themselves can be a source of respondent measurement error if they include words such as not or never or prefixes such as un-. In Chapter Four, we indicated that one of the rules of item writing was to use negative words sparingly and, when one is used, to highlight it in the item statement. The reason is that respondents who are using a satisficing approach to complete a scale can misread the negative words and answer incorrectly. If a negatively worded item is used, the item should be phrased in such a way that words such as not are avoided. For example, the item, “I did not take my medications yesterday,” could be rewritten as, “I missed my medications yesterday.”
Another issue with negatively worded items is that some concepts and response scales do not lend themselves very well to negative statements. Bandura (1997) recommends that self-efficacy items be positive statements because of the awkward wording needed to meet the requirement of negatively worded items. Consider the following item: The participant is asked to rate his or her level of confidence, on a scale from 0 (not confident at all) to 10 (very confident that I can do), in the statement, “I can always walk outside, even when it is raining.” The same statement, worded negatively, might be, “I do not walk outside when it is raining.” Despite the negative wording, the item cannot be answered using the confidence rating scale because the interpretation of the item would change.
Scoring
To create a total score for a summated rating scale, responses to individual items are summed. Some researchers use the total score in analyses; others elect to compute a mean score. The latter method converts the total score into the units used to rate the
individual items (that is, 1 to 4 or 1 to 5) and is sometimes helpful when the researcher compares scores from several scales. When summing responses to individual items, the researcher must reverse-score the negatively worded items so that the total score reflects a greater amount or degree of the concept. (See Chapters Two and Eight.) To demonstrate the scoring for a Likert scale, the OE (outcome expectations) scale for healthy eating is presented in Table 7.4. There are twelve items, each rated on a four-point rating scale from strongly disagree to strongly agree. Total scores are found by summing responses to individual items.
The possible total scores for this scale range from 12 to 48. A high score reflects more positive outcomes associated with eating a healthy diet.
TABLE 7.4. ITEMS ON THE OUTCOME EXPECTATIONS FOR HEALTHY EATING SCALE.
Below is a list of statements about things that might happen if you were to eat a healthy diet. A healthy diet is one that is low in fat and high in fruits and vegetables.
Circle the number from 1 to 4 that most closely fits with how much you agree with each statement. If you strongly disagree, circle 1. If you disagree with the statement, circle 2. If you agree, circle 3. If you strongly agree, circle 4.
Statement (1 = Strongly disagree, 2 = Disagree, 3 = Agree, 4 = Strongly agree)

1. If you eat a healthy diet, you will have more energy. 1 2 3 4
2. If you eat a healthy diet, you will feel better about yourself. 1 2 3 4
3. If you eat a healthy diet, you will live longer. 1 2 3 4
4. If you eat a healthy diet, your best friends will approve. 1 2 3 4
5. If you eat a healthy diet, you will feel more responsible. 1 2 3 4
6. If you eat a healthy diet, you will control your weight. 1 2 3 4
7. If you eat a healthy diet, you will feel proud of yourself. 1 2 3 4
8. If you eat a healthy diet, you will be less likely to have a stroke. 1 2 3 4
9. If you eat a healthy diet, your friends will be proud of you. 1 2 3 4
10. If you eat a healthy diet, you will feel you are doing the right thing. 1 2 3 4
11. If you eat a healthy diet, your doctor will approve. 1 2 3 4
12. If you eat a healthy diet, your close relatives will approve. 1 2 3 4
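The scoring rules described above can be sketched in a few lines of Python. The responses below are hypothetical, and because all twelve items in Table 7.4 are positively worded, no reverse-scoring is needed for this particular scale; the reverse-scoring function is included only to illustrate the general rule.

```python
# Sketch of summated-rating (Likert) scoring for a twelve-item scale
# rated 1 (strongly disagree) to 4 (strongly agree).
responses = [3, 4, 2, 3, 3, 4, 4, 2, 3, 3, 4, 3]   # one hypothetical respondent

total_score = sum(responses)                # possible range: 12 to 48
mean_score = total_score / len(responses)   # expressed in item units (1 to 4)

# If a scale included negatively worded items, each would be reverse-scored
# before summing: on a response scale running from low to high,
# reversed = (low + high) - response.
def reverse_score(response, low=1, high=4):
    return (low + high) - response

print(total_score)       # → 38
print(reverse_score(1))  # → 4
```

The mean-score form is convenient when comparing scales with different numbers of items, since every scale is then expressed on the same 1-to-4 metric.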
Summary
This chapter has been devoted to a discussion of the steps necessary to write items for scales. The process begins with a statement of the theoretical definition, which is used to guide writing items that are consistent with both the content domains and the dimensions denoted by the concept. A content-domain by concept-dimension matrix helps the scale developer visualize the task and ensures that an adequate number of items are written to measure both the content and any dimensions of the concept.
Upon completion of the item-writing task, the scale developer stipulates the rules for scoring and for administration of the scale in research or evaluation studies.
LEARNING OBJECTIVES
At the end of this chapter, the student should be able to

1. Describe the characteristics of a correlation.
2. Use SPSS to run a correlation between two variables.
3. Interpret a correlation between two variables.
4. Describe a correlation matrix.
5. Discuss issues of causality, sample size, group differences, and restriction of range related to correlations.
6. Describe the characteristics of variance.
7. Calculate variance.
8. Interpret variance.
9. List the uses of ANOVA.
10. Use SPSS to conduct an ANOVA.
11. Interpret the results of an ANOVA.