A reliable indicator consistently assigns the same number to some phenomenon that has not, in fact, changed. For example, if a person measures the effectiveness of the police force in a neighborhood twice over a short period of time (short enough so that change is very unlikely) and arrives at the same value, then the indicator is termed reliable. Or, if the rate of volunteering at a volunteer center remains constant from one day to the next, it is probably a reliable indicator. If two different people use an indicator and arrive at the same value, then, again, we say that the indicator is reliable. Another way of defining a reliable indicator is to state that an indicator is a reliable measure if the values obtained by using the indicator are not affected by who is doing the measuring, by where the measuring is being done, or by any factors other than variation in the concept being measured.

Increasing Reliability

The two major threats to measurement reliability are subjectivity and lack of precision. A subjective measure relies on the judgment of the measurer or of a respondent in a survey. A general measure that requires the analyst to assess the quality of a neighborhood or the performance of a nonprofit board of directors is a subjective measure. Subjective measures have some inherent unreliability because the final measures must incorporate judgment. Reliability can be improved by rigorous training of individuals who will do the measuring. The goal of this training is to develop consistency. Another method of increasing reliability is to have several persons assign a value and then select the consensus value as the measure of the phenomenon in question. Some studies report a measured interrater reliability based on the consistency of measurement performed by several raters. Often, judgments about the effectiveness of nonprofit boards of directors are based on the ratings provided by multiple knowledgeable actors: for example, the board chairperson, the chief executive officer of the nonprofit, and nonprofit stakeholders such as funders, donors, and other similar nonprofits in the community.

Reliability can also be improved by eliminating the subjectivity of the analyst. Rather than providing a general assessment of the “quality” of the neighborhood, the analyst might be asked to answer a series of specific questions. Was there trash in the streets? Did houses have peeling paint? Were dogs running loose? Did the street have potholes? How many potholes? Or, consider the “performance” of the local volunteer center. How many volunteers does it attract? What work do the volunteers perform? What are the results of their efforts for the community?

Reliability problems often arise in survey research. For example, suppose that you were asked to respond to survey questions concerning the performance of one of your instructors (or a local political figure, “bureaucrats,” or the volunteers assisting in your agency) on a day that had been especially frustrating for you. You might well evaluate these subjects more harshly than on a day when all had seemed right with the world. Although nothing about these subjects had changed, extraneous factors could introduce volatility into the ratings, an indication of unreliability. If your views of these subjects actually did change and the survey instrument picked up the (true) changes, the measurement would be considered reliable. (For that reason, reliability is often assessed over a short time interval.) By contrast, a reliable measure, such as agency salaries or the number of employees and volunteers, is not affected by such extraneous factors.

Unfortunately, although removing the subjective element from a measure will increase reliability, it may decrease validity. Certain concepts important to public and nonprofit managers (employee effectiveness, citizen satisfaction with services, the impact of a recreation program) are not amenable to a series of objective indicators alone. In such situations a combination of objective and subjective indicators may well be the preferred approach to measurement.

Lack of precision is the second major threat to reliability. To illustrate this problem, let us say that Barbara Kennedy, city manager of Barren, Montana, wants to identify the areas of Barren with high unemployment so that she can use the city’s federal job funds in those areas. Kennedy takes an employment survey and measures the unemployment rate in the city. Because her sample is fairly small, neighborhood unemployment rates have a potential error of ±5 percent.

This lack of precision makes the unemployment measure fairly unreliable. For example, neighborhood A might have a real unemployment rate of 5%, but the survey measure indicates 10%. Neighborhood B’s unemployment rate is 13.5%, but the survey measure indicates 10%. Clearly the manager has a problem with measurement imprecision.

The precision of these measures can be improved by taking larger samples. But in many cases, this task is not so easy. Let us say the city of Barren has a measure of housing quality that terms neighborhood housing as “good,” “above average,” “average,” or “dilapidated.” Assume that 50% of the city’s housing falls into the dilapidated category. If the housing evaluation were undertaken to designate target areas for rehabilitation, the measure lacks precision. No city can afford to rehabilitate 50% of its housing stock. Barren needs a more precise measure that can distinguish among houses in the dilapidated category. This need can be met by creating measures that are more sensitive to variations in dilapidated houses (the premise is that some dilapidated houses are more dilapidated than others; for example, “dilapidated” and “uninhabitable”). Improving precision in this instance is far more difficult than increasing the sample size.
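To make the sampling side of this problem concrete, the short sketch below shows how the margin of error for an estimated unemployment rate shrinks as the sample grows. The sample sizes, the 10 percent estimate, and the 1.96 critical value (a 95 percent confidence level) are assumptions chosen for illustration, not figures from the Barren survey.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p from a
    simple random sample of size n, using the normal approximation
    (z = 1.96 corresponds to a 95 percent confidence level)."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative only: an estimated unemployment rate near 10 percent
# measured with progressively larger neighborhood samples.
for n in (100, 400, 1600):
    moe = margin_of_error(0.10, n)
    print(f"n = {n:4d}: 10% +/- {moe * 100:.1f} percentage points")
```

Quadrupling the sample size roughly halves the margin of error, which is why precision gained through larger samples comes at a steeply rising cost in data collection.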

Measuring Reliability

Unlike validity, the reliability of a measure can be determined objectively. A common method for assessing measurement reliability is to measure the same phenomenon or set of indicators or variables twice over a reasonably short time interval and to correlate the two sets of measures. The correlation coefficient is a measure of the statistical relationship or association between two characteristics or variables (see Chapter 18). This procedure is known as test-retest reliability.
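As a minimal sketch of the test-retest idea, the code below correlates two administrations of the same survey item; the ten responses and the retest interval are invented for illustration, not drawn from an actual study.

```python
import numpy as np

# Hypothetical data: the same satisfaction item administered to ten
# respondents twice, two weeks apart (values invented for illustration).
time_1 = np.array([4, 5, 3, 2, 5, 4, 3, 4, 2, 5])
time_2 = np.array([4, 5, 3, 3, 5, 4, 2, 4, 2, 5])

# Test-retest reliability: the Pearson correlation between the two waves.
r = np.corrcoef(time_1, time_2)[0, 1]
print(f"Test-retest reliability (r) = {r:.2f}")
```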

Another approach to determining reliability is to prepare alternative forms that are designed to be equivalent to measure a given concept, and then to administer both of them at the same time. For example, near the beginning of a survey, a researcher may include a set of five questions to measure attitudes toward government spending or trust in nonprofit fund-raisers, and toward the end of the survey, he or she may present five more questions on the same topic, all parallel in content. The correlation between the responses obtained on the two sets of items is a measure of parallel forms reliability. Closely related is split-half reliability, in which the researcher divides a set of items intended to measure a given concept into two parts or halves; a common practice is to divide them into the even-numbered questions and the odd-numbered questions. The correlation between the responses obtained on the two halves is a measure of split-half reliability. Cronbach’s alpha, a common measure of reliability, is based on this method.

In all three types of reliability measurement (test-retest, parallel forms, and split-half), the higher the intercorrelations or statistical relationships among the items, the higher the reliability of the indicators.
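The sketch below illustrates these item-based calculations on an invented five-item scale answered by eight respondents; the split-half correlation uses the odd- and even-numbered items, and Cronbach’s alpha is computed directly from its standard formula. Values close to 1 indicate that the items hang together well; values near 0 indicate low reliability.

```python
import numpy as np

# Hypothetical responses: 8 respondents x 5 items measuring one concept
# (all values invented for illustration).
items = np.array([
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [5, 4, 5, 5, 4],
    [3, 3, 2, 3, 3],
    [4, 4, 4, 5, 4],
    [1, 2, 1, 2, 1],
    [5, 5, 4, 5, 5],
    [3, 2, 3, 3, 2],
])

# Split-half reliability: correlate the odd-numbered and even-numbered item totals.
odd_total = items[:, 0::2].sum(axis=1)   # items 1, 3, 5
even_total = items[:, 1::2].sum(axis=1)  # items 2, 4
split_half_r = np.corrcoef(odd_total, even_total)[0, 1]

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score).
k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(f"Split-half reliability (r) = {split_half_r:.2f}")
print(f"Cronbach's alpha           = {alpha:.2f}")
```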

If several individuals are responsible for collecting and coding data, it is also good practice to assess interrater reliability. Interrater reliability is based on the premise that the application of a measurement scheme should not vary depending on who is doing the measuring (see above). For example, in screening potential applicants for a food and clothing assistance program, a nonprofit community center might use a 10-item checklist for assessing the level of need for each client.

To determine whether agency staff are interpreting and applying the checklist consistently, we could ask five employees to screen the same group of 20 clients using the checklist. High interrater reliability would exist if all five employees came up with very similarly scored (or even identical) checklists for each client. Alternatively, if the scored checklists for each client turned out to be dramatically different, we would have low interrater reliability. Low interrater reliability can indicate that confusion exists over how a measurement instrument should be applied and interpreted.
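One rough way to summarize such a check is sketched below. The checklist totals are simulated rather than drawn from an actual agency, and the average pairwise correlation among raters is only one simple summary; more formal statistics, such as the intraclass correlation coefficient, are also used for this purpose.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(42)

# Hypothetical checklist totals (0-10) assigned by 5 staff members to the
# same 20 clients; each rater's score differs from a common "true" level
# of need only by a small random discrepancy (all values invented).
true_need = rng.integers(0, 11, size=20)
ratings = np.clip(
    true_need + rng.integers(-1, 2, size=(5, 20)), 0, 10
)  # shape: (raters, clients)

# A simple summary of interrater reliability: the average Pearson
# correlation across all pairs of raters.
pairwise_r = [
    np.corrcoef(ratings[i], ratings[j])[0, 1]
    for i, j in combinations(range(5), 2)
]
print(f"Average interrater correlation = {np.mean(pairwise_r):.2f}")
```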