
Measurement Models in Assessment and Evaluation

7.2 Unidimensional Models for Dichotomous Items

7.2.2 The Rasch model

In the previous section, it was shown that the principle of parameter separation could be used to calibrate the student abilities on a common scale. In this section, a stochastic model for responses of students to items will be introduced.

In this and the following sections, the focus is on dichotomous data. The response of a student i to an item k will be coded by a stochastic variable Yik. In the sequel, upper-case characters will denote stochastic variables and lower-case characters their realizations.

In the present case, there are two possible realizations, defined by

$$
y_{ik} =
\begin{cases}
1 & \text{if student } i \text{ gave a correct response to item } k,\\
0 & \text{if student } i \text{ gave an incorrect response to item } k,
\end{cases}
\qquad (2)
$$

Above, we considered the case where not all students responded to all items. To indicate whether a response is available, we define a variable

$$
d_{ik} =
\begin{cases}
1 & \text{if a response of student } i \text{ to item } k \text{ is available},\\
0 & \text{otherwise}.
\end{cases}
\qquad (3)
$$

For the moment, it will be assumed that the values of dik are fixed a priori by some test administrator; therefore, dik can be called a test administration variable. We will not consider dik a stochastic variable, that is, the estimation and testing procedures will be explained conditionally on dik, with dik fixed. Later, this assumption will be broadened.

In an incomplete design, the definition of the response variable Yik is generalized such that it assumes an arbitrary constant if no response is available.
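To make this coding concrete, the following minimal sketch (in Python with NumPy; the 3 × 4 data matrix is hypothetical and not the data of Table 7.3) represents unobserved responses by NaN, derives the test administration variable dik as in Formula (3), and replaces the unobserved entries by an arbitrary constant.

```python
import numpy as np

# Hypothetical incomplete design: 3 students, 4 items; NaN marks items that
# were not administered to a student.
y = np.array([[1.0, 0.0, np.nan, np.nan],
              [0.0, 1.0, 1.0,    np.nan],
              [np.nan, np.nan, 1.0, 0.0]])

# Test administration variable d_ik of Formula (3): 1 if a response is available.
d = (~np.isnan(y)).astype(int)

# Replace the unobserved responses by an arbitrary constant (here 0).
y_filled = np.nan_to_num(y, nan=0.0)

print(d)
print(y_filled)
```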

An example of a data matrix is given in Table 7.3. The arbitrary constants for unobserved values of Yik are omitted.

Table 7.3 Data Matrix with Observed Scores.

Respondent   Observed responses   Score     θi
 1           0 1 1 1                3      1.65
 2           0 1 1 1                3      1.65
 3           0 1 1 0                2      0.37
 4           1 0 0 0                1     −0.90
 5           1 1 0 1                3      0.77
 6           1 0 0 1                2     −0.37
 7           0 0 1 1                2     −0.37
 8           1 1 1 0                3      0.77
 9           1 0 0 1                2      0.14
10           0 1 1 1                3      1.42

Item         1      2      3      4      5      6      7      8
bk          1.57  −0.09  −0.67   0.73  −0.38  −0.38   0.20  −0.98

The simplest model, where every student is represented by one ability parameter and every item is represented by one difficulty parameter, is the 1-parameter logistic model, better known as the Rasch model (Rasch, 1960). It is abbreviated as 1PLM. It is a special case of the general logistic regression model. This also holds for the other IRT models discussed below. Therefore, it proves convenient to first define the logistic function:

$$
\Psi(x) = \frac{\exp(x)}{1 + \exp(x)}.
$$

The 1PLM is then defined as

$$
p(Y_{ik} = 1 \mid \theta_i, b_k) = \Psi(\theta_i - b_k),
$$

that is, the probability of a correct response is given by a logistic function with argument θi − bk. Note that the argument has the same linear form as in Formula (1). Using the abbreviation Pk(θ) = p(Yik = 1 | θ, bk), the two previous formulas can be combined to

$$
P_k(\theta) = \Psi(\theta - b_k) = \frac{\exp(\theta - b_k)}{1 + \exp(\theta - b_k)}.
\qquad (4)
$$
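As an illustration, the following Python sketch (not part of the original text; the function names psi and p_correct are only illustrative) evaluates Formula (4) at the estimates reported in Table 7.3. For respondent 1 (θ = 1.65) and item 1 (b1 = 1.57) it gives approximately .52, one of the values displayed later in Table 7.4.

```python
import math

def psi(x):
    """Logistic function: Psi(x) = exp(x) / (1 + exp(x))."""
    return math.exp(x) / (1.0 + math.exp(x))

def p_correct(theta, b):
    """Rasch model (1PLM): probability of a correct response, Formula (4)."""
    return psi(theta - b)

# Respondent 1 (theta = 1.65) and items 1 and 2 of Table 7.3 (b = 1.57 and -0.09).
print(round(p_correct(1.65, 1.57), 2))   # approx. 0.52
print(round(p_correct(1.65, -0.09), 2))  # approx. 0.85
```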

The probability of a correct response as a function of ability, Pk(θ), is the so-called item response function of item k. Two examples of the associated item response curves are given in Figure 7.1. The x-axis is the ability continuum θ. For two items, with distinct values of bk, the probability of a correct response Ψ(θ − bk) is plotted for different values of θ. The item response curves increase with the value of θ, so this parameter can be interpreted as an ability parameter. Note that the order of the probabilities of a correct response for the two items is the same for all ability levels.

Figure 7.1 Response curves for two items in the Rasch model.

That is, the two item response curves are horizontal shifts of each other. Further, the higher the value of bk, the lower the probability of a correct response, so bk can be interpreted as an item difficulty.

This can also be inferred from the fact that in θi − bk the item difficulty bk is subtracted from the ability parameter θi. So the difficulty lowers the probability of a correct response.

The ability scale is a latent scale, that is, the values of θ cannot be directly observed but must be estimated from the observed responses. The latent scale does not have a natural origin: the ensemble of curves in Figure 7.1 can be shifted along the x-axis. To put it differently, a constant value c can be subtracted from the ability and item parameters without consequences for the probabilities of correct responses, that is, Ψ(θi − bk) = Ψ((θi − c) − (bk − c)). Imposing an identification restriction solves this indeterminacy of the latent scale. The scale is fixed by setting some ability or difficulty parameter equal to some constant, say zero. One could also impose the restriction that the item difficulties sum to zero, that is, Σk bk = 0; this is the restriction satisfied by the estimates in Table 7.3.
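Both points can be checked numerically; the following Python sketch (not from the original text) verifies that shifting ability and difficulty by the same constant c leaves the probability in Formula (4) unchanged, and that the difficulty estimates in Table 7.3 indeed sum to zero.

```python
import math

def psi(x):
    # Logistic function Psi(x) = exp(x) / (1 + exp(x))
    return math.exp(x) / (1.0 + math.exp(x))

# Shifting ability and difficulty by the same constant c leaves the
# probability of a correct response unchanged.
theta, b, c = 1.65, 1.57, 0.5   # c is an arbitrary shift constant
assert math.isclose(psi(theta - b), psi((theta - c) - (b - c)))

# The difficulty estimates of Table 7.3 satisfy the sum-to-zero restriction.
b_hat = [1.57, -0.09, -0.67, 0.73, -0.38, -0.38, 0.20, -0.98]
assert abs(sum(b_hat)) < 1e-6
```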

Several estimation procedures for the ability and item parameters are available; they will be discussed below. Estimation boils down to finding values of the parameters such that the data are represented as well as possible by the model. In the example given here, a maximum likelihood estimation procedure was used. The last column of Table 7.3 gives the estimates of the ability parameters, and the bottom line gives the values of the item difficulties. The estimation procedure will be outlined further below. The probabilities of correct responses can be estimated by inserting the parameter estimates in (4). The resulting values are displayed in Table 7.4. Note that the probabilities of the unobserved responses can now be computed as well. With these estimates, the expected scores on the test that was not administered can be computed. These expectations are displayed in the last two columns of Table 7.4.
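The computation of these expected scores can be sketched as follows (Python; illustrative only). Taking Test 1 as items 1–4 and Test 2 as items 5–8, which is consistent with the expected-score columns of Table 7.4, the sketch reproduces the expected scores of respondent 1 from the estimates in Table 7.3.

```python
import math

def psi(x):
    # Logistic function Psi(x) = exp(x) / (1 + exp(x))
    return math.exp(x) / (1.0 + math.exp(x))

# Item difficulties from Table 7.3 and the ability estimate of respondent 1.
b = [1.57, -0.09, -0.67, 0.73, -0.38, -0.38, 0.20, -0.98]
theta_1 = 1.65

# Expected scores on Test 1 (items 1-4) and Test 2 (items 5-8): sums of the
# probabilities of a correct response over the items of each test.
expected_test1 = sum(psi(theta_1 - bk) for bk in b[:4])
expected_test2 = sum(psi(theta_1 - bk) for bk in b[4:])

print(round(expected_test1, 2), round(expected_test2, 2))  # approx. 3.00 and 3.51
```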

Model Fit

The distance between the responses and the expectations under the model is an indication of model fit. For instance, the response pattern of the first student, which was (0, 1, 1, 1), can be compared with the expected response values (.52, .85, .91, .71). The closer the expectations are to the observations, the better the model fit. As a practical test for model fit, however, this approach is far from optimal. Even for tests of moderate length and moderate sample sizes, the tables of observed and expected values become quite large, so the information supplied in this way is hardly informative about the nature of the model violations. This problem can be solved by collapsing the table of frequency counts of response patterns into a smaller and more informative table. This will be returned to in the sections on model fit.
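The observed-versus-expected comparison described above could be carried out as in the following sketch (Python; illustrative only), where the raw residuals give a rough impression of fit for this single response pattern.

```python
# Observed responses of the first student (Table 7.3) and the corresponding
# expected values under the Rasch model (Table 7.4).
observed = [0, 1, 1, 1]
expected = [0.52, 0.85, 0.91, 0.71]

# Raw residuals: small absolute values indicate a good fit for this pattern.
residuals = [o - e for o, e in zip(observed, expected)]
print(residuals)  # approx. [-0.52, 0.15, 0.09, 0.29]
```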

Table 7.4 Estimated Probabilities of Correct Responses and Expected Scores.

              Item                                            Expected Score
Respondent    1    2    3    4    5    6    7    8            Test 1   Test 2
 1           .52  .85  .91  .71  .88  .88  .81  .93            3.00     3.51
 2           .52  .85  .91  .71  .88  .88  .81  .93            3.00     3.51
 3           .23  .61  .74  .41  .68  .68  .54  .79            2.00     2.69
 4           .08  .31  .44  .16  .37  .37  .25  .52            0.99     1.00
 5           .31  .70  .81  .51  .76  .76  .64  .85            2.33     3.00
 6           .13  .43  .57  .25  .50  .50  .36  .65            1.38     2.00
 7           .13  .43  .57  .25  .50  .50  .36  .65            1.38     2.00
 8           .31  .70  .81  .51  .76  .76  .64  .85            2.33     3.00
 9           .19  .56  .69  .36  .63  .63  .49  .75
10           .46  .82  .89  .67  .86  .86  .77  .92