
7.2.4 Estimation procedures

A distinctive property of the 1PLM is that it satisfies the requirement of specific objectivity. Loosely speaking, this requirement entails invariant item ordering for all relevant subpopulations. The 2PLM and 3PLM, on the other hand, are an attempt to model the response process. Therefore, the 1PLM may play an important role in psychological research, where items can be selected to measure some theoretical construct. In educational research, however, the items and the data are given, and items cannot be discarded for the sake of model fit. There, the role of the measurement expert is to find a model that is acceptable for making inferences about the students’ proficiencies and to attach some measure of reliability to these inferences. And though the 2PLM and the 3PLM are rather crude as response process models, they are flexible enough to adequately fit most data emerging in educational testing.

Because one restriction must be imposed to identify the model, there are only N+K−1 free parameters to estimate. On the other hand, not all equations in this system are independent, because both the summations Σk sk and Σi ri equal the total number of correct responses given. The conclusion is that there are N+K−1 independent equations in N+K−1 free parameters. The system can be solved using a Newton-Raphson procedure (see, for instance, Molenaar, 1995).
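As an illustration, the following minimal sketch (in Python; all function names are hypothetical, not taken from the chapter) solves the JML equations for the 1PLM by alternating Newton-Raphson updates that equate each ri and sk with its expected value under the model. It assumes the standard score equations of formulas (11) and (12) and a data matrix without zero or perfect person scores.

```python
import numpy as np

def p_correct(theta, b):
    """1PLM probability of a correct response (Formula (4) in the chapter)."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def jml_estimate(Y, n_iter=500, tol=1e-8):
    """Alternate Newton-Raphson updates equating r_i and s_k with their
    expected values. Y must contain no zero or perfect person scores."""
    N, K = Y.shape
    r, s = Y.sum(axis=1), Y.sum(axis=0)
    theta, b = np.zeros(N), np.zeros(K)
    for _ in range(n_iter):
        P = p_correct(theta, b)
        W = P * (1.0 - P)                       # logistic information weights
        d_theta = (r - P.sum(axis=1)) / W.sum(axis=1)
        d_b = -(s - P.sum(axis=0)) / W.sum(axis=0)
        theta += d_theta
        b += d_b
        c = b.mean()                            # identification restriction:
        b -= c                                  # mean item difficulty is zero;
        theta -= c                              # shift theta to preserve theta - b
        if max(np.abs(d_theta).max(), np.abs(d_b).max()) < tol:
            break
    return theta, b
```

Given the resulting estimates, expected number-correct scores such as those in Table 7.4 can be obtained by summing the estimated probabilities over the items of a test, for instance `p_correct(theta, b)[:, items_of_test].sum(axis=1)`.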

An example of the outcome of the estimation procedure is given in the marginals of Table 7.3. The estimates of the item and student parameters given in the last row and last column of the table are the result of solving the JML equations given the data displayed in the table. Note that a score of 3 on the first test reflects a higher proficiency level than the same score on the second test. Apparently, the first test was more difficult. Given these estimates, the probabilities of correct responses can now be computed using Formula (4).

The results are given in Table 7.4. The last two columns give the expected number-correct scores under the model for test 1 and test 2, respectively. For the first four students, the expected number-correct score on the first test equals the observed number-correct score, because these expected scores emanate from solving the estimation equations given by formulas (11) and (12). For these students, the expected number-correct score on the second test is computed analogously, that is, by summing the estimated probabilities of correct scores on the items of the second test. Here too it can be seen that the second test is easier, because the expected scores on the second test are higher than the scores on the first test.

It turns out that JML estimation is not entirely satisfactory. This is related to the fact that the number of student parameters grows in proportion to the number of observations, which, in general, leads to inconsistent estimates (Neyman & Scott, 1948).

Simulation studies by Wright and Panchapakesan (1969) and Fischer and Scheiblechner (1970) show that these inconsistencies can indeed occur in IRT models.

There are two maximum likelihood estimation procedures based on a likelihood function where the number of parameters does not depend on the sample size: the first is conditional maximum likelihood (CML) estimation; the second is marginal maximum likelihood (MML) estimation. They will be discussed in turn. CML estimation only applies to the 1PLM and some generalizations that will be sketched briefly later.

The procedure is based on the fact that the likelihood of a response pattern, given a value of the sufficient statistic for ability, that is, given ri, no longer depends on θi. This is, in fact, the very definition of sufficiency. The result will not be proved here; for a proof, the reader is referred to Rasch (1960), Andersen (1977), Fischer (1974), or Molenaar (1995).

The CML estimation equations are given by

sk = Σi p(Yik=1|ri, b),  k = 1, …, K,  (13)

where p(Yik=1|ri, b) is the probability of a correct response given the student’s number-correct score ri. This probability does not depend on θ. The system (13) consists of K−1 independent equations, and also here a restriction must be imposed. The estimation equations can be solved using a Newton-Raphson procedure. These equations have a structure that is analogous to the structure of the JML estimation equations, in the sense that sufficient statistics for the item parameters are equated with their expected values, in this case, their expected values given the values of the sufficient statistics for the student parameters. Also in the present case, the summation on the right-hand side is over the items that were actually responded to. Of course, this procedure only produces estimates of the item parameters, and in many instances estimates of the student parameters are needed as well. These parameters are estimated given the item parameter estimates; this will be returned to in the next section.
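For concreteness, the following sketch (Python; function names are hypothetical) evaluates the conditional probabilities in (13) via the elementary symmetric functions of εk = exp(−bk) and solves the resulting system. The Newton-Raphson procedure mentioned in the text is replaced here by scipy's general-purpose root finder; this is a simplification, not the implementation referenced by Molenaar (1995).

```python
import numpy as np
from scipy.optimize import root

def esf(eps):
    """Elementary symmetric functions gamma_0..gamma_K of eps_k = exp(-b_k)."""
    g = np.zeros(len(eps) + 1)
    g[0] = 1.0
    for e in eps:
        g[1:] = g[1:] + e * g[:-1]    # RHS uses the old g, so this is safe
    return g

def cml_residuals(b, Y):
    """Left minus right-hand side of the CML equations (13) for the 1PLM."""
    N, K = Y.shape
    r = Y.sum(axis=1)
    keep = (r > 0) & (r < K)          # zero and perfect scores drop out of CML
    Yk, rk = Y[keep], r[keep]
    s = Yk.sum(axis=0)
    eps = np.exp(-b)
    g = esf(eps)
    resid = np.empty(K)
    for k in range(K):
        g_k = esf(np.delete(eps, k))  # ESFs of all items except item k
        # p(Y_ik = 1 | r_i, b) = eps_k * gamma_{r_i - 1}^{(k)} / gamma_{r_i}
        p = eps[k] * g_k[rk - 1] / g[rk]
        resid[k] = s[k] - p.sum()
    return resid

def cml_estimate(Y):
    """Solve the CML equations with the restriction b_1 = 0."""
    K = Y.shape[1]
    f = lambda b_free: cml_residuals(np.concatenate(([0.0], b_free)), Y)[1:]
    sol = root(f, np.zeros(K - 1))
    return np.concatenate(([0.0], sol.x))
```

The restriction b1 = 0 removes the dependency among the K equations noted above; any other linear restriction would do equally well.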

The 2PLM and 3PLM do not have sufficient statistics, and therefore CML estimation is not feasible. An alternative and more general method for obtaining a likelihood function in which the number of parameters does not depend on the sample size is to assume that the ability parameters have some common distribution, and to maximize a likelihood that is marginalized with respect to the ability parameters.

Usually, it is assumed that the ability distribution is normal with mean µ and standard deviation σ. For the 1PLM, the MML estimation equations are given by

sk = Σi E[Pk(θi)|ri],  k = 1, …, K,  (14)

where E[Pk(θi)|ri] is the expectation of a correct response with respect to the posterior distribution of θi given the number-correct score ri. So here the observed total scores sk are equated with their so-called posterior expectations. The item parameters are estimated concurrently with the mean and the standard deviation of the ability parameters. The estimation equations are given by

µ = (1/N) Σi E[θi|ri]  and  σ² = (1/N) Σi E[(θi − µ)²|ri],

so the mean and variance are equated with their respective posterior expectations. Kiefer and Wolfowitz (1956) have shown that MML estimates of structural parameters, say, the item and population parameters of an IRT model, are consistent under fairly reasonable regularity conditions, which motivates the general use of MML in IRT models. The MML estimation equations for the 2PLM and 3PLM are slightly more complicated, but not essentially different; for details, refer to Bock and Aitkin (1981) and Mislevy (1984, 1986). The MML estimation procedure can also be used for estimating the parameters in the 2PNO and 3PNO.
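As an illustration, here is a bare-bones sketch (Python; names hypothetical) of the EM scheme of Bock and Aitkin (1981) for the 1PLM, approximating the integrals over the ability distribution with Gauss-Hermite quadrature. To keep the sketch short, the ability distribution is fixed at the standard normal, as in the simulated example discussed next; the concurrent updates of µ and σ described above are omitted.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from scipy.special import logsumexp

def mml_estimate(Y, n_quad=21, n_iter=200, tol=1e-8):
    """MML for the 1PLM via EM (Bock & Aitkin, 1981), ability fixed at N(0,1)."""
    N, K = Y.shape
    s = Y.sum(axis=0)                               # item scores s_k
    nodes, w = hermegauss(n_quad)                   # quadrature for N(0,1)
    logw = np.log(w / w.sum())                      # normalized node weights
    b = np.zeros(K)
    for _ in range(n_iter):
        P = 1.0 / (1.0 + np.exp(-(nodes[:, None] - b)))      # Q x K
        # E-step: posterior weight of each quadrature node per respondent
        logL = Y @ np.log(P).T + (1 - Y) @ np.log(1 - P).T   # N x Q
        logpost = logL + logw
        post = np.exp(logpost - logsumexp(logpost, axis=1, keepdims=True))
        nq = post.sum(axis=0)                       # expected counts per node
        # M-step: one Newton step per item; by equation (14), s_k is
        # equated with its posterior expectation
        EP = (nq[:, None] * P).sum(axis=0)
        EW = (nq[:, None] * P * (1 - P)).sum(axis=0)
        step = (s - EP) / EW
        b -= step
        if np.abs(step).max() < tol:
            break
    return b
```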

For the 1PLM, the parameter estimates obtained using CML and MML are usually quite close. The MML estimates may be biased if the assumption about the ability distribution is grossly violated, but this rarely happens. Besides, tests of model fit are available to detect such violations; this point will be returned to later. Table 7.5 gives the CML and MML estimates for an artificial data set generated using the 1PLM. The sample size was 1000 respondents, with a standard normal ability distribution. The generating values for the item parameters are given in the fourth column. The fifth and seventh columns give the MML and CML estimates, respectively. It can be seen that the two sets of estimates are close to each other, and both are close to the generating values. The standard errors of the estimates are given in the sixth and eighth columns. It can be verified that the generating values of the item parameters are well within the confidence regions of the estimates.

The table also gives some classical test theory indices. Cronbach’s alpha, which gives an indication of the overall reliability of the test, is given at the bottom of the table. The columns labeled p-value and rit give the proportion of correct item scores and the item-test correlation, respectively. Note that the item-test correlation, which gives an indication of the contribution of the item to the overall reliability, is highest for the items with a p-value closest to 0.50. Below it will be shown that this phenomenon is in line with conclusions from IRT. Finally, the table gives some indices of model fit, which will be discussed further below.
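For reference, these classical indices are simple to compute from the N × K score matrix; the sketch below (Python; the function name is hypothetical) shows the standard formulas for the p-values, the item-test correlations, and Cronbach's alpha.

```python
import numpy as np

def ctt_indices(Y):
    """Classical test-theory indices as in Table 7.5: proportion correct
    (p-value), item-test correlation (rit), and Cronbach's alpha."""
    N, K = Y.shape
    total = Y.sum(axis=1)                              # number-correct scores
    p = Y.mean(axis=0)                                 # p-values
    rit = np.array([np.corrcoef(Y[:, k], total)[0, 1] for k in range(K)])
    alpha = K / (K - 1) * (1 - Y.var(axis=0, ddof=1).sum()
                           / total.var(ddof=1))        # Cronbach's alpha
    return p, rit, alpha
```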

Table 7.5 CML and MML Estimates Compared.

                               MML              CML         Tests of model fit
item  p-value   rit  true b      b   Se(b)       b   Se(b)      LM   df      p
   1    .857   .369    −2.0  −2.084   .102  −2.002    .091   1.155    4   .885
   2    .796   .418    −1.5  −1.600   .090  −1.519    .080   8.929    4   .063
   3    .703   .465    −1.0  −1.022   .081   −.944    .072   2.164    5   .826
   4    .614   .462    −0.5   −.554   .077   −.478    .068   5.138    5   .399
   5    .512   .559     0.0   −.058   .075    .017    .067   5.322    5   .378
   6    .530   .523     0.0   −.144   .076   −.069    .067   2.460    5   .783
   7    .403   .522     0.5    .467   .077    .544    .068   1.985    5   .851
   8    .287   .490     1.0   1.078   .082   1.157    .073    .978    5   .964
   9    .252   .444     1.5   1.285   .085   1.365    .076   6.993    5   .221
  10    .171   .402     2.0   1.847   .096   1.928    .086   6.051    5   .301

Alpha = .605                                   Overall: LM = 22.161, df = 27, p = .729

The choice between the two estimation procedures depends on various considerations.

CML estimation has the great advantage that no assumptions are made about the distribution of ability. In the MML procedure, the assumption about the ability distribution is an integral part of the model, and a potential threat to model fit. In the section on model fit, a test for the appropriateness of the ability distribution will be described. On the other hand, MML estimation is useful when inferences about the ability distribution are the very purpose of the analyses. Examples of this will be given in the next chapter. Further, MML estimation is more general because it also applies to the 2PLM, the 3PLM, the 2PNO and the 3PNO. The major drawback of the Rasch model is that, in many instances, the model is too restrictive to fit the data; especially the assumption of identical discrimination indices for all items often finds little support. As a compromise between the tractable mathematical properties of the Rasch model and the flexibility of the 2PLM, Verhelst and Glas (1995) propose the so-called One Parameter Logistic Model (OPLM). In the OPLM, difficulty parameters are estimated and discrimination indices are imputed as known constants. Therefore, the weighted sum score is a sufficient statistic for the ability parameter, and the CML estimation method as given in Formula (13) can still be used with the definition of the sufficient statistic ri as in Formula (8). In this way, the major advantage of CML estimation, that no assumptions are made with respect to the ability distribution, is preserved. In addition, Verhelst and Glas (1995) present well-founded methods for formulating and testing hypotheses with respect to the magnitude of the discrimination indices. An example will be given below.
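To indicate how CML carries over to the OPLM, the sketch below (Python; the names and the εk parametrization are assumptions, not taken from Verhelst and Glas, 1995) generalizes the elementary symmetric functions to known integer item weights ak ≥ 1, so that the conditional probability of a correct response given the weighted score r can be evaluated as in Formula (13).

```python
import numpy as np

def esf_weighted(eps, a):
    """Coefficients gamma_0..gamma_{sum(a)} of prod_k (1 + eps_k * z**a_k);
    the index is the weighted number-correct score. Weights a_k >= 1."""
    g = np.zeros(int(np.sum(a)) + 1)
    g[0] = 1.0
    for e, ak in zip(eps, a):
        g[ak:] = g[ak:] + e * g[:-ak]    # RHS uses the old g, so this is safe
    return g

def p_correct_given_r(eps, a, k, r):
    """p(Y_ik = 1 | r, b) for the OPLM; assumes eps_k = exp(-a_k * b_k)."""
    g_k = esf_weighted(np.delete(eps, k), np.delete(a, k))
    if r < a[k] or r - a[k] >= g_k.size:
        return 0.0                        # score r unreachable with item k correct
    g = esf_weighted(eps, a)
    return eps[k] * g_k[r - a[k]] / g[r]
```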

Bayesian estimation procedures

In the previous section, several maximum likelihood procedures for estimating the parameters of an IRT model were presented. These procedures belong to the classical, so-called frequentist, approach to statistical inference. This section considers an alternative, the so-called Bayesian approach. The motivations for the Bayesian approach are diverse. A rather mundane argument is that Bayesian confidence intervals are sometimes more realistic than frequentist confidence intervals.

Another, more philosophical, argument has to do with the foundations of statistics. In the frequentist approach, a probability is the relative frequency of occurrence of some event in experiments repeated under exactly the same circumstances, while the Bayesian approach views probability also as a measure of subjective uncertainty. These philosophical matters, however, do not play a prominent role in the Bayesian approach to estimation in IRT, so they are beyond the scope of this chapter. There are two motives for the adoption of Bayesian approaches in IRT. The first has to do with the fact that item parameter estimates in the 2PLM and 3PLM are sometimes hard to obtain, because the parameters are poorly determined by the available data. This occurs because, in the region of the ability scale where the respondents are located, the item response curves can be appropriately described by a large number of sets of item parameter values. To obtain “reasonable” and finite estimates, Mislevy (1986) considers a number of Bayesian approaches, entailing the introduction of prior distributions on the parameters. The approach is known under the general label of Bayes modal estimation. The second motive has to do with the availability of Markov chain Monte Carlo (MCMC) algorithms for making Bayesian inferences. As will become clear in the sequel, more advanced IRT models give rise to complex dependency structures which require the evaluation of multiple integrals to solve the estimation equations in an MML or Bayes modal framework. These problems are easily avoided in an MCMC framework.

In Bayesian inference, not only the data but also the parameters are viewed as realizations of stochastic variables. This means that the parameters, too, have a distribution. Prior distributions can be used to express some prior belief about the distribution of parameters. So in the 1PLM, p(b) and p(θ) may be the prior distributions of the item and student parameters, respectively. Bayesian inference focuses on the so-called posterior distribution, which is the distribution of the parameters given the data. So in the 1PLM, the posterior distribution of the parameters b and θ, given all response patterns, denoted by Y, is given by

p(b, θ|Y) ∝ p(Y|θ, b) p(θ) p(b).

In Bayes modal estimation (Mislevy, 1986), the main interest is in keeping the parameters from attaining extreme values by imposing priors. This can be done in two ways. In the first method, the prior distribution is fixed; in the second, often labelled an empirical Bayes approach, the parameters of the prior distribution are estimated along with the other parameters. As in MML, the student parameters are integrated out of the likelihood. Further, point estimates are computed as the maximum of the posterior distribution (hence the name Bayes modal estimation; for more details, see Mislevy, 1986).

The Bayes modal procedure essentially serves to keep the parameters from wandering off. However, the procedure still entails integrating out the ability parameters, and for complex models this integration often becomes infeasible. To solve this problem, Albert (1992) proposed a Markov chain Monte Carlo (MCMC) procedure. In this procedure, a graphical representation of the posterior distribution of every parameter in the model is constructed by drawing from this distribution. This is done using the so-called Gibbs sampler (Gelfand & Smith, 1990). To implement the Gibbs sampler, the parameter vector is divided into a number of components, and each successive component is sampled from its conditional distribution given the sampled values of all other components. This sampling scheme is repeated until the sampled values form stable posterior distributions.

If we apply this to the 1PLM, we could divide the parameters into two components, say the item and ability parameters, and the Gibbs sampler would then imply that we first sample from p(θ|b,Y) and then from p(b|θ,Y), and repeat these iterations until the chain has converged, that is, until the drawn values are relatively stable and the number of draws is sufficient to produce an acceptable graph of the posterior distribution. Starting points for the MCMC procedure can be provided by the Bayes modal estimates described above, and the procedure first needs a number of burn-in iterations to stabilize.
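The following minimal sketch (Python; all names are hypothetical) illustrates this two-component scheme for the 1PLM with standard normal priors on both θ and b. Note one simplification: Albert (1992) derives exact Gibbs steps for the normal-ogive model via data augmentation, whereas the full conditionals of the logistic 1PLM are not of a standard form, so the sketch substitutes a random-walk Metropolis update within each Gibbs step.

```python
import numpy as np

rng = np.random.default_rng(1)

def loglik(theta, b, Y):
    """1PLM log-likelihood contributions, an N x K array."""
    eta = theta[:, None] - b[None, :]
    return Y * eta - np.logaddexp(0.0, eta)

def gibbs_1plm(Y, n_iter=5000, step=0.5):
    """Metropolis-within-Gibbs for the 1PLM with N(0,1) priors on theta and b.
    Each sweep updates theta | b, Y and then b | theta, Y."""
    N, K = Y.shape
    theta, b = np.zeros(N), np.zeros(K)
    draws = []
    for it in range(n_iter):
        # update theta | b, Y: respondents are conditionally independent,
        # so all components can be updated with one vectorized proposal
        prop = theta + step * rng.standard_normal(N)
        logr = (loglik(prop, b, Y) - loglik(theta, b, Y)).sum(axis=1) \
               + 0.5 * (theta**2 - prop**2)        # N(0,1) prior ratio
        theta = np.where(np.log(rng.random(N)) < logr, prop, theta)
        # update b | theta, Y: items are conditionally independent as well
        prop = b + step * rng.standard_normal(K)
        logr = (loglik(theta, prop, Y) - loglik(theta, b, Y)).sum(axis=0) \
               + 0.5 * (b**2 - prop**2)
        b = np.where(np.log(rng.random(K)) < logr, prop, b)
        if it >= n_iter // 2:                      # discard burn-in draws
            draws.append((theta.copy(), b.copy()))
    return draws
```

The retained draws can then be used to construct the graphical representation of each parameter's posterior distribution described above.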

For more complicated models, MCMC procedures were developed by Johnson and Albert (1999) and Béguin and Glas (2001).