Measurement Models in Assessment and Evaluation

7.3 Models for Polytomous Items

7.3.5 Estimation and testing procedures

Since continuation-ratio models can be viewed as models for dichotomous data obtained in a design with structural missing data, estimation and testing procedures follow directly from the procedures for the analogous IRT models, so the MML and MCMC procedures described above apply directly. An exception is CML estimation, which is not feasible for continuation-ratio models (Glas, 1998b). For both cumulative-probabilities models and adjacent-category models, MML and MCMC estimation procedures are feasible, but in practice MCMC procedures are most practical for cumulative-probabilities models (Johnson & Albert, 1999) and MML procedures are most practical for adjacent-category models (Glas & Verhelst, 1989). It is beyond the scope of this chapter to treat all estimation and testing procedures in detail, but to give a flavor of the methods, likelihood-based estimation and testing procedures will be sketched for the partial credit model (PCM).

The theory for MML estimation presented above for dichotomous items can also be used for the PCM (Bock & Aitkin, 1981; Mislevy, 1984, 1986; Glas & Verhelst, 1995).

The probability of student i scoring in category m on item k is given by Formula (24).

Using local independence, the probability of a student’s response pattern is given by

where yi is the response pattern of student i. Assuming independence between respondents, the likelihood of the entire data set is the product of these expressions over respondents, so it is analogous to the likelihood in the dichotomous case, given by Formula (10). As for the 1PLM, three maximum likelihood estimation procedures are available: joint maximum likelihood (JML), conditional maximum likelihood (CML) and marginal maximum likelihood (MML).
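The product structure implied by local independence can be illustrated with a small sketch. Formula (24) is not reproduced in this section, so the code assumes a common step-parameter form of the PCM, in which the probability of category m is proportional to exp(mθ − Σ b_h), with the sum running over the first m steps. The function names, item parameters and response pattern are hypothetical and serve only as an illustration.

```python
import math

def pcm_probs(theta, b):
    """Category probabilities for m = 0..M under an ASSUMED step-parameter PCM.

    b holds the M step parameters of one item (hypothetical parameterization).
    """
    # Exponent for category m is m*theta minus the sum of the first m steps.
    exponents = [0.0]
    for step in b:
        exponents.append(exponents[-1] + theta - step)
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def pattern_likelihood(theta, items, y):
    """Probability of response pattern y given theta, using local independence."""
    prob = 1.0
    for b, m in zip(items, y):
        prob *= pcm_probs(theta, b)[m]  # one factor per item
    return prob

# Hypothetical example: two items with 3 and 4 categories.
items = [[-0.5, 0.5], [0.0, 1.0, 2.0]]
y = [2, 1]
print(pattern_likelihood(0.3, items, y))
```

The likelihood of a complete data set would then be the product of such pattern probabilities over respondents, each evaluated at that respondent's own θ.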

The JML estimation equations are a straightforward generalization of the analogous equations for dichotomous items. They are given by

and

So the respondents' sum scores ri and the numbers of respondents scoring in category m of item k are equated with their respective expected values. However, as in the dichotomous case, JML does not result in consistent estimators, because the number of student parameters goes to infinity as the sample size goes to infinity. To obtain consistent estimates, we can use either CML or MML.
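To illustrate what equating a sum score with its expected value involves, the sketch below solves the person-side equation for one respondent while treating the item parameters as known. It assumes the same step-parameter form of the PCM as a stand-in for Formula (24), and it uses bisection, which works because the expected sum score increases with θ. All names and values are hypothetical; a full JML procedure would alternate this step with the item equations.

```python
import math

def pcm_probs(theta, b):
    """Category probabilities under an ASSUMED step-parameter PCM."""
    exponents = [0.0]
    for step in b:
        exponents.append(exponents[-1] + theta - step)
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def expected_sum_score(theta, items):
    """Expected sum score over all items at ability theta."""
    return sum(m * p for b in items for m, p in enumerate(pcm_probs(theta, b)))

def jml_theta(r, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve r = E(sum score | theta) by bisection on the interval [lo, hi]."""
    max_score = sum(len(b) for b in items)
    if not 0 < r < max_score:
        # Zero and perfect scores have no finite solution.
        raise ValueError("no finite estimate for zero or perfect scores")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_sum_score(mid, items) < r:
            lo = mid  # expected score too low: move theta upward
        else:
            hi = mid
    return (lo + hi) / 2.0

items = [[-0.5, 0.5], [0.0, 1.0, 2.0]]  # hypothetical item parameters
theta_hat = jml_theta(3, items)
print(theta_hat, expected_sum_score(theta_hat, items))
```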

In CML estimation, a likelihood function of the item parameters given the observed sufficient statistics is maximized. This leads to the estimation equations

for k=1,…,K and m=1,…,Mk. Here the numbers of respondents scoring in category m of item k are equated with their conditional expected values.

In MML estimation, the likelihood is marginalized with respect to the ability parameters under the assumption that they have a common normal distribution. This leads to the estimation equations

for k=1,…,K and m=1,…,Mk, where the expectation is with respect to the posterior distribution of θ given the response pattern yi.
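The posterior expectation that enters the MML equations can be approximated numerically. The sketch below is a minimal illustration under several assumptions: the step-parameter PCM stands in for Formula (24), the ability distribution is standard normal, and the integral over θ is replaced by a sum over an equally spaced grid. The function names, grid and data are hypothetical.

```python
import math

def pcm_probs(theta, b):
    """Category probabilities under an ASSUMED step-parameter PCM."""
    exponents = [0.0]
    for step in b:
        exponents.append(exponents[-1] + theta - step)
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def posterior_weights(y, items, nodes, prior):
    """Discrete posterior of theta on the grid, given response pattern y."""
    joint = []
    for theta, w in zip(nodes, prior):
        like = 1.0
        for b, m in zip(items, y):
            like *= pcm_probs(theta, b)[m]
        joint.append(like * w)
    total = sum(joint)  # approximate marginal probability of the pattern
    return [j / total for j in joint]

def expected_counts(data, items, nodes, prior, k):
    """Sum over respondents of E[P_km(theta) | y_i], for each category m of item k."""
    counts = [0.0] * (len(items[k]) + 1)
    for y in data:
        post = posterior_weights(y, items, nodes, prior)
        for theta, w in zip(nodes, post):
            for m, p in enumerate(pcm_probs(theta, items[k])):
                counts[m] += w * p
    return counts

# Grid approximation of a standard normal ability distribution (an assumption).
nodes = [-4.0 + 0.2 * q for q in range(41)]
dens = [math.exp(-0.5 * t * t) for t in nodes]
prior = [d / sum(dens) for d in dens]

items = [[-0.5, 0.5], [0.0, 1.0, 2.0]]  # hypothetical item parameters
data = [[2, 1], [0, 0], [1, 3]]         # hypothetical response patterns
print(expected_counts(data, items, nodes, prior, k=0))
```

At the MML solution, these expected counts would match the observed numbers of respondents in each category of the item.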

To solve the estimation equations, Bock and Aitkin (1981) employ the EM (expectation-maximization) algorithm, in which the unobserved values of θ are treated as missing data. The term EM algorithm was introduced by Dempster, Laird and Rubin (1977). It is a general iterative algorithm for ML estimation in incomplete data problems. It handles missing data by first replacing the missing values with a distribution of estimated values, then estimating new parameters, then re-estimating the distribution of the missing values under the assumption that the new parameter estimates are correct, then re-estimating the parameters, and so forth, iterating until convergence.

Evaluation of the fit of response curves

Generalization of the Q1 tests by Orlando and Thissen (2000) to polytomous data turns out to be infeasible. The reason is that these tests are evaluated using score groups. This leads to the problem that responses in high item categories are often unobserved at low score levels and responses in low item categories are often unobserved at high score levels. Therefore, the table on which the test is based usually has too many empty cells. The solution is to group score levels, which can be done using the framework of LM tests (Glas, 1999). As above, this LM test is denoted as LM-Q1. Also as above, the score range is partitioned into G subsets, and it is evaluated whether the observed and expected numbers of responses in the item categories conform to the model. An indicator function is defined that is equal to one if the sum score on the response pattern without item k falls in subrange g, and equal to zero if this is not the case. To simplify the notation, we will first reparameterize the PCM using a transformation of the item parameters. Then the alternative model on which the LM test is based is given by

for m=1,…,Mk. Under the null model, which is the PCM, the additional parameter δkmg is set equal to zero. Notice that the parameter δkmg is different for each category m. In the alternative model, the additional parameter is a free parameter, δkmg≠0. For the LM-Q1 test, it can be shown that the test can be based on the differences

for k=1,…,K, m=1,…,Mk, and g=1,…,G. So the test is based on the difference between the observed number of responses in category m of item k of the respondents in subgroup g and its posterior expectation. This expected value is computed using the PCM without the additional parameters δ, so it is computed under the null model. If the difference between the observed and expected values is large, this means that the PCM does not fit the data and the additional parameters δkmg are necessary to obtain model fit.
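The observed side of this comparison is a table of category counts per score subgroup. The sketch below shows how the indicator function and that table might be computed. It covers only the observed counts, not the posterior expectations or the LM statistic itself. The cutoffs, the data and the function names are hypothetical, and the groups are indexed from 0 in the code rather than from 1 as in the text.

```python
def rest_score_group(y, k, cutoffs):
    """Index g of the subrange containing the sum score without item k.

    cutoffs are ascending, inclusive upper bounds of all groups but the last.
    """
    rest = sum(y) - y[k]  # sum score on the pattern with item k left out
    g = 0
    for c in cutoffs:
        if rest > c:
            g += 1
    return g

def observed_table(data, k, n_categories, cutoffs):
    """Observed counts: rows are score subgroups, columns are categories of item k."""
    table = [[0] * n_categories for _ in range(len(cutoffs) + 1)]
    for y in data:
        table[rest_score_group(y, k, cutoffs)][y[k]] += 1
    return table

# Hypothetical data: five respondents, three items; item 0 has categories 0..2.
data = [[0, 1, 2], [2, 1, 0], [1, 3, 2], [2, 3, 2], [0, 0, 0]]
table = observed_table(data, k=0, n_categories=3, cutoffs=[2])
print(table)
```

With finer score groups, many cells of such a table would be empty, which is the reason given above for grouping score levels.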

Evaluation of local independence

Local independence can also be evaluated using the LM framework (Glas, 1999). As above, this LM test is denoted as LM-Q2. A possible dependency between item k and item j is modeled as

Note that the parameter δigjh models the association between the two items. The LM-Q2 test is used to test the special model, δigjh=0, against the alternative model, δigjh≠0.

If the theory of the LM test is applied, it turns out that the test is based on the difference

for g=1,…,Mk and h=1,…,Mj. So this is the difference between the number of students with an observed response in category g of item k and an observed response in category h of item j, and its posterior expected value. The expected value is computed using the null model, which assumes local independence. If the LM-Q2 test is significant, the additional parameter is necessary to obtain model fit. The pair of items is then locally dependent, meaning that the answer to one item influences the answer to the other item.

Person fit

For the UB test, the complete response pattern is split up into a number of parts, say the parts g=0,…,G. Then it is evaluated whether the same ability parameter θ can account for the responses in all partial response patterns. Let Ag be the set of the indices of the items in part g. We pose the alternative model that this is not the case, that is, for g>0, we pose the model

One group g should be used as a reference. As was already shown above, an LM statistic can be defined as a quadratic form in the first-order derivatives with respect to θ0.

Analogously, an LM test for local independence can be based on a model where the response on item i depends on the response on item k. The model is given by

Note that δyk can be interpreted as a shift in ability that is proportional to the response level on item k. An LM statistic can be defined as a quadratic form in the first-order derivatives with respect to δ.