Internal Consistency Analysis of the Quantitative Subtest


Chapter 4 Internal Consistency Analysis of the Postgraduate Data

4.3 Internal Consistency Analysis of the Quantitative Subtest

The Quantitative subtest consists of three sections: Number Sequence (items 51–60), Arithmetic and Algebra Concepts (items 61–70), and Geometry (items 71–80). As indicated earlier, of the 30 Quantitative items administered, only 29 were analysed; one item (item 61) was removed because of a typing error. The analysis of each aspect related to the internal consistency of the Quantitative subtest is presented in the following sections.

4.3.1 Treatment of Missing Responses

As with the Verbal subtest, the way missing responses were scored did not appear to have a significant impact on reliability or fit. Across the three treatments, the reliability indices differed by at most 0.005 and the mean item fit residuals by at most 0.06, showing that the differences are negligible. The details of the statistics under each treatment are presented in Table 4.11.

Table 4.11. The Effect of Different Treatments of Missing Responses in the Quantitative Subtest

Statistics                A (incorrect)   B (missing)   C (mixed)
PSI                       0.723           0.728         0.723
Mean item fit residual    0.114           0.171         0.114

Accordingly, as with the Verbal subtest, the analysis of the Quantitative items was based on the data in which missing responses were treated as incorrect responses.

4.3.2 Item Difficulty Order

The ordering of the items in the test booklet, their difficulty according to the item bank, and their difficulty estimated from the postgraduate examinees' responses in each section are presented in Figure 4.21.

In the first section (Number Sequence), there is high consistency between the item order as presented in the test booklet and item difficulty according to the item bank (top panel). The items presented at the beginning of the section are the easier ones. This high consistency is also shown by the correlation of 0.99 between the item order and the item bank difficulty. However, the item difficulties estimated from the examinees' responses were not in the same order as presented in the test booklet, although the trend observed is that the later items were more difficult than the earlier ones. This is also shown by a correlation of 0.77 between the item estimates and the item order in the test booklet.

Figure 4.21. Item order of the Quantitative subtest according to the location from the item bank (top) and from the postgraduate analysis (bottom)

In the second (Arithmetic and Algebra) and third (Geometry) sections, the items presented in the test booklet were not in the order of difficulty according to the item bank. This is because in these sections, as mentioned in Chapter 3, the items were arranged according to difficulty within each type of question. Therefore, even within the same section, the earlier items were not necessarily the easier items. To provide a clear picture, the item number, its difficulty according to the item bank, and its difficulty estimate from the examinees' responses are presented in Table 4.12.

Table 4.12. Item Difficulty Order in the Quantitative Subtest

Section/Sub-section        Item No.   Item Bank   Item       Correlation       Correlation
                                      Location    Estimate   Item Order-Item   Item Order-Item
                                                             Bank Value        Estimate
1. Number Sequence         51         -1.84       -2.008     0.99              0.77
                           52         -1.34       -1.602
                           53         -0.70        0.336
                           54          0.22       -0.298
                           55          0.06        0.443
                           56          0.87        0.234
                           57          1.15        0.529
                           58          1.29        0.880
                           59          1.62        0.468
                           60          1.71        0.418
2. Arithmetic and Algebra
   a. Computation          62          0.96        0.094
   b. Arithmetic problem   65         -1.25        0.139     0.60              0.0
                           66          0.81        0.097
                           67          0.20       -0.762
                           69         -0.47       -1.536
                           70          0.97        0.435
   c. Algebra              63         -0.10       -0.022     1                 0.5
                           64          0.87       -0.338
                           68          1.05        0.538
3. Geometry
   a. Plane geometry 1     71          0.29        0.145     1                 0.5
                           72          0.33       -0.643
                           73          1.07        0.189
   b. Solid geometry 1     74          0.19       -0.477     1                 1
                           75          0.41        0.713
   c. Plane geometry 2     76         -0.74       -0.629     1                 1
                           77          1.06        0.884
   d. Solid geometry 2     78         -0.01        0.285     1                 1
                           79          0.96        0.348
                           80          1.43        1.140
Mean                                   0.38        0.000
SD                                     0.91        0.755

The table shows that, except for Arithmetic, the correlation between the item order and the item bank value was 1.0 in every section or subsection, or very close to it (0.99 in Number Sequence). The lower correlation in Arithmetic is understandable because this subsection includes different operations or types of questions, which makes it difficult to arrange the items in order of their difficulty.

With regard to the correlation between the item order and the item estimates from the postgraduate examinees' responses, the correlation for Arithmetic was 0.0, indicating that most of the items were not in the same order. In the other subsections, except Algebra and Plane Geometry 1, the items' difficulty order matched their sequence order. In Algebra and Plane Geometry 1, each consisting of three items, the lower correlation of 0.5 was attributable to the inconsistency of one item. Thus, except in Arithmetic, the ordering of the Quantitative items was relatively consistent with their difficulties from the item bank and with their estimates from the postgraduate analysis.

As with the Verbal subtest, the result from the Quantitative subtest suggests that the examinees' engagement with the items resulted in valid data.
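The correlations in Table 4.12 can be checked with a short sketch. The text does not name the coefficient used; the code below assumes a Spearman rank correlation without ties, which is an assumption made here because it reproduces the tabled values for the rows tried. The function and variable names are illustrative only.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Number Sequence items 51-60, values copied from Table 4.12.
ns_order = list(range(51, 61))
ns_bank = [-1.84, -1.34, -0.70, 0.22, 0.06, 0.87, 1.15, 1.29, 1.62, 1.71]
ns_estimate = [-2.008, -1.602, 0.336, -0.298, 0.443, 0.234, 0.529, 0.880, 0.468, 0.418]

# Algebra items 63, 64, 68, values copied from Table 4.12.
alg_order = [63, 64, 68]
alg_estimate = [-0.022, -0.338, 0.538]

rho_ns_bank = spearman_rho(ns_order, ns_bank)
rho_ns_estimate = spearman_rho(ns_order, ns_estimate)
rho_alg_estimate = spearman_rho(alg_order, alg_estimate)

print(round(rho_ns_bank, 2), round(rho_ns_estimate, 2), round(rho_alg_estimate, 2))
```

Under this assumption the three values round to 0.99, 0.77, and 0.5, in agreement with the corresponding entries in Table 4.12. In a three-item subsection a single swapped pair is enough to reduce the coefficient from 1 to 0.5.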

4.3.3 Targeting and Reliability

Figure 4.22 presents the item and person distributions of the Quantitative subtest. The item locations were further to the right of the continuum than the person locations. Specifically, most of the items were located between 0.0 and 0.5 logits, while most of the persons were located between -2.0 and 0.0 logits. This indicates that the items were difficult for the examinees.

The item locations ranged from -2.008 to 1.140 with a mean of 0.0 (fixed) and a standard deviation of 0.742. The person locations ranged from -3.265 to 3.959 with a mean of -0.801 and a standard deviation of 0.865. The probability of success for a person with an ability of -0.801 (the mean person ability) on an item with a difficulty of 0.0 (the mean item difficulty) was 0.31, substantially smaller than 0.5. Again, this indicates that the subtest was difficult for the examinees. To be able to select higher ability applicants, a difficult subtest seems appropriate. However, the distribution suggests that the test was well targeted for only about one third of the applicants. If the selection ratio was not very small, or the number of applicants to be accepted was greater than one third, more items located between 1.0 and 2.5 and between -2.0 and -1.0 would be necessary to represent every region of the continuum.
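The probability of 0.31 quoted above can be reproduced with a minimal sketch, assuming the dichotomous Rasch model in which the probability of success depends only on the difference between person ability and item difficulty. The function name is illustrative.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


mean_person_location = -0.801  # mean person ability reported in the text
mean_item_location = 0.0       # mean item difficulty (fixed at zero)

p_success = rasch_probability(mean_person_location, mean_item_location)
print(round(p_success, 2))
```

A person located exactly at an item's difficulty would have a probability of 0.5, so a value of about 0.31 for the average person on the average item illustrates how far the items sit above the persons.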

Nevertheless, the PSI of 0.723 indicates that the Quantitative items separated persons relatively well and had a good power to disclose misfit. Increasing the number of somewhat easier items would increase this index to a higher value.

Figure 4.22. Person-item location distribution of the Quantitative subtest

4.3.4 Item Fit

The item fit residuals for the Quantitative subtest ranged from -2.333 to 5.022. However, the extreme fit residual value of 5.022 was attributable to only one item, namely item 74. When item 74 was excluded, the range decreased to -2.333 to 2.128.

Item 74 was the only misfitting item in the Quantitative subtest (fit statistics for all items are presented in Appendix B1). No other item showed a fit residual below -2.5 or above +2.5. Also, no other item had a χ² probability lower than 0.000345 (the Bonferroni-adjusted probability, with N = 29 for an individual item level of p = 0.01).
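The Bonferroni-adjusted probability quoted above follows from dividing the individual item level by the number of items. The sketch below shows this arithmetic together with the fit residual bounds of -2.5 and +2.5 used in the text; the variable and function names are illustrative.

```python
item_level_alpha = 0.01   # individual item level of significance from the text
number_of_items = 29      # Quantitative items analysed

# Bonferroni adjustment: divide the item-level alpha by the number of tests.
bonferroni_alpha = item_level_alpha / number_of_items


def residual_out_of_range(fit_residual, bound=2.5):
    """True when a fit residual falls below -bound or above +bound."""
    return abs(fit_residual) > bound


print(round(bonferroni_alpha, 6))
print(residual_out_of_range(5.022))   # item 74
print(residual_out_of_range(2.128))   # largest residual once item 74 is excluded
print(residual_out_of_range(-2.333))  # smallest residual
```

Only the residual of item 74 falls outside the bounds, which matches the statement that it was the only misfitting item on this criterion.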

The misfit of item 74 was very evident, as the fit residual value was extremely large compared to the rest of the Quantitative items, while the χ² probability was lower than the Bonferroni-adjusted probability. The ICC of item 74 (Figure 4.23) also confirmed its extremely low discrimination.

Figure 4.23. The ICC of item 74

As shown in Figure 4.23, persons in the lower class intervals performed much better than expected according to the model, while persons in the higher class intervals performed worse than expected. The observed mean in most class intervals was almost the same. This suggests not only very low discrimination, but also that persons may have guessed on this item. Further examination of this item is reported in the guessing section.

4.3.5 Local Independence

As indicated earlier, the analysis of local dependence was carried out with two purposes. The first was to examine the possibility of local dependence between items in the set. The second was to examine the possibility of local dependence due to an a priori dependent structure.

(a) Examining Local Dependence of All Items

In the previous section, it was reported that there was no Quantitative item with high discrimination. One of the possible reasons for highly discriminating items is local dependence. An examination of the residual correlations between items, another possible indication of local dependence, suggests a similar result. The residual correlations between items were lower than 0.20 and thus did not indicate local dependence. Therefore, no testlet analysis was performed.

(b) Examining Dependence from an a Priori Structure

Because the Quantitative subtest, like the Verbal subtest, contains subsections of items, a testlet analysis was done to examine the effect of an a priori dependent structure within the sections. Table 4.13 presents the testlets formed and their spread values (θ).

The table shows that in Algebra and Geometry, the spread values were smaller than the minimum values. In Number Sequence, the values were the same and in Arithmetic, the spread value was greater than the minimum value. However, in terms of the magnitude, except in Algebra where the difference was 0.11, the difference between the spread value and the minimum value was very small in the other three testlets.

Table 4.13. Spread Value and the Minimum Value in the Quantitative Subtest

Testlet       Range of Item   No. of   Minimum   Spread      Dependence
              Locations       Items    Value     Value (θ)   Confirmed
Number Seq.   3.56            10       0.22      0.22        No
Arithmetic    2.22            6        0.29      0.31        No
Algebra       1.15            3        0.55      0.44        Yes
Geometry      1.78            10       0.22      0.19        Yes

Note. The minimum values provided in Andrich (1985b) cover only up to 8 items. The value for a larger number of items, in this case 10, was calculated following Andrich (1985b).
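The decision in the last column of Table 4.13 can be expressed as a simple comparison. The sketch below assumes, reading the rule off the table, that dependence is taken as confirmed when the spread value falls below the minimum value; the dictionary layout and names are illustrative.

```python
# (minimum value, spread value) per testlet, copied from Table 4.13.
testlets = {
    "Number Seq.": (0.22, 0.22),
    "Arithmetic": (0.29, 0.31),
    "Algebra": (0.55, 0.44),
    "Geometry": (0.22, 0.19),
}

# Assumed rule: dependence is confirmed when spread < minimum.
dependence_confirmed = {
    name: spread < minimum for name, (minimum, spread) in testlets.items()
}

# Shortfall of the spread value relative to the minimum (positive when confirmed).
shortfall = {
    name: round(minimum - spread, 2) for name, (minimum, spread) in testlets.items()
}

for name in testlets:
    print(name, dependence_confirmed[name], shortfall[name])
```

Under this rule Algebra and Geometry are flagged and the other two testlets are not, and the shortfall of 0.11 in Algebra is the largest in magnitude.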

The range of item difficulties in the first two testlets (Number Sequence and Arithmetic), where dependence was not confirmed, was greater than in the testlets where dependence was confirmed (Algebra and Geometry). This pattern was also shown earlier in the Verbal subtest. Thus, dependence may have been present but the substantial difference in the range of item difficulties within a testlet increased the spread value (θ).

To obtain additional evidence that dependence occurs not only in two testlets (Algebra and Geometry) but also in the Number Sequence and Arithmetic testlets, the reliability indices from three analyses were compared. In the first analysis, all 29 items were treated as dichotomous items (original analysis). In the second, all items were analysed in testlets, forming the four testlets shown in Table 4.13. In the third, the 10 Number Sequence items and 6 Arithmetic items were treated as dichotomous and the remaining 13 items were formed into two testlets, Algebra (3 items) and Geometry (10 items). If dependence also occurs in Number Sequence and Arithmetic, the decrease in the reliability index is expected to be greater in the second analysis than in the third. The results are presented in Table 4.14.

Table 4.14. PSIs in Three Analyses to Confirm Dependence in Four Quantitative Testlets

Analysis   Analysed Items                                               PSI
1          29 dichotomous items                                         0.723
2          29 items forming 4 testlets                                  0.653
3          16 items of Number Sequence and Arithmetic as dichotomous    0.707
           items and 13 items forming Algebra and Geometry testlets

The decrease was greater in the second analysis (0.070) than in the third analysis (0.016). This shows that dependence not only occurs in the Algebra and Geometry testlets, but also in the Number Sequence and Arithmetic testlets, although it was small and not observed in the spread values.
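The two decreases quoted above are differences from the PSI of the original dichotomous analysis. A minimal sketch of the arithmetic, using the values in Table 4.14 and illustrative variable names:

```python
psi_dichotomous = 0.723    # analysis 1: 29 dichotomous items
psi_four_testlets = 0.653  # analysis 2: all items in four testlets
psi_two_testlets = 0.707   # analysis 3: only Algebra and Geometry as testlets

decrease_four = round(psi_dichotomous - psi_four_testlets, 3)
decrease_two = round(psi_dichotomous - psi_two_testlets, 3)

# The part of the decrease that appears only when Number Sequence and
# Arithmetic are also treated as testlets.
additional_decrease = round(decrease_four - decrease_two, 3)

print(decrease_four, decrease_two, additional_decrease)
```

The additional decrease is the quantity on which the argument rests: it is what remains after the drop attributable to the Algebra and Geometry testlets has been set aside.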

With regard to other indicators of dependence, as expected when dependence occurs, the variance of person estimates was greater in the dichotomous analysis (0.865) than in the polytomous analysis (0.725).

In terms of fit, the total chi-square probability increased from p = 0.00 to p = 0.559. This indicates an improved fit after taking dependence into account under the polytomous model.

All the results show that there was dependence between items in the Quantitative subtest due to the a priori dependence structure. However, this dependence was small in magnitude. Therefore, it is expected that when items are analysed without taking dependence into consideration, they still provide reasonably good measurement.

4.3.6 Evidence of Guessing

The results of examining evidence of guessing are presented under the same section headings as in the Verbal subtest. Because the analyses are parallel, only a summary of the results and interpretations is presented here.

(a) Examining Graphical Evidence

The class interval means relative to the ICCs in the Quantitative subtest show that persons may have guessed on four items (53, 59, 71, 74). Figure 4.24 presents the ICCs of these items. It shows that in all four ICCs the observed proportion for the lower class intervals was higher than expected, while for the higher class intervals it was lower than expected.

The difficulties of the four items are all above the mean of the person locations (-0.801), indicating that they were relatively difficult for the examinees. However, in the case of item 74, its difficulty of -0.477 is just above the mean of the persons. This item was indicated earlier as the only item in the Quantitative subtest that had a large fit residual and a significantly low discrimination relative to its ICC.

Figure 4.24. ICCs of four Quantitative items indicating guessing graphically

(b) Examining Statistical Evidence

Statistical evidence of guessing was examined by comparing the item locations from the tailored and anchored analyses. For the anchored analysis, five items (51, 67, 69, 72, and 76), which had the lowest levels of difficulty, fitted the model, and did not exhibit guessing on the ICC in the original analysis, had the mean of their estimates anchored to the mean from the tailored analysis.

The graph of the item estimates from the tailored and anchored analyses is shown in Figure 4.25. It shows that a difference between locations in the two analyses is observed in a number of items, not only in the most difficult items.

Figure 4.25. The plot of tailored and anchored locations for the Quantitative subtest

The statistical significance of the differences between the tailored and anchored item estimates is presented in Table 4.15. A significant difference in locations is observed in eight items (52, 53, 54, 59, 62, 71, 74, and 79), with the absolute magnitude ranging from 0.10 to 0.38. Except for item 52, which is easier in the tailored analysis, all of these items are more difficult in the tailored analysis. Four of these items (53, 59, 71, and 74) were identified above from their ICCs as the items which potentially showed guessing. Thus, there are four items (52, 54, 62, and 79) which did not indicate guessing from the ICC but displayed significant differences in location between the tailored and anchored analyses.

Table 4.15. Statistics of Some Quantitative Items after Tailoring Procedure

Item   Loc        Loc        Loc        SE         SE         d            SE(d)^a   stdz d^b   >2.58   Tailored
       original   tailored   anchored   tailored   anchored   (tail-anc)                                sample
51^c   -2.008     -2.121     -2.086     0.122      0.115      -0.035       0.041     -0.859             425
52     -1.602     -1.783     -1.680     0.114      0.107      -0.103       0.039     -2.619     *       425
...    ...        ...        ...        ...        ...        ...          ...       ...        ...     ...
64     -0.338     -0.382     -0.416     0.108      0.105       0.034       0.025      1.345             395
74     -0.477     -0.271     -0.555     0.109      0.103       0.284       0.036      7.963     *       395
54     -0.298     -0.260     -0.376     0.109      0.105       0.116       0.029      3.965     *       395
...    ...        ...        ...        ...        ...        ...          ...       ...        ...     ...
73      0.189      0.232      0.111     0.130      0.112       0.121       0.066      1.833             276
71      0.145      0.274      0.067     0.125      0.111       0.207       0.057      3.601     *       316
62      0.094      0.286      0.016     0.125      0.110       0.270       0.059      4.548     *       316
...    ...        ...        ...        ...        ...        ...          ...       ...        ...     ...
70      0.435      0.383      0.357     0.140      0.117       0.026       0.077      0.338             240
79      0.348      0.451      0.270     0.134      0.115       0.181       0.069      2.631     *       276
53      0.336      0.478      0.258     0.135      0.115       0.220       0.071      3.111     *       276
...    ...        ...        ...        ...        ...        ...          ...       ...        ...     ...
75      0.713      0.709      0.635     0.160      0.125       0.074       0.100      0.741             195
59      0.468      0.767      0.390     0.149      0.118       0.377       0.091      4.144     *       240
58      0.880      0.808      0.802     0.176      0.131       0.006       0.118      0.051             157
77      0.884      0.919      0.806     0.198      0.131       0.113       0.148      0.761             121
80      1.140      1.258      1.062     0.218      0.141       0.196       0.166      1.179             111
Mean    0.000      0.000     -0.078     0.133      0.114       0.078       0.065      1.224
SD      0.755      0.802      0.755     0.027      0.009       0.116       0.035      2.176

Note. ^a The standard error of the difference is SE(d) = √(SE_tail² − SE_anc²); ^b the standardized difference (z) is d / SE(d); ^c anchor items. The items in bold (53, 59, 71, and 74) are those that showed a significant difference between tailored and anchored estimates and showed evidence of guessing from the ICC. The items in italics (52, 54, 62, and 79) are those that showed a significant difference between tailored and anchored estimates but did not indicate guessing from their ICCs.
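The standardized differences in Table 4.15 can be recomputed from the tabled locations and standard errors. The sketch below assumes that the standard error of the difference is the square root of the difference between the squared standard errors, a reading of the table note that reproduces the tabled values for the items tried; the function name is illustrative.

```python
import math


def standardized_difference(loc_tailored, loc_anchored, se_tailored, se_anchored):
    """Return (d, SE(d), z) for one item.

    Assumes SE(d) = sqrt(SE_tailored^2 - SE_anchored^2), which requires the
    tailored standard error to be the larger of the two.
    """
    d = loc_tailored - loc_anchored
    se_d = math.sqrt(se_tailored ** 2 - se_anchored ** 2)
    return d, se_d, d / se_d


# Values copied from Table 4.15.
d_74, se_74, z_74 = standardized_difference(-0.271, -0.555, 0.109, 0.103)
d_51, se_51, z_51 = standardized_difference(-2.121, -2.086, 0.122, 0.115)

critical_value = 2.58  # threshold used in the table
print(round(d_74, 3), round(se_74, 3), round(z_74, 3), abs(z_74) > critical_value)
print(round(d_51, 3), round(se_51, 3), round(z_51, 3), abs(z_51) > critical_value)
```

Item 74 exceeds the critical value by a wide margin, whereas the anchor item 51 does not. Small discrepancies in the last decimal place are possible because the tabled inputs are already rounded.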

These four items (52, 54, 62, and 79) are now considered more closely. The differences between the anchored and tailored estimates in these items were relatively large. This is different from the Verbal subtest, where the items which did not indicate guessing based on the ICC but showed a significant difference in locations had a small magnitude of location difference. As indicated earlier, in the Verbal subtest the difference was significant because of the small difference in the standard errors.

To study these items, Figure 4.26 shows the observed means in the class intervals relative to their ICCs from the original analysis. Item 52 is the second easiest item in the subtest, and it is easier in the tailored analysis than in the anchored analysis. Item 52 was easier in the tailored analysis because, unlike the case with guessing, the item shows over discrimination relative to the ICC. In the tailored analysis, persons with low proficiency have their responses turned into missing data. In the full data, their responses tended to make the item more difficult than if they had followed the curve.

The other items showed different patterns. The ICCs of items 54, 62, and 79 show, to some extent, a guessing pattern, most clearly in item 62. The guessing pattern in this item is shown in five of the six class intervals: the observed means for the lower class intervals are greater than expected, and those for two of the three higher class intervals are lower than expected. In item 54, the guessing pattern is shown in the two highest class intervals, in which the observed mean is lower than expected. In item 79, the guessing pattern appears in the first and fifth class intervals. Because the pattern of guessing is not observed in all class intervals, these items were not initially identified as items with guessing. Nevertheless, the effect of tailoring on these items is similar to its effect on the items with guessing; that is, the items become more difficult. Tailoring eliminates the responses of persons in the lower class intervals, that is, students with lower proficiency, who answered correctly at a greater rate than predicted by the ICC, and retains relatively more responses of examinees in the higher class intervals, who responded correctly at a smaller rate than predicted by the ICC. Therefore, the items appear more difficult in the tailored analysis.

Figure 4.26. The ICCs from the original analysis for four Quantitative items which indicate a significant location difference between tailored and anchored analyses but did not indicate guessing from the ICC

(c) Confirming Guessing

An anchored all analysis, in which all items are anchored to their tailored estimates and the whole data set is reanalysed for fit, was performed to confirm guessing. The ICCs from the anchored all analysis were compared with those from the original analysis. Guessing is confirmed when the item appears more difficult, with the ICC to the right of that from the original analysis, the observed proportion correct in the lower class intervals is further above the ICC, and that in the higher class intervals is closer to the ICC. The ICCs from the original and anchored all analyses of the four items (53, 59, 71, and 74) which indicated guessing from the graphical and statistical criteria were compared, as shown in Figure 4.27.

As shown in Figure 4.27, all four items are more difficult in the anchored all analysis. In terms of fit, the observed proportions correct in the lower class intervals were further above the ICC, while those for the higher class intervals were closer to the ICC. However, only in item 53 was the observed mean very close to the expected value in the anchored all analysis. In the other three items, the proportion correct for the higher groups was still below the ICC. The evidence suggests that items 59, 71, and 74 under discriminate relative to the ICC and that persons may also have guessed on these items.

Figure 4.27. ICCs of four Quantitative items from the original analysis (left) and anchored all analysis (right) to confirm guessing

(d) Content of Items Relative to Statistical Evidence

Figure 4.28 presents the content of items 53, 59, 71, and 74, which indicate guessing from the ICC and statistical criteria. Items 53 and 59 are Number Sequence items. Both were moderately difficult in the Quantitative subtest. In item 53, the pattern involves the factors of a prime number. Examinees who did not understand prime numbers would not see the pattern. This perhaps explains the incidence of guessing and under discrimination in this item.

For item 59, there are two patterns involved. The first is factors of 3, and the other is to multiply the number respectively by 2, 3, and 4. That there are two different patterns involved perhaps contributes to the difficulty of this item and the tendency of some examinees to guess and for the item to under discriminate relative to the ICC.

Item 71, which is of moderate difficulty, is a Geometry item. The item presents the square figure with a shaded area inside. Examinees were asked to calculate the shaded area. This item is straightforward and can be solved if the formulae for the area of a square and circle are known. It is not clear why this item indicated relative under discrimination and possible guessing. Compared to other pictorial geometric items, the stimulus and the problem in item 71 are less complicated. However, there are some parts of the figure which are not proportional. This may lead to confusion and hence lead examinees to guess and for the item to under discriminate relative to the ICC.

___________________________________________________________________

Instruction: Find a correct number to complete the sequence.

53. 8 12 21 46 95 216 ....

a. 465 b. 431 c. 385 * d. 375 e. 337

59. 3 4 6 8 15 24 42 96 ….

a. 106 b. 123 * c. 133 d. 152 e. 169

Instruction: Find a correct answer for these items

71. What is the area of the shaded part?

a. 399 cm² * b. 476 cm² c. 495 cm² d. 553 cm² e. 601 cm²

74. What is the volume of the figure?

a. 0,128 m³ b. 0,136 m³ c. 0,160 m³ * d. 0,180 m³ e. 0,192 m³

______________________________________________________________________

Figure 4.28. The content of four Quantitative items indicating guessing

(Dimension labels recovered from the figures in items 71 and 74: 7 cm, 14 cm, 28 cm; 20 cm, 40 cm, 20 cm.)

Item 74 is a Geometry item with the stimulus of a solid figure. Examinees were asked to calculate the volume of the figure. The problem is relatively simple and straightforward. However, the item can be difficult if the examinees do not recall the formula for the volume of a cuboid. Thus this item seems to require little quantitative reasoning, because examinees who recall and apply the formula directly may get the correct answer. Another possibility is that visualization plays an important role in this item. Examinees with limited visualization ability, including those from higher class intervals, may not be able to answer this item correctly.

4.3.7 Distractor Analysis

In detecting item distractors with information, the same criteria and procedure described for the Verbal subtest were applied to the Quantitative subtest. Out of 29 items, 22 showed potential to be rescored as polytomous items. These 22 items were rescored polytomously and reanalysed with the rest of the items, which remained dichotomous. The results of rescoring these items are presented in Table 4.16. It shows that five items (55, 58, 62, 77, and 80) had thresholds in order, but only three of them (55, 58, and 80) showed a θ value significantly different from 0. Among those three, only items 55 and 58 indicated an improvement in fit after rescoring. In contrast, the fit of item 80, as indicated by its χ² probability, was reduced after rescoring, although it still fitted the model. Although item 80 did not show an improvement in fit, its θ̂_z was relatively large (3.721); therefore, item 80 was also rescored along with items 55 and 58. The results of rescoring these three items are presented in Table 4.17.
