Internal Consistency Analysis of the Reasoning Subtest


Chapter 4 Internal Consistency Analysis of the Postgraduate Data

4.4 Internal Consistency Analysis of the Reasoning Subtest

The Reasoning subtest comprised three sections: Logic (items 81-88), Diagram (items 89-96) and Analytic (items 97-112), a total of 32 items. One Logic item (item 84) was excluded from the analysis because it contained a typing error, leaving 31 items. The following sections present the results of the internal consistency analysis for the Reasoning subtest.

4.4.1 Treatment of Missing Responses

As with the Verbal and Quantitative subtests, the three treatments of missing responses did not produce significant differences in the reliability index or in the mean of the item fit residuals. The largest difference between PSIs was 0.007 and the largest difference between mean item fit residuals was approximately 0.02 (Table 4.19).

Table 4.19. The Effect of Different Treatments of Missing Responses in the Reasoning Subtest

| Statistics | A (Incorrect) | B (Missing) | C (Mixed) |
|---|---|---|---|
| PSI | 0.735 | 0.728 | 0.732 |
| Mean item fit residual | 0.189 | 0.213 | 0.196 |

Because the three treatments of missing responses had no significant effect in the Reasoning subtest, missing responses in the Reasoning data were scored as incorrect, as was done for the Verbal and Quantitative subtests.

4.4.2 Item Difficulty Order

The positions of the items in each section of the test booklet, together with their difficulties according to the item bank and according to the postgraduate analysis, are presented in Figure 4.35. The first section is Logic, followed by Diagram and Analytic. Within each section, items were arranged according to their item bank difficulty (top panel). Across the three sections, the correlation between item order and item bank difficulty ranges from 0.90 to 1.00.

The difficulty estimates from the examinees' responses (bottom panel) are not as consistent with the order in the test booklet. However, in general there is a trend for the earlier items to be easier than the later items. The correlations between item order and item estimates in Logic, Diagram and Analytic were 0.96, 0.76 and 0.88 respectively.
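The kind of correlation reported above can be illustrated with a minimal sketch. It assumes a Pearson correlation (the text does not state which coefficient was used), and the positions and location estimates below are hypothetical values invented for illustration, not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL data: position of each item within one section of the booklet
# and its estimated location in logits. Not taken from the thesis.
booklet_position = [1, 2, 3, 4, 5, 6, 7, 8]
estimated_location = [-1.9, -1.2, -1.4, -0.3, 0.1, 0.6, 0.4, 1.5]

r = pearson(booklet_position, estimated_location)
print(round(r, 2))
```

A value close to 1 means that items placed later in a section tend to have higher estimated difficulty, which is the pattern described for the three sections.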

It is clear that the ordering of the Reasoning items was highly consistent with the item bank difficulties and fairly consistent with the difficulty estimates from the examinees' responses. This consistency is substantially higher than in the Verbal and Quantitative subtests, which suggests that the item bank difficulties are more stable in the Reasoning subtest than in the Verbal and Quantitative subtests.

Figure 4.35. Reasoning item order according to item location from the item bank (top panel) and from postgraduate analysis (bottom panel)

4.4.3 Targeting and Reliability

Figure 4.36 shows that the Reasoning items were relatively well distributed along the continuum, although some regions, especially between 0.0 and 0.4, contained no items. The item locations ranged from -2.6 to 2.4 with a mean of 0.0 (fixed) and a standard deviation of 1.338. Person locations ranged from -2.788 to 2.677 with a mean of -0.203 and a standard deviation of 0.860.

The probability of success of a person located at -0.203 logits (the mean person ability) on an item with a difficulty of 0.0 (the mean item difficulty) is 0.45. In general, the Reasoning subtest was of moderate to hard difficulty for this group, with items that were neither too easy nor too difficult. The items were reasonably well targeted to the applicant group.
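The probability quoted above follows from the dichotomous Rasch model, in which the probability of a correct response depends only on the difference between the person and item locations. The function name below is illustrative, not taken from any software used in the study.

```python
from math import exp


def rasch_probability(person_location, item_location):
    """Probability of a correct response under the dichotomous Rasch model.

    Both locations are in logits: P = exp(b - d) / (1 + exp(b - d)).
    """
    return 1.0 / (1.0 + exp(-(person_location - item_location)))


# Mean person location (-0.203) against the mean item location (0.0).
p = rasch_probability(-0.203, 0.0)
print(round(p, 2))  # 0.45
```

A probability slightly below 0.5 reflects a person mean slightly below the item mean, which is why the subtest is described as leaning toward the harder side for this group.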

With a PSI of 0.735, the items separated persons well and had reasonable power to detect misfit.

Figure 4.36. Person-item location distribution of the Reasoning subtest

4.4.4 Item Fit

Except for item 96, which had a fit residual of 2.753 and a χ² probability of 0.0071 (greater than the Bonferroni-adjusted probability of 0.000323 at the individual item level, N = 31 with p = 0.01), no item had a fit residual greater than 2.5 or lower than -2.5.

Also, no other item had a χ² probability < 0.01. Excluding item 96, the fit residuals ranged from -1.914 to 2.197 with p > 0.05. Thus, the Reasoning items in general fitted the model better than the Verbal and Quantitative items. Fit statistics for all Reasoning items are presented in Appendix C1.
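The two fit criteria used in this section can be sketched as follows. The Bonferroni-adjusted probability is the nominal level divided by the number of items, and the fit residual is compared with ±2.5. The function and variable names are illustrative assumptions; only the numbers for item 96 come from the text.

```python
n_items = 31
nominal_alpha = 0.01

# Bonferroni adjustment at the individual item level.
adjusted_alpha = nominal_alpha / n_items
print(round(adjusted_alpha, 6))  # 0.000323


def fit_flags(fit_residual, chi_sq_probability, alpha, residual_limit=2.5):
    """Return (residual_flag, chi_square_flag) for one item.

    residual_flag: fit residual outside +/- residual_limit.
    chi_square_flag: chi-square probability below the adjusted level.
    """
    residual_flag = abs(fit_residual) > residual_limit
    chi_square_flag = chi_sq_probability < alpha
    return residual_flag, chi_square_flag


# Item 96: fit residual 2.753, chi-square probability 0.0071.
item_96 = fit_flags(2.753, 0.0071, adjusted_alpha)
print(item_96)  # (True, False)
```

Under these criteria item 96 is flagged by its fit residual but not by its χ² probability, because 0.0071 remains above the adjusted level.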

4.4.5 Local Independence

As with the Verbal and Quantitative subtests, the results relating to local dependence are reported in two separate sections.

(a) Examining Local Dependence of All Items

There was no indication of local dependence among the Reasoning items. All the residual correlations between the items were lower than 0.20. Also, as indicated earlier, there were no items with high discrimination.

(b) Examining Local Dependence from an a Priori Structure

As with the Verbal and Quantitative subtests, the Reasoning subtest had separate sections that could lead to local dependence within sections of items. It consisted of three sections in which all items were originally analysed together, as if they were all in one section. A testlet analysis was performed to examine whether the structure resulted in local dependence between items. Three testlets were formed, one for each section, and their spread values are presented in Table 4.20.

Table 4.20. Spread Value and the Minimum Value Indicating Dependence in the Reasoning Subtest

| Testlet | Range of Item Locations | No of Items | Minimum Value | Spread Value (θ) | Dependence Confirmed |
|---|---|---|---|---|---|
| Logic | 3.54 | 7 | 0.25 | 0.30 | No |
| Diagram | 3.18 | 8 | 0.22 | 0.21 | Yes |
| Analytic | 4.32 | 16 | 0.12 | 0.16 | No |

Note. The minimum value provided in Andrich (1985b) is only up to 8 items. The value for number of items greater than 8, in this case 16, was calculated following Andrich (1985b).

Only the Diagram testlet had a spread value smaller than the minimum value, although the difference was very small (0.01). The range of item difficulties in the Diagram section was relatively large (3.18), but it was smaller than in the other two testlets. This indicates that the relatively greater range of item locations in the Logic and Analytic testlets perhaps contributes to their not showing evidence of dependence.
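The decision rule applied in Table 4.20 can be sketched as a simple comparison: dependence is taken as confirmed when a testlet's spread value falls below the minimum value for its number of items. The values are transcribed from Table 4.20; the minimum values themselves are taken as given, since their calculation (Andrich, 1985b) is not reproduced here, and the names below are illustrative.

```python
# (testlet, number of items, minimum value, spread value), from Table 4.20.
testlets = [
    ("Logic", 7, 0.25, 0.30),
    ("Diagram", 8, 0.22, 0.21),
    ("Analytic", 16, 0.12, 0.16),
]


def dependence_confirmed(spread, minimum):
    """Dependence is indicated when the spread value is below the minimum."""
    return spread < minimum


flagged = [
    name
    for name, _n_items, minimum, spread in testlets
    if dependence_confirmed(spread, minimum)
]
print(flagged)  # ['Diagram']
```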

Again, as with the Verbal and Quantitative subtests, to obtain supporting evidence that dependence not only occurs in Diagram but also in the other two sections (Logic and Analytic) the reliability indices from three analyses were compared. The results are presented in Table 4.21.

Table 4.21. PSIs in Three Analyses to Confirm Dependence in Three Reasoning Testlets

| Analysis | Analysed Items | PSI |
|---|---|---|
| 1 | 31 dichotomous items | 0.735 |
| 2 | 31 items forming 3 testlets | 0.657 |
| 3 | 23 items of Logic and Analytic as dichotomous items and 8 items forming the Diagram testlet | 0.709 |

As expected, the decrease in the reliability index was greater (0.078) when all items were analysed within their three testlets than when only the Diagram items were formed into a testlet (0.026). This shows that dependence occurs not only in Diagram, but also in the Logic and Analytic sections.
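The two decreases quoted above can be reproduced directly from the PSIs in Table 4.21. This is only an arithmetic check; the dictionary keys are illustrative labels for the three analyses.

```python
# PSIs from Table 4.21.
psi = {
    "dichotomous": 0.735,           # Analysis 1
    "three_testlets": 0.657,        # Analysis 2
    "diagram_testlet_only": 0.709,  # Analysis 3
}

# Decrease in PSI relative to the fully dichotomous analysis.
drop_three_testlets = round(psi["dichotomous"] - psi["three_testlets"], 3)
drop_diagram_only = round(psi["dichotomous"] - psi["diagram_testlet_only"], 3)

print(drop_three_testlets, drop_diagram_only)  # 0.078 0.026
```

The additional drop when Logic and Analytic are also treated as testlets is the evidence that dependence is not confined to the Diagram section.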

Two other indicators also suggest that there is dependence due to the structure of the Reasoning subtest. The variance of person estimates decreased from 0.860 when all the items were analysed dichotomously to 0.669 when the items were analysed as three testlets. The total chi-square probability increased from p = 0.00 in the dichotomous analysis to p = 0.666 in the analysis of the three testlets, indicating that the testlet analysis accounted for local dependence.

Nevertheless, based on the reduction of the PSI, as in the Verbal and Quantitative subtests, it was concluded that the magnitude of dependence was relatively small.

Therefore, the effect on the precision of measurement of not taking dependence into account, by analysing the items as dichotomous, should be small.

4.4.6 Evidence of Guessing

The procedure for and the results of examining the evidence of guessing in the Reasoning subtest are reported under the same headings as for the Verbal and Quantitative subtests.

(a) Examining Graphical Evidence

A pattern consistent with guessing appears in four Reasoning items (96, 108, 109 and 112). As shown in Figure 4.37, in all four ICCs the observed mean of the lower class intervals is higher than expected and that of the higher class intervals is lower than expected. These are four of the ten most difficult items, with three of them the last items in the section.

Figure 4.37. ICCs of four Reasoning items indicating guessing graphically

Source document: Evaluation of the Indonesian Scholastic Aptitude Test According to the Rasch Model and Its Paradigm (pages 193-200)