
Available online: http://jurnal.ustjogja.ac.id/index.php/ELP

Discrimination index, difficulty index, and distractor efficiency in MCQs English for academic purposes midterm test

Sri Rejeki 1 *, Angela Bayu Pertama Sari 1, Dwi Iswahyuni 1, Devita Widyaningtyas Yogyanti 1, Sutanto Sutanto 5, Helta Anggia 2

1 Universitas Bina Sarana Informatika. Jl. Kramat Raya No. 98, Senen, Jakarta Pusat, Indonesia

2 University of Szeged, H-6720 Szeged, Dugonics square 13, Hungary

* Corresponding Author. E-mail: [email protected]

Received: 20 April 2023; Revised: 19 June 2023; Accepted: 30 June 2023

Abstract: Test item analysis in language learning plays a crucial role in ensuring that a test works as an effective assessment tool in the learning process. This quantitative study aimed to analyze the multiple-choice questions (MCQs) of the reading section in an English for Academic Purposes (EAP) subject for non-English majors in order to determine the Discrimination Index (DI), Difficulty Index (DIF), and Distractor Efficiency (DE). The participants were first-year students in the civil engineering program at Universitas Muhammadiyah Surakarta. The researchers collected the primary data from the answer sheets of the reading section. The data were prepared in Microsoft Excel and uploaded to Anates, which quickly produced all the calculations of DI, DIF, and DE. The findings revealed that the DI and DIF of this test were not good: most items had poor DI and were too easy in terms of DIF, which means that most test items and their distractors need modification. Most distractors used in this test were not functional and must be removed, changed, or modified.

Keywords: Anates, English for Academic Purposes, Item Analysis

How to Cite: Rejeki, S., Sari, A. B. P., Sutanto, S., Iswahyuni, D., Yogyanti, D. W., & Anggia, H. (2023). Discrimination index, difficulty index, and distractor efficiency in MCQs English for academic purposes midterm test. Journal of English Language and Pedagogy, 6(1), 1-11. https://doi.org/10.36597/jelp.v6i1.14738

Introduction

Many studies have pointed out the crucial role of proper assessment in the success of the EFL teaching and learning process (Maharani & Putro, 2020). Assessment can reveal the students' progress in mastering the learning material taught by the teachers (Browder et al., 2006). Moreover, assessment helps the teacher decide on the proper approach and method of teaching (Scouller, 1998). One of the methods used to assess students' proficiency is testing. However, in some cases a test in EFL learning does not work effectively as an assessment instrument. Thus, item analysis needs to be explored further.

Item analysis is a method that investigates test quality with the purpose of refining test items so that they are well constructed (Rosana & Setyawarno, 2017). Several studies mention that the main goal of test item analysis is to build a better test by revising or omitting poor test items (Boopathiraj & Chellamani, 2013; Mukherjee & Lahiri, 2015).

Analysis of a test item means a set of processes to collect, summarize, and use information from the results of the students' test to check the quality of the test items, including multiple-choice questions (MCQs). In other words, item analysis is a procedure performed after a test is constructed and administered, which provides feedback information on the reliability and validity of the test items (Considine, Botti, & Thomas, 2005; Khan, Ishrat & Khan, 2015, as cited in Matazu & Julius, 2021). Furthermore, performing an item analysis is a prominent aspect of maintaining the quality of MCQs (Sharma, 2021). Analysing test items is based not on the teacher's or examiner's perspective but on the students as the examinees. The result of this item analysis is to identify the easiest or most challenging test items, which may need to be revised or dropped.

Generally, teachers have guidelines for constructing test items (Haladyna et al., 2002), but in fact there are still some issues (Ashtiani & Babaii, 2007; Carroll & Coody, 2006; Marso & Pigge, 1991, as cited in Mulyani et al., 2020), namely: (1) teachers believe teacher-designed tests have a positive impact on teaching and learning, (2) most of the tests developed by teachers contain many errors, and (3) teachers typically do not use test improvement strategies such as item analysis or test blueprints. Therefore, item analysis needs to be conducted to improve test quality. This is in line with previous research by Satyendra (2021), which mentioned that knowing the quality of a test requires its reliability, difficulty, validity, and discriminating values. From another perspective, if those aspects are good, the quality of the test is also good.

There has been some research on item analysis and related topics. The first was conducted by Burud, Nagandla, and Agarwal (2019) on distractors. They employed summative tests taken by 113 students, including 120 one-best-answer items (OBAs) and 360 distractors. The study suggested that an increased sample size could improve the evaluation of the items and the performance of the distractors. Unfortunately, it did not analyse the effect of the Bloom's taxonomy level of the questions on distractor efficiency and discrimination index.

Another study looked at test quality through item analysis. Hartati and Yogi (2019) conducted a small-scale study to determine the quality of teacher-made summative tests. It found that 43 distractors needed revision to improve discriminating power and adjust the difficulty level. It also suggested that teachers need to examine the tests they make to ensure the quality of the test items.

In 2021, Elgadal and Mariod studied item analysis as an assessment tool for quality assurance measures. They collected articles on item analysis published since 2010, using Google Scholar and PubMed as databases, employed Classical Test Theory (CTT) and Item Response Theory (IRT), and concluded that item analysis helps test makers correct technical flaws when constructing question banks.

Similar research on item analysis was also carried out by Sharma (2021), who analysed 20 MCQs on English consonant and vowel sounds given to first-year English majors in Nepal. The study found that most MCQs were reliable and valid. It also suggested conducting item analyses to identify areas of potential weakness in the construction of MCQ items in order to improve the standard of assessing students' understanding of the subject matter.

Ismail and Zubairi (2022) conducted an item analysis of reading tests in the Sri Lankan context by employing classical test theory. Fifty students took 25 test items based on the CEFR curriculum, specifically at the B2 level. Using KR-20, the constructed test met the standards for content validity, although there were five malfunctioning distractors.

The work of these previous researchers encouraged the present researchers to conduct a similar study: an item analysis of the MCQs in the reading section of an English for Academic Purposes subject for non-English majors, to find out the Discrimination Index (DI), Difficulty Index (DIF), and Distractor Efficiency (DE). This study used MCQs because it is one of the most popular test formats (Ganji & Esfandiari, 2020), a standard method (Kiss & Selei, 2017), and widely used in educational assessment (Gierl et al., 2017), including in Indonesia, to test students in midterm and final tests.

Literature Review

Reliability

Reliability is paramount in content analysis, and content-analytic results are useless without establishing it (Neuendorf, 2002, as cited in O'Connor & Joffe, 2020). It is usually expressed by a single value that can range between zero and 1 (Cunningham, 1998, p. 33). The categories for reliability can be seen in Table 1.

Table 1. Reliability Interpretation by OEA

Reliability      Interpretation
0.90 and above   Excellent reliability, best standardized test
0.80-0.90        Very good for a classroom test
0.70-0.80        Good for a classroom test, few items could be improved
0.60-0.70        Somewhat low, needs supplementation to grade, improve some items
0.50-0.60        Needs revision for test, needs supplementation
0.50 or below    Questionable reliability, needs revision
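The paper reports a single reliability value (0.63) produced by Anates without stating which formula the software applies. As an illustrative sketch only (not the authors' Anates procedure), KR-20, a common reliability estimate for dichotomously scored items, could be computed from a students-by-items 0/1 score matrix as follows; the `kr20` function name and the random demo matrix are assumptions for illustration.

```python
# Illustrative sketch (not the authors' Anates procedure): KR-20 reliability
# for dichotomously scored MCQ data, given a students x items 0/1 matrix.
import numpy as np

def kr20(scores: np.ndarray) -> float:
    """Kuder-Richardson Formula 20 for a students-by-items 0/1 matrix."""
    k = scores.shape[1]                   # number of items
    p = scores.mean(axis=0)               # proportion answering each item correctly
    q = 1 - p                             # proportion answering each item incorrectly
    total_var = scores.sum(axis=1).var()  # variance of the students' total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Hypothetical data with the same shape as this study (40 students, 35 items)
rng = np.random.default_rng(0)
demo = (rng.random((40, 35)) < 0.8).astype(int)
print(round(kr20(demo), 2))
```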

Discrimination Index (DI)

DI refers to the ability of a question to distinguish among students according to how well they comprehend the material being assessed. Hingorjo and Jaleel (2012) state that a test item achieves an ideal discrimination index when high-achieving students answer it correctly more often than low-achieving students. The discrimination index is significant because a poor discrimination index undermines reliable interpretation of the students' real learning proficiency (Setiyana, 2016). In other words, DI is a metric to determine whether a question distinguishes students who understood the content well from those who did not. The range for DI is 0-1. There are four categories for DI (Mahjabeen et al., 2018): poor (≤0.2), acceptable (between 0.21 and 0.24), good (between 0.25 and 0.35), and excellent (≥0.36). The results of DI can be interpreted using Table 2.
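Consistent with the upper/lower group sizes reported later in Table 4 (11 students each out of 40, roughly the top and bottom 27%), DI for a single item can be sketched as the difference between the number of correct answers in the two groups divided by the group size. This is a minimal sketch under that assumption, not the exact Anates routine.

```python
# Sketch of the discrimination index (DI), assuming upper and lower groups of
# 11 students each (roughly the top and bottom 27% of the 40 test takers).
def discrimination_index(upper_correct: int, lower_correct: int, group_size: int = 11) -> float:
    return (upper_correct - lower_correct) / group_size

# Item 6 in Table 4: 9 correct answers in the upper group, 1 in the lower group
print(round(discrimination_index(9, 1) * 100, 2))  # 72.73 (%), an excellent item
```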

Table 2. Interpretation of Discrimination Index (DI)

No.  DI         Evaluations                            Recommendation
1    Negative   Worst/defective item                   Definitely discard
2    <0.20      Not discriminating item                Revise / discard
3    0.20-0.29  Moderately discriminating, fair item   Keep
4    0.30-0.39  Discriminating item, good item         Keep
5    ≥0.40      Very good item, very discriminating    Keep

Difficulty Index (DIF)

The difficulty index, sometimes called item difficulty, determines whether a question is too simple or too complicated (Brown, 2004). Haladyna (2004) adds that the difficulty index indicates the percentage of students who answer an item correctly. The optimal item difficulty level for a four-option multiple-choice item is 0.63 (uwosh.edu). There are four categorizations for DIF: too easy (>70%), average (between 30% and 70%), good (between 50% and 60%), and too difficult (<30%) (Mahjabeen et al., 2018; Fulcher & Davidson, 2007). In some situations, teachers construct unclear language structures and use unfamiliar vocabulary in test items, which affects the difficulty index and discrimination index (Pradanti et al., 2018).
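Since DIF is simply the percentage of all test takers who answer an item correctly, it can be sketched as below together with the cut-offs quoted above; the function names are illustrative assumptions, not part of Anates.

```python
# Sketch of the difficulty index (DIF): the percentage of all test takers who
# answered an item correctly, with the cut-offs used in this paper.
def difficulty_index(correct: int, n_students: int = 40) -> float:
    return 100 * correct / n_students

def classify_dif(dif: float) -> str:
    if dif > 70:
        return "too easy"
    if dif < 30:
        return "too difficult"
    return "average"  # the 50-60% band is additionally labelled "good"

# Item 1: 33 of the 40 students chose the key (cf. 82.50% in Table 7)
dif = difficulty_index(33)
print(dif, classify_dif(dif))  # 82.5 too easy
```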


Distractor Efficiency (DE)

Distractor efficiency refers to determining whether distractors tend to be chosen by less able students rather than by more able students. Distractors in MCQs have a significant impact on their efficacy. Teachers must track how many students choose each distractor and change the ones that receive little to no attention. A distractor is called a functional distractor (FD) if it is chosen by more than 5% of students, and a non-functional distractor (NFD) if it is chosen by fewer than 5% of students (Mahjabeen et al., 2018).
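Applying the 5% rule to this study's 40 test takers, a distractor is functional only when more than two students choose it; the short sketch below illustrates that classification (illustrative function name, not an Anates output).

```python
# Sketch of the functional/non-functional distractor (FD/NFD) split: a
# distractor counts as functional when more than 5% of test takers choose it.
def classify_distractor(times_chosen: int, n_students: int = 40) -> str:
    return "FD" if times_chosen / n_students > 0.05 else "NFD"

# Item 6 in Table 9: distractor B was chosen by 6 students, distractor C by 1
print(classify_distractor(6))  # FD  (6/40 = 15%)
print(classify_distractor(1))  # NFD (1/40 = 2.5%)
```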

Methods

The following sections briefly describe the research design, research site and participants, data collection, and data analysis.

Research design

The researchers collected the primary data from the answer sheets of the reading section in the midterm test of the EAP subject in order to obtain the students' answers and determine the number of items they answered correctly. The answers the students chose were used to find the Discrimination Index (DI), Difficulty Index (DIF), and Distractor Efficiency (DE). After the results were produced by the software, they were presented numerically and then described.

Research site and participants

The participants of this study were first-year students majoring in the civil engineering program at Universitas Muhammadiyah Surakarta, which was also the research site where the EAP course was delivered. Forty students were enrolled in English for Academic Purposes (EAP), where this research was carried out, and they were around 19-20 years old at the time of data collection. There were ten female and thirty male students.

Data Collection

The data used in this study were the results of the participants' midterm test, specifically the reading section. There were 35 reading items in the form of multiple-choice questions (MCQs) with four answer choices. First, the participants took the midterm test. They answered the MCQs of the reading section by selecting the best answer from the four options, with exactly one correct answer and three distractors (incorrect answers). The maximum score was 35 points. Afterward, the researchers collected and checked the answers and then transferred them into an Excel sheet. Next, the researchers entered the participants' initials and the scores for the items they answered into Anates 4.0. The value was one point for a correctly answered question and zero for a wrong answer. After all the data had been entered into Anates, they proceeded to compute the Discrimination Index (DI), Difficulty Index (DIF), and Distractor Efficiency (DE). Once the data were processed, they were coded to help the researchers classify them.
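The scoring step described above (one point for a correct answer, zero otherwise) can be sketched as follows; the answer letters, answer key, and student IDs are hypothetical, and the real study entered the equivalent matrix into Excel and Anates rather than pandas.

```python
# Illustrative scoring step: convert each student's chosen options into 1/0
# scores against the answer key. All data below are hypothetical placeholders.
import pandas as pd

key = ["B", "C", "D"]                  # answer key, one letter per item
answers = pd.DataFrame(
    [["B", "C", "A"],                  # student S01
     ["B", "A", "D"]],                 # student S02
    columns=["Q1", "Q2", "Q3"],
    index=["S01", "S02"],
)
scores = (answers == key).astype(int)  # 1 if correct, 0 otherwise
print(scores)
print(scores.sum(axis=1))              # each student's total score
```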

Data Analysis

After the data were entered in Microsoft Excel and uploaded to Anates, the software quickly produced all the calculations of DI, DIF, and DE. After that, the results were classified and identified according to their class (DI, DIF, and DE). The data were sorted from the highest score to the lowest. After each set of data was separated, it was colour-coded to help the researchers group the results into their classes. After classifying and grouping the results, the next stage was analysing and describing them.
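For readers without Anates, the same DI and DIF figures can be reproduced from the 0/1 score matrix with a short script; this is an equivalent sketch of the calculation, not the software the authors used, and the random demo matrix is a placeholder.

```python
# Anates-like pass over a students x items 0/1 score matrix: rank students by
# total score, take the top and bottom 11 as upper/lower groups, then compute
# DIF (%) and DI per item.
import numpy as np

def item_indices(scores: np.ndarray, group_size: int = 11):
    totals = scores.sum(axis=1)
    order = np.argsort(totals)                                  # ascending by total score
    lower, upper = scores[order[:group_size]], scores[order[-group_size:]]
    dif = 100 * scores.mean(axis=0)                             # difficulty index per item (%)
    di = (upper.sum(axis=0) - lower.sum(axis=0)) / group_size   # discrimination index per item
    return dif, di

rng = np.random.default_rng(1)
demo = (rng.random((40, 35)) < 0.8).astype(int)                 # hypothetical 40 x 35 matrix
dif, di = item_indices(demo)
print(dif[:3].round(1), di[:3].round(2))
```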

Results and Discussion

Results

Forty students participated as respondents in this research, and each student took a midterm test covering all skills, including 35 reading items in MCQ form. Each of the 35 reading items is worth one point, so the maximum score is 35 points. Each reading test item has three distractors, so in total there were 105 distractors.

Table 3. Characteristics of MCQs reading test part of midterm test

n (students) 40

N (MCQ items) 35

Score total 35

Reliability 0.63

XY Correlation 0.45

Mean signed difference (MSD) 2.47

Mean 27.53

Regarding DI, the students were divided into upper and lower groups, with each group (n) consisting of 11 students.

Table 4. DI Calculation Result

Item No. Upper Lower Diff DI (%)

1. 8 9 -1 -9.09

2. 11 8 3 27.27

3. 6 3 3 27.27

4. 11 11 0 0.00

5. 7 8 -1 -9.09

6. 9 1 8 72.73

7. 10 6 4 36.36

8. 9 5 4 36.36

9. 11 7 4 36.36

10. 11 10 1 9.09

11. 11 10 1 9.09

12. 11 9 2 18.18

13. 11 10 1 9.09

14. 11 11 0 0.00

15. 9 5 4 36.36

16. 10 8 2 18.18

17. 5 5 0 0.00

18. 10 4 6 54.55

19. 6 2 4 36.36

20. 11 5 6 54.55

21. 10 10 0 0.00

22. 6 1 5 45.45

23. 2 4 -2 -18.18

24. 11 11 0 0.00

25. 10 4 6 54.55

26. 11 11 0 0.00

27. 11 7 4 36.36


28. 10 9 1 9.09

29. 11 11 0 0.00

30. 11 11 0 0.00

31. 10 9 1 9.09

32. 10 11 -1 -9.09

33. 11 11 0 0.00

34. 11 11 0 0.00

35. 11 11 0 0.00

From the presented results, it can be seen that there are three types of results: negative, zero, and positive. All the DI results were then analysed into four categories, namely poor, acceptable, good, and excellent, as described in Table 5.

Table 5. DI Categorization

No.  Categorization          Item No.
1    Poor (≤0.2)             1, 4, 5, 10, 11, 12, 13, 14, 16, 17, 21, 23, 24, 26, 28, 29, 30, 31, 32, 33, 34, 35
2    Acceptable (0.21-0.24)  -
3    Good (0.25-0.35)        2, 3
4    Excellent (≥0.36)       6, 7, 8, 9, 15, 18, 19, 20, 22, 25, 27

Of the four classifications of DI, most items are classified as poor DI, with a total of 22 items. The results also show that no item is classified as acceptable DI, while two items are classified as good DI and 11 items as excellent DI.

Besides that, the items were also classified into five classes according to the recommendations for DI. The calculation results show that four items are classified as negative, which means they are the worst, defective items and must be discarded from the list of questions. In addition, 18 items are classified as non-discriminating items, meaning they need revision if they are to remain in the question set, or they may be discarded. The results also show that two fair items are moderately discriminating and should be kept for the test. Moreover, there are six good items and five very good items that should be kept for the test. A full description of the classification is given in Table 6.

Table 6. DI Classification

No. DI Item No. f

1. Negative 1,5,23,32 4

2. <0.20 4,10,11,12,13,14,16,17,21,24,26,28,29,30,31,33,34,35 18

3. 0.20-0.29 2,3 2

4. 0.30-0.39 7,8,9,15,19,27 6

5. ≥0.40 6,18,20,22,25 5

Total 35

The analysis also produced the DIF values for these 35 reading test items. The DIF is expressed on a scale from 1 to 100 (per cent). The results of the DIF calculation are described in Table 7.

Table 7. DIF Calculation Result

Item No. DIF (%) Item No DIF (%)


1. 82.50 19 37.50

2. 87.50 20 82.50

3. 30.00 21 95.00

4. 97.50 22 45.00

5. 57.50 23 17.50

6. 45.00 24 100.00

7. 75.00 25 72.50

8. 52.50 26 100.00

9. 85.00 27 90.00

10. 92.50 28 92.50

11. 95.00 29 97.5

12. 90.00 30 100.00

13. 97.50 31 92.50

14. 97.50 32 95.00

15. 57.50 33 100.00

16. 82.50 34 100.00

17. 32.50 35 100.00

18. 77.50

Furthermore, the results of the DIF calculation were grouped into the DIF classifications: too easy, average, good, and too difficult. The classification of the DIF results is presented in Table 8.

Table 8. DIF Classification

No.  Classification   Item No.
1    Too easy         1, 2, 4, 7, 9, 10, 11, 12, 13, 14, 16, 18, 20, 21, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35
2    Average          3, 5, 6, 8, 17, 19, 22
3    Good             5, 8, 15
4    Too difficult    3, 23

After that, the efficiency of the distractors of the test items was also calculated. There are 105 distractors in total and 35 correct answers. The results of the DE calculation are presented in Table 9.

Table 9. DE Calculation Result

Item No. Options

A B C D

1. 2++ 33** 0-- 5---

2. 0-- 1+ 35** 4---

3. 26--- 1-- 1-- 12**

4. 39** 0-- 0-- 1---

5. 23** 1-- 16--- 0--

6. 18** 6++ 1-- 15---

7. 0-- 30** 8--- 1-

8. 10- 21** 9+ 0--

9. 3+ 0-- 3+ 34**

10. 0-- 3--- 37** 0--

11. 0-- 2--- 0-- 38**

12. 0-- 3--- 36** 1+

13. 0-- 39** 1--- 0--

14. 39** 1--- 0-- 0--

15. 2- 9- 23** 5++

16. 33** 3+ 3+ 1-


17. 4- 7++ 13** 16--

18. 1- 31** 5- 3++

19. 13- 11+ 1-- 15**

20. 2++ 1- 4- 33**

21. 1+ 38** 1+ 0--

22. 22--- 18** 0-- 0--

23. 32--- 7** 0-- 1--

24. 0 0 0 40**

25. 29** 0-- 8--- 3++

26. 40** 0 0 0

27. 3--- 0-- 1+ 36**

28. 1++ 0-- 37** 1++

29. 0-- 0-- 39** 1---

30. 40** 0 0 0

31. 2-- 0-- 1++ 37**

32. 0-- 38** 2--- 0--

33. 40** 0 0 0

34. 0 0 40** 0

35. 0 0 0 40**

Note: **= correct answer, ++ = very good, + = good, - = less, -- = poor, --- = very poor

Those DE results were then classified into five groups: 10 very good distractors, 11 good distractors, 11 less effective distractors, 37 poor distractors, and 18 very poor distractors. In addition, 18 distractors were not selected by any student and are classified as very poor distractors.

Table 10. DE Classification

No. Criteria F

1. Very good 10

2. Good 11

3. Less 11

4. Poor 37

5. Very poor 18

When the DE results are presented as the classification of FD versus NFD, the difference is quite large: of the 105 distractors, 90 are NFD and only 15 are FD.

Figure 1. FD and NFD of DE (NFD = 85.7%, FD = 14.3%)

Discussion

(9)

MCQs are test items that present students with one correct answer and more than one distractor, and they are one of the usual methods of assessing students' cognitive knowledge. MCQs, including teacher-made ones, need to be evaluated to ensure that they are valid.

The reliability of the test in this study is 0.63, which is somewhat low and needs to be improved, and the grades it produces should be supplemented by other measures. The validity of the test is 0.45, which is reasonably good because it falls in the average classification, between low and high.

In addition, the DI of this test shows how well it separates students by how much they comprehend the materials they learned. The reading items on this EAP midterm test need to be redesigned, because only about 13 items, or 37%, have good or excellent DI, while more than 62%, or 22 items, have poor DI. This means that 22 reading items need revision if they are still to be used in the test, and only 13 items can be used without revision or change.

Furthermore, a test would be categorized as good if it consisted of 25% difficult items, 25% easy items, and 50% medium items. In this study, 27 items, or about 77% of the total, are categorized as too easy, while only about 28% are of moderate difficulty and 5% are difficult. The item composition of this test therefore needs modification.

The distractors of a test item should work well: a distractor should be selected by more than 5% of participants to be considered good. About 21 distractors, or 19%, in this test are good and can be kept, but 84 distractors, or 80%, are less effective or poor and need modification. In other words, 85.7% of the distractors in this test are non-functional distractors, each selected by fewer than 5% of the students.

Conclusion

Item analysis is an essential stage conducted after test administration to determine the reliability and validity of test items by calculating the discrimination index, difficulty index, and distractor efficiency. EAP reading MCQs used in this study have good test validity. On the other hand, the DI and DIF of this test are not good because most of the results are poor DI and too easy DIF, which means that most test items and the distractors need modifications. Most distractors used in this test are not functional and must be removed, changed, or modified.

Acknowledgment

The researchers would like to thank the first-year Civil Engineering students at Universitas Muhammadiyah Surakarta for their willingness to participate in this study.

References

Boopathiraj, C., & Chellamani, K. (2013). Analysis of Test Items on Difficulty Level and Discrimination Index In The Test for Research In Education. International Journal of Social Science & Interdisciplinary Research, 2(2), 189–193.

Browder, D. M., Wakeman, S., & Flowers, C. P. (2006). Assessment of progress in the general curriculum for students with disabilities. Theory into Practice, 45(3), 249-259. https://doi.org/10.1207/s15430421tip4503_7

Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. Longman.

Burud, I., Nagandla, K., & Agarwal, P. (2019). Impact of distractors in item analysis of multiple choice questions. International Journal of Research in Medical Sciences, 7(4), 1136-1139. https://doi.org/10.18203/2320-6012.ijrms20191313

Elgadal, A. H., & Mariod, A. A. (2021). Item analysis of multiple-choice questions (MCQs): Assessment tool for quality assurance measures. Sudan Journal of Medical Sciences, 16(3). https://doi.org/10.18502/sjms.v16i3.9695

Fulcher, G., & Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource Book. Routledge.

Haladyna, T. M., et al. (2002). A review of multiple choice item writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334. Retrieved from http://site.ufvjm.edu.br/fammuc/files/2016/05/item-writing-guidelines.pdf

Haladyna, T. M. (2004). Developing and Validating Multiple-Choice Test Items (3rd ed.). Lawrence Erlbaum Associates.

Hartati, N., & Yogi, H. P. S. (2019). Item analysis for a better quality test. English Language in Focus (ELIF), 2(1). https://doi.org/10.24853/ELIF.2.1.59-70

Hingorjo, M. R., & Jaleel, F. (2012). Analysis of One-Best MCQs: The Difficulty Index, Discrimination Index and Distractor Efficiency. JPMA-Journal of the Pakistan Medical Association, 62(2), 142–147.


Ismail, F. K. M., & Zubairi, A. M. B. (2022). Item analysis of a reading test in a Sri Lankan context using Classical Test Theory. International Journal of Learning, Teaching and Educational Research, 21(3), 36-50. https://doi.org/10.26803/ijlter.21.3.3

Maharani, A. V., & Putro, N. H. P. S. (2020). Item analysis of English final semester test. Indonesian Journal of EFL and Linguistics, 5(2). https://doi.org/10.21462/ijefl.v5i2.302

Mahjabeen, W., et al. (2018). Difficulty index, discrimination index and distractor efficiency in multiple choice questions. Ann. Pak. Inst. Med. Sci. ISSN 1815-2287.

Matazu, S. S., & Julius, E. (2021). Item analysis: A veritable tool for effective assessment in teaching and learning. Journal of Education and Practice. ISSN 2222-1735. https://doi.org/10.7176/JEP/12-21-04

Mukherjee, P., & Lahiri, S. K. (2015). Analysis of multiple choice questions (MCQs): Item and test statistics from an assessment in a medical college of Kolkata, West Bengal. IOSR Journal of Dental and Medical Sciences (IOSR-JDMS), 14(12), 47-52.

Mulyani, H., et al. (2020). Quality analysis of teacher-made tests in financial accounting subject at vocational high schools. Jurnal Pendidikan Vokasi, 10(1), 1-9. https://doi.org/10.21831/jpv.v10i1.29382

O’Connor, C., & Joffe, H. (2020). Intercoder reliability in qualitative research: Debates and practical guidelines. International Journal of Qualitative Methods. https://doi.org/10.1177/1609406919899220

Office of Educational Assessment. (2018). Understanding item analyses. University of Washington.

Osterlind, S. J. (1998). What is constructing test items? In Constructing Test Items. Evaluation in Education and Human Services, Vol. 47. Springer, Dordrecht. https://doi.org/10.1007/0-306-47535-9_1

Pradanti, S. I., Martono, M., & Sarosa, T. (2018). An item analysis of English summative test for the first semester of the third grade junior high school students in Surakarta. English Education, 6(3), 312-318. https://doi.org/10.20961/eed.v6i3.35891

Rosana, D., & Setyawarno, D. (2017). Statistik Terapan Untuk Penelitian Pendidikan. UNY Press.

Satyendra, C. N. (2021). Improved quality: Item and test parameters. Health Sciences, 1(1), Article 46. https://doi.org/10.15342/hs.2020.267

Scouller, K. (1998). The influence of assessment method on students’ learning approaches: Multiple choice question examination versus assignment essay. Higher Education, 35(4), 453-472. https://doi.org/10.1023/A:1003196224280

Setiyana, R. (2016). Analysis of Summative Tests for English. English Education Journal, 7(4), 433–447.

Sharma, L. R. (2021). Analysis of difficulty index, discrimination index and distractor efficiency of multiple choice questions of speech sounds of English. International Research Journal of MMC, 2(1). ISSN 2717-4999 (online).

Testing Services, University of Wisconsin Oshkosh. Score report interpretation (uwosh.edu).
