An Analysis of the English Summative Test Items in Terms of Difficulty Level: A Case Study of the Second Year Students of MTs Darul Ma’arif Jakarta

AN ANALYSIS OF THE ENGLISH SUMMATIVE TEST

ITEMS IN TERMS OF DIFFICULTY LEVEL

(A Case Study Of The Second Year Students Of Mts Darul Ma’arif Jakarta)

A “Skripsi”

Presented to the Faculty of Tarbiyah and Teachers’ Training in Partial Fulfillment of the Requirements

for the Degree of S.Pd. (Bachelor of Arts) in English Language Education

By:

Rika Amelia

NIM. 105014000357

DEPARTMENT OF ENGLISH EDUCATION

FACULTY OF TARBIYAH AND TEACHERS’ TRAINING

SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY

JAKARTA

“Search for the things you are good at. Work at them until you are the best.

The Procedure of the Research

The steps in conducting the research are as follows:

1) Collecting the answer sheets and the test.

2) Checking the answer key of the test to see whether it has been made correctly by the teacher. The result of this check then becomes the reference for scoring the students’ responses.

3) Arranging and tabulating the answer sheets from the highest score to the lowest.

4) Taking the top 27% of the ranking as the upper group and the bottom 27% as the lower group.

5) Counting the students in the upper and lower groups who respond to each item correctly and entering the counts in the item analysis tabulation format.
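The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the answer key, the seven answer sheets, and all function names are hypothetical, and the facility value formula FV = (correct upper + correct lower) / 2n is the upper/lower-group form commonly attributed to Heaton, which this section names but does not reproduce.

```python
from typing import Dict, List, Tuple

# Hypothetical data: an answer key for six items and seven answer sheets.
ANSWER_KEY = "abcdab"
ANSWER_SHEETS: Dict[str, str] = {
    "s1": "abcdab", "s2": "abcdac", "s3": "abcdcc", "s4": "abcccc",
    "s5": "abdccc", "s6": "acdccc", "s7": "bcdccc",
}


def score_sheets(sheets: Dict[str, str], key: str) -> Dict[str, int]:
    """Step 2: score each sheet against the checked answer key."""
    return {s: sum(a == k for a, k in zip(ans, key)) for s, ans in sheets.items()}


def split_groups(scores: Dict[str, int], fraction: float = 0.27) -> Tuple[List[str], List[str]]:
    """Steps 3-4: rank from highest to lowest, then take the top and bottom 27%."""
    ranked = sorted(scores, key=lambda s: scores[s], reverse=True)
    n = max(1, round(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]


def correct_per_item(sheets: Dict[str, str], key: str, group: List[str]) -> List[int]:
    """Step 5: for each item, count the group members who answered correctly."""
    return [sum(sheets[s][i] == k for s in group) for i, k in enumerate(key)]


def facility_values(upper_counts: List[int], lower_counts: List[int], group_size: int) -> List[float]:
    """Assumed formula: FV = (correct upper + correct lower) / (2 * group size)."""
    return [(u + l) / (2 * group_size) for u, l in zip(upper_counts, lower_counts)]


scores = score_sheets(ANSWER_SHEETS, ANSWER_KEY)
upper, lower = split_groups(scores)
upper_counts = correct_per_item(ANSWER_SHEETS, ANSWER_KEY, upper)
lower_counts = correct_per_item(ANSWER_SHEETS, ANSWER_KEY, lower)
fv = facility_values(upper_counts, lower_counts, len(upper))
print(upper, lower, fv)
```

With seven sheets, 27% rounds to two students per group. Students with tied scores keep their original order in the ranking, which is a design choice of this sketch rather than something the procedure specifies.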

AN ANALYSIS OF THE ENGLISH SUMMATIVE TEST

ITEMS IN TERMS OF DIFFICULTY LEVEL

(A Case Study Of The Second Year Students Of Mts Darul Ma’arif Jakarta)

A “Skripsi”

Presented to the Faculty of Tarbiyah and Teachers’ Training In Partial Fulfillment of the Requirements

For the Degree of S.Pd. in English Language Education

Approved by:

Dr. M.M. Farkhan, M. Pd

NIP. 150 299 480

DEPARTMENT OF ENGLISH EDUCATION

FACULTY OF TARBIYAH AND TEACHERS’ TRAINING

SYARIF HIDAYATULLAH STATE ISLAMIC UNIVERSITY

JAKARTA

2010

ENDORSEMENT SHEET

The examination committee of the Faculty of Tarbiyah and Teachers’

Training certifies that the “Skripsi” (Scientific Paper) entitled “An Analysis of

The English Summative Test Items in terms of Difficulty Level (A Case Study at

the Second Year Students of MTs. Darul Ma’arif Jakarta),” written by Rika

Amelia, student’s registration number 105014000357 was examined in the

examination session on June 25th, 2010. The “skripsi” has been accepted and declared to have fulfilled one of the requirements for the Degree of S.Pd.

(Bachelor of Arts) in English Language Education at English Education

Department.

Jakarta, June 26th 2010

The Examination Committee

Chairman : Drs. Syauki, M.Pd. ………

NIP. 19641212 199103 1 002

Secretary : Neneng Sunengsih, M.Pd. ………

NIP. 19730625 199903 200 1

Examiner I : Drs. Nasrun Mahmud, M.Pd ………

NIP. 150 041 070

Examiner II : Dr. Zaenal Arifin Toy, M.Sc ………

NIP. 150 031 215

Acknowledged by

Dean Faculty of Tarbiyah and Teachers’ Training

Prof. Dr. Dede Rosyada, MA

NIP. 19571005 198703 1 003

ABSTRACT

RIKA AMELIA. “An Analysis of the English Summative Test Items in terms of Difficulty Level for the Second Year Students of MTs Darul Ma’arif Jakarta”. A Paper, Study Program of English Education, Faculty of Tarbiyah and Teachers’ Training, ‘Syarif Hidayatullah’ State Islamic University Jakarta, 2010. This research aims to measure the difficulty level of the English summative test items by calculating the students’ correct responses in the upper and lower groups with the formula given by J.B. Heaton in his book “Writing English Language Tests”. The results are interpreted using the item criteria given by Suharsimi Arikunto in his book “Dasar-dasar Evaluasi Pendidikan”. Twenty items are regarded as difficult because their difficulty level ranges from 0.01 up to 0.30. Twenty-one items are regarded as good because they are at the moderate level, ranging from 0.31 up to 0.70. Nine items are regarded as easy because their difficulty level ranges from 0.71 up to 1.00. From this information, the difficulty level of the test as a whole is obtained by dividing the total of the item difficulty levels by the total number of students, which gives 0.45. The English summative test for the second year students of MTs Darul Ma’arif therefore qualifies as a good test in terms of difficulty level, because its overall value is at the moderate level, which ranges from 0.30 up to 0.70.

Key terms: Test, Item Analysis, Difficulty Level
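The classification reported in the abstract can be illustrated with a short Python sketch. The function names and the sample values are hypothetical; the cut-offs follow the ranges quoted above, and treating exactly 0.30 as difficult and exactly 0.70 as moderate is an assumption made here for the boundary cases.

```python
from collections import Counter
from typing import Iterable


def classify_fv(fv: float) -> str:
    """Map a difficulty level (facility value) to the category used in the abstract."""
    if not 0.0 <= fv <= 1.0:
        raise ValueError("a facility value must lie between 0 and 1")
    if fv <= 0.30:   # assumed boundary: 0.30 counts as difficult
        return "difficult"
    if fv <= 0.70:   # assumed boundary: 0.70 counts as moderate
        return "moderate"
    return "easy"


def summarize(values: Iterable[float]) -> Counter:
    """Count how many items fall into each category."""
    return Counter(classify_fv(v) for v in values)


# Hypothetical item values; 0.45 is the overall level reported in the abstract.
sample = [0.25, 0.50, 0.75, 0.45]
print(summarize(sample))
```

Under these cut-offs the overall value of 0.45 falls in the moderate band, which matches the conclusion stated in the abstract.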

ABSTRAK

RIKA AMELIA. “An Analysis of the English Summative Test Items in terms of Difficulty Level for the Second Year Students of MTs Darul Ma’arif Jakarta”. Skripsi, Department of English Education, Faculty of Tarbiyah and Teachers’ Training, Syarif Hidayatullah State Islamic University Jakarta, 2010.

This research aims to measure the difficulty level of the items of the English summative test for grade VIII (eight) of MTs Darul Ma’arif Jakarta by calculating the correct responses of the upper and lower groups with the formula from J.B. Heaton’s book “Writing English Language Tests”. The results are interpreted with the item criteria from Suharsimi Arikunto’s book “Dasar-dasar Evaluasi Pendidikan”: 20 items are difficult because they are at the difficult level, between 0.00 and 0.30; 21 items are moderate because they are at the moderate level, between 0.31 and 0.70; and 9 items are easy because they are at the easy level, between 0.71 and 1.00. From this information, the difficulty level of all the items can be calculated by dividing the total of the item difficulty values by the total number of students, which gives 0.45. The English test is therefore considered good in terms of its difficulty level, which lies between 0.30 and 0.70.

Keywords: Test, Item Analysis, Difficulty Level

ACKNOWLEDGEMENT

Bismillahirrahmanirrahim,

In the name of Allah, the Most Beneficent, the Most Merciful. All praise be to Allah SWT, Lord of the universe. Peace and blessings be upon the Prophet Muhammad SAW, his family, his companions and all of his followers.

In finishing this paper the writer received much valuable help from many people, too numerous to mention all by name. In particular, the writer is very grateful to:

1. The writer’s family, especially her beloved mother Misnawati and her beloved father Ali Amran, her sisters (Ranti Novitasari Royani Afriyani, Rahmi Sri Wahyuni, Rani Asmawati and Ridhatulfahmi), her brothers (K’ Yan and K’ Rari), and her nieces (Wafa azzahhiyah, Abdurrahma Faiz and Aisha hilma Abiya), who have prayed for and supported the writer.

2. Dr. M. M. Farkhan, M.Pd, the writer’s advisor, who guided the writer during the process of writing this paper.

3. The lecturers of the Department of English Education, Faculty of Tarbiyah and Teachers’ Training, Syarif Hidayatullah Jakarta, who have given the writer very useful knowledge.

4. The chairman of the English Education Department, Syauki, M.Pd., and his secretary, Neneng Sunengsih, M.Pd.

5. Prof. Dr. Dede Rosyada, MA, the Dean of the Faculty of Tarbiyah and Teachers’ Training.

6. The headmaster of MTs Darul Ma’arif, H. Antung Abdullah, and the English teacher, Mrs. Ida, S.Pd, who allowed the writer to do the research.

7. The staff of all the libraries: the main library of ‘Syarif Hidayatullah’ State Islamic University, the library of the Faculty of Tarbiyah and Teachers’ Training, the British Council Library, Balai Pustaka, the Aminef Library, the library of the Catholic University of Atmajaya and PKBB Atmajaya, for providing the sources for the references of this paper.

8. Her inspiring friends Nadiyah, Irka, Sri Rizki, Reni, Ucha, Yuli, Cyifa, Ida and Anita, for their kindness in sharing ideas and time and accompanying the writer in finishing this “skripsi”, and all PBI B friends and English Department 2005 friends for their cheerfulness, support and prayers.

9. All the people who helped in writing this paper and whom the writer cannot mention one by one. May Allah bless you all.

Jakarta, May 17th 2010

The writer

TABLE OF CONTENTS

COVER PAGE

APPROVAL SHEET

ENDORSEMENT SHEET

STATEMENT SHEET

ABSTRACT

ACKNOWLEDGEMENT

TABLE OF CONTENTS

LIST OF TABLES

CHAPTER I : INTRODUCTION

A. The Background of the Study

B. The Limitation of the Study

C. The Formulation of the Problem

D. The Significance of the Study

E. The Organization of the Paper

CHAPTER II : THEORETICAL FRAMEWORK

A. Evaluation

B. Test

C. Type of Tests

D. The Characteristics of a Good Test

1. Validity

2. Reliability

3. Practicality

E. Item Analysis

1. Difficulty Level

CHAPTER III : RESEARCH METHODOLOGY

A. The Objectives of the Research

B. The Method of Study

C. Time and Place

D. The Respondents

E. The Procedure of the Research

CHAPTER IV : RESEARCH FINDINGS

A. The Data Description

B. The Data Analysis

CHAPTER V : CONCLUSION AND SUGGESTION

A. Conclusion

B. Suggestion

BIBLIOGRAPHY

APPENDICES

LIST OF TABLES

Table 4.1 : The Students’ Group Position

Table 4.2 : The Category of FV of the English Summative Test Items

LIST OF CHARTS

Chart 4.1 : The Result of Difficulty Level of Each Item

Chart 4.2 : Pie Chart of the Difficulty Level Percentage

LIST OF APPENDICES

Appendix 1 : Tabulation of the students’ correct answer from upper group

Appendix 2 : Tabulation of students’ correct answer from lower group

Appendix 3 : Table of the result of difficulty level of the items

Appendix 4 : The procedure of the research

OUTLINE

CHAPTER I INTRODUCTION

A. Background of Study

B. Significance of the Study

C. Limitation of Problem

D. Formulation of Problem

E. Research Methodology

F. Organization of Writing

CHAPTER II THEORETICAL FRAMEWORK

A. Evaluation

B. The definition of test

C. Testing role

D. Types of test

a. Function

1. The placement test

2. The diagnostic test

3. The achievement test

4. The proficiency test

b. Way of scoring

1. Objective test

2. Subjective test

E. The Characteristic of a Good Test

1. Validity

2. Reliability

3. Practicality

F. Item Analysis

1. Level of difficulty

2. Discriminating power

3. Distracter

CHAPTER III PROFILE OF SCHOOL

A. History of School

B. Vision and Mission of School

C. Facilities of School

D. Organization Structure of School

E. Teachers, Staffs, and Students

CHAPTER IV RESEARCH FINDINGS

A. Population and Sample

B. Time of Research

C. The Data Description

D. The Data Analysis

E. The Data Interpretation

CHAPTER V CONCLUSION AND SUGGESTION

A. Conclusion

CHAPTER I

INTRODUCTION

A. Background of Study

Evaluation is an important part of every teaching and learning experience. It contributes greatly to teaching, and it provides information about the students’ progress which teachers can use to manage the learning tasks and the students. As stated by Pauline Rea-Dickins and Kevin Germaine, “Evaluation is important for the teacher because it provides a wealth of information to use for the future direction of classroom practice, for the planning of courses and for the management of learning tasks and students.”1 Evaluation can also be described as the process of making sound decisions about teaching and learning based on information that has been collected, synthesized, and reflected on. Lyle F. Bachman states, “Evaluation can be defined as the systematic gathering of information for the purpose of making decisions”.2

Depending upon the decision being made and the information a teacher needs in order to inform that decision, testing often contributes to the process as the implementation of evaluation. Indeed, a test is one kind of evaluation instrument for collecting data. “A test is defined as a systematic procedure for observing and describing one or more characteristics of a person with the aid of either a numerical scale or category system”.3 In other words, a test measures a person’s ability or knowledge with a number of tasks or questions.

According to Henning “. . . tests in general is to pinpoint strengths and

1

Pauline Rea-Dickins and Kevin Germaine, Evaluation, (New York: Oxford University Press, 1992), p. 3

2

Lyle F. Bachman, Fundamental Considerations in Language Testing, (Oxford: Oxford University Press, 1990), p. 22

3

Anthony J. Nitko, Educational Tests and Measurement: An Introduction, (New York: Harcourt Brace Jovanovich, Inc., 1983), p. 6

weakness in the learned abilities of students”.4 Teachers need to administer tests because through them they can find out the students’ achievement in mastering the lessons that have been taught and evaluate the effectiveness of the teaching method and material used. Rebecca M. Valette states, “…through tests the teacher can evaluate the effectiveness of a new teaching method, of a different approach to a difficult pattern, or of new materials”.5

To measure the students’ learning progress at school, a teacher commonly administers two kinds of test: the formative test and the summative test. The former is held earlier than the latter, which is held at the end of the semester. Through both tests, a teacher can measure the students’ achievement level and the degree to which they have accomplished the instructional objectives. On this point, Gronlund states:

“Formative test is used to monitor learning progress during instruction. Its purpose is to provide continuous feedback to both pupil and teacher concerning learning successes and failures … And summative test typically comes at the end of a course of instruction. It is designed to determine the extent to which the instructional objectives have been achieved and is used primarily for assigning course grades or for certifying pupil mastery of the intended learning outcomes”.6

To yield accurate measures, a test must be of good quality, because a good test influences not only the students’ learning but also the teachers’ efforts to improve the teaching and learning process. J.B. Heaton notes that “Tests may be constructed primarily as devices to reinforce learning and to motivate the students’ performance in language”7. In addition, Lyle F. Bachman states that “Tests are often used for pedagogical purposes, either

4

Grant Henning, A Guide to Language Testing, (U.S.A.: Newbury House Publishers, 1987), p. 1

5

Rebecca M. Valette, Modern Language Testing, (U.S.A.: Harcourt Brace Jovanovich, 1977), p. 5

6

Norman E. Gronlund, Measurement and Evaluation in Teaching, 4th edition, (Macmillan Publishing Company, 1976), p. 18

7

as a means of motivating students to study or as means of reviewing material taught”.8

Because the accuracy of a test result influences the students’ motivation to learn, the test administered must be a good test. A good test meets the criteria of validity, reliability, and practicality. Besides that, it must have appropriate discriminating power and difficulty level.9 A test is valid if it measures what it is supposed to measure. It is reliable if its results stay the same when it is administered again to students of the same level. And it is practical if it is easy to take and to administer.

What teachers often forget is the follow-up to the test, which concerns the test items themselves. In practice, they do not examine whether or not all the items have fulfilled the criteria above. An analysis of the test items, namely “item analysis”, is therefore required. By analyzing the test items, teachers can distinguish the good items from the poor ones and differentiate between students who have done well and those who have done poorly.

According to J. Stanley Ahmann and Marvin D. Glock, the purpose of doing

item analysis is:

“Re-examining each test item to discover its strengths and flaws is known as item analysis. Item analysis usually concentrates on two vital features; level of difficulty and discriminating power. The former means the percentage of pupils who answer correctly each item; the latter the ability of the test item to differentiate between pupils who have done well and those who have done poorly”.10
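Read as a formula, the first feature named in the quotation above is a simple proportion. The notation below is illustrative and not taken from the source: R stands for the number of pupils who answer the item correctly and N for the number of pupils who attempt it.

```latex
\[
  P = \frac{R}{N} \times 100\%
\]
```

For instance, under these assumptions, an item answered correctly by 18 of 40 pupils has P = 45%.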

8

Lyle F. Bachman, Fundamental Considerations in Language Testing, (Toronto: Oxford University Press, 1990), p. 22

9

J.B. Heaton, Writing English Language Tests, (New Delhi: Tata McGraw-Hill Publishing Company, 1998), pp. 152-156

10

In addition, Ngalim Purwanto states, “The specific purpose of item analysis is to find out which test items are good and which are not, and why an item is said to be good or not good.”11

The latest English summative test at MTs Darul Ma’arif was held on June 19, 2009. According to a pre-survey conducted during teaching practice at MTs Darul Ma’arif, the writer was informed that the English teacher had never analyzed the second-semester test items, so that it is difficult to say whether the test is a good one or not. In addition, the test results show that the students’ scores were poor.

Considering this fact, the writer is interested in carrying out an item analysis of the English summative test at MTs Darul Ma’arif Jakarta in the second term of the 2008/2009 academic year.

B. Limitation of the Problem

The writer limits the study to an item analysis of the English summative test administered to the second year students of MTs Darul Ma’arif Jakarta in the 2008/2009 academic year, focusing on the aspect of difficulty level, or facility value.

C. Formulation of Problem

Based on the background of the study described above, the writer seeks to answer the following question: “Do the English summative test items for the second year students of MTs Darul Ma’arif Jakarta have good quality in terms of difficulty level?”

11

D. Significance of the Study

Firstly, the study gives the writer in particular, and the English teacher, feedback on how to analyze test items in terms of difficulty level.

Secondly, it informs the English teacher about the quality of the test items in terms of difficulty level. Through this research, the English teacher can identify the good items for future use and the students’ achievement in mastering the materials taught, in order to evaluate the teacher’s competence in teaching.

E. Organization of Writing

In discussing the topic, the writer divides this study into five chapters, as follows.

Chapter one is the introduction, covering the background of the study, the limitation of the problem, the formulation of the problem, the significance of the study and the organization of writing.

Chapter two is the theoretical framework, which discusses evaluation, the test and its types, the criteria of a good test and item analysis.

Chapter three is the research methodology, which includes the objective of the research, the method of study, the time and place, the population and sample, the instrument and the procedure of the research.

Chapter four presents the research findings, which consist of the data description and the data analysis.

Chapter five is devoted to the conclusion of what has been discussed

and analyzed in the chapters before, and also the writer’s suggestion through

CHAPTER II

THEORETICAL FRAMEWORK

A. The Definition of Evaluation

Evaluation is important in any process, because through evaluation we can find the weaknesses that should be corrected and the strengths that should be developed. The same holds in the teaching-learning process, where evaluation plays an important role in providing information for making judgments about what is good or desirable, in order to improve the students’ learning and the teacher’s competence in teaching. This is in line with Peter W. Airasian’s definition: “Evaluation is the process of judging the quality or value of a performance or a course of action”.1 In the same sense, Lyle F. Bachman states, “Evaluation can be defined as the systematic gathering of information for the purpose of making decisions”.2 Evaluation also includes “the making of judgments about the value, for some purpose, of ideas, works, solutions, methods, materials, etc”.3 Benjamin S. Bloom et al. state that “Evaluation is a system of quality control in which it may be determined at each step in the teaching-learning process whether the process is effective or not, and if not what changes must be made to ensure its effectiveness before it is too late”.4

Basically, the purpose of evaluation is to judge the worth of a program or procedure, usually in terms of how well it has achieved its objectives, and

1

Peter W. Airasian, Classroom Assessment: Concepts and Applications, 5th edition, (New York: McGraw-Hill, 2005), p. 9

2

Lyle F. Bachman, Fundamental Considerations..., p. 22

3

Julian C. Stanley, Measurement in Today’s Schools, (Englewood Cliffs: Prentice-Hall, Inc., 1964), p. 16

4

Benjamin S. Bloom, Handbook on Formative and Summative Evaluation of Student Learning, (London: Longman, 1971), p. 8

for this purpose all appropriate techniques of gathering evidence may be used.5 “Evaluation goes beyond the statement of how much to concern itself with the question what value. It seeks to answer the pupil’s and teacher’s question of what progress am I making?”6 Richard I. Arends states that “An important purpose of testing and evaluation is to provide students with feedback on how they are doing”.7

Finally, considering all the opinions above, the writer can summarize that evaluation is a systematic process of providing information in order to make judgments and sound decisions, to determine whether the objectives are in line with the curriculum used, and to find out the students’ improvement in the teaching-learning process, the teacher’s competence in teaching, and the classroom climate.

B. The Definition of Test

When people hear the words assessment and evaluation, they often think right away of tests, because a test is one of the instruments of evaluation for collecting data. A test is a formal, systematic, usually paper-and-pencil procedure for gathering information about pupils’ performance.8 Paper-and-pencil tests are thus one important tool for gathering assessment information.

A test is composed of a number of tasks or questions for students to respond to. By analyzing the responses, the teacher can measure the students’ achievement in the teaching-learning process. Lyle F. Bachman states that “A test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual”.9 While

5

Victor H. Noll, Introduction to Educational Measurement, (Boston; Houghton Mifflin Company, 1965), 2nd edition, p.14

6

H.H.Remmers, N.L.Gage, J.Francis Rummel, A Practical Introduction to Measurement and Evaluation, (USA; Harper and Brother Publishers, 1960), p. 7

7

Richard I. Arends, Learning to Teach, (New York: McGraw-Hill International Edition, 1989), p. 312

8

Peter W. Airasian, Classroom Assessment..., p. 9

9

Wilmar states that “A test is a set of questions, each of which has a correct answer, that examinees usually answer orally or in writing”.10

From these views, it can be concluded that a test is an instrument, technique, or procedure for eliciting the students’ responses through tasks or performance, in the form of a set of questions that must be answered, in order to find out whether the teaching-learning objectives have been achieved. In short, a test is a measurement instrument designed to assess a specific sample of an individual’s behavior.

A test is also a way of delivering information that is very useful for many practitioners of education. “A test is a formal systematic procedure for gathering information”.11 Therefore, the test as an educational device is necessary in the teaching process, since testing and teaching cannot be separated. Heaton states that “both testing and teaching are so closely interrelated that it is virtually impossible to work in either field without being constantly concerned with the other”.12 The reason for this interrelation is that the material tested must be based on the material taught in order to find out how far the students have comprehended it.

C. Type of Tests

There are many kinds of tests that can be used in an evaluation process to measure students’ achievement. Tests can be classified on two bases, namely function and way of scoring.

1. Function

According to Andrew Harrison, tests can be categorized by function into four types: placement test, diagnostic test, achievement test, and proficiency test.

10

Wilmar Tinambunan, Evaluation of Students Achievement, (Jakarta: Depdikbud, 1988), p. 310

11

Julian C. Stanley, Measurement in Today’s..., p.3

12

a. The Placement test

A placement test is used to place a student at the appropriate level or section of a language curriculum or school. It is usually given at the beginning of a course. According to Wilmar Tinambunan:

A placement test is designed to determine pupil performance at the beginning of instruction. Thus, it is designed to sort new students into teaching groups, so that they can start a course at approximately the same level as the other students in the class. It is concerned with the student’s present standing, and so relates to general ability rather than specific points of learning. As a rule the results are needed quickly so that the teaching may begin.13

b. The Diagnostic Test

A diagnostic test is designed to diagnose a particular aspect of a language. “Diagnostic tests are also achievement tests, but they are characterized by one distinctive feature, namely that they are designed to show specific weaknesses and strengths within the skills or elements measured”.14

It can also be used to check the students’ progress in learning particular elements of the course. It is used, for example, at the end of a unit in the course book or after a lesson designed to teach one particular point.15 “A diagnostic test is designed to determine the degree to which the specific instructional objectives of the course have been accomplished”.16 J.B. Heaton states that “Although the term diagnostic test is widely used, few tests are constructed solely as diagnostic tests. Note that diagnostic testing is frequently carried out for groups of students rather than for individuals”.17

13

Wilmar Tinambunan, Evaluation of Students..., p. 7

14

Robert Lado, Language Testing, (Hongkong; Wing Tai Cheung Printing Co Ltd, 1961), p. 369

15

Andrew Harrison, A Language Testing Handbook, (London; Macmillan Press, 1983), p.6

16

James Dean Brown, Testing in Language Programs, (New Jersey: Prentice Hall Regents, 1996), p. 15

17

Thus, a diagnostic test is more comprehensive and detailed, because it searches for the underlying causes of learning difficulties and then formulates a plan for remedial action.

c. Achievement Test

These tests are used to find out what students have actually learnt or what has actually been taught. “Achievement tests are designed to measure relative accomplishment in specified areas of work”.18 The purpose of achievement tests, as the name reflects, is to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives.19 From another point of view, Wilmar says that “the purpose of achievement test is designed to indicate degree of students’ success in some past learning activities”.20 Also, “Achievement tests relate to the past in that they measure what language the students have learned as a result of teaching”.21

Based on the views above, the writer concludes that achievement tests are intended to measure how effectively students have mastered the lesson and how far they have reached the instructional objectives. An achievement test must therefore be designed with very specific reference to a particular course. This link with a specific program usually means that achievement tests are based directly on the course objectives and are therefore criterion-referenced. Such tests are typically administered at the end of a course.

In practice, achievement tests take two forms according to purpose: the formative test and the summative test.

18

H.H. Remmers, N.L. Gage, J. Francis Rummel, A Practical Introduction..., p. 19

19

Arthur Hughes, Testing for Language Teachers, (Cambridge: Cambridge University Press, 2003), p. 13

20

Wilmar Tinambunan, Evaluation of Students..., p. 19

21

1) Formative test

A formative test is administered by the teacher during the learning process with the aim of using the results to improve instruction and to provide continuous feedback to both students and teacher. Rebecca M. Valette states, “The formative test is given during the course of instruction; its purpose is to show which aspects of the chapter the student has mastered and where remedial work is necessary”.22 Hence, the formative test is part of the instructional process. When incorporated into classroom practice, it provides the information needed to adjust teaching and learning while they are happening. In this sense, a formative test informs both teachers and students about student understanding at a point when timely adjustments can be made. These adjustments help to ensure students achieve targeted standards-based learning goals within a set time frame.23

2) Summative test

A summative test is a test that is usually administered at the end of a course. Rebecca M. Valette states, “the summative test, on the other hand, is usually given at the end of a marking period and measures the “sum” total of the material covered. On this type of test, students are usually ranked and graded”. Moreover, a summative test is given periodically to determine at a particular point in time what students know and do not know. At the district or classroom level, a summative test is an accountability measure that is generally used as part of the grading process. Arthur Hughes states that “the content of summative test should be based directly on a detailed course syllabus or on the books and other material used”.24

22

Rebecca M. Valette, Modern Language..., p.6

23

http://www.nmsa.org/Publications/WebExclusive/Assessment/tabid/1120/Default.aspx

24

Finally, the writer can conclude that a summative test is a test that is usually administered at the end of a course of study.

d. Proficiency Test

The proficiency test is also used to measure what students have learned, but the aim of the proficiency test is to determine whether this language ability corresponds to specific language requirements”.25

According to J.B. Heaton that “the proficiency test is concerned simply with measuring a student’s control of the language in the light of what he or she will be expected to do with it in the future performance of a particular task “.26 And also James Dean Brown states that: “A proficiency-test assess the general knowledge or skill commonly required or prerequisite to entry into (or exemption from) a group of similar institution”.27

Decisions based on a proficiency test should therefore never be undertaken lightly. Instead, these decisions must be based on the best obtainable proficiency test scores as well as other information about the student. “The content of a proficiency test, therefore, is not based on the content or objectives of language courses that people taking the test may have followed. Rather, it is based on a specification of what candidates may have to be able to do in the language in order to be considered proficient”.28

25

26

J.B. Heaton, Writing English..., p. 172

27

James Dean Brown, Testing In Language..., p.10

28


2. Way of Scoring.

Based on the manner of scoring, test items are divided into two general types: objective and subjective. J.B. Heaton states that “Subjective and objective test are terms used to refer to the scoring of tests”.29

a. Objective test

An objective test item is any test item for which there is only a single correct answer. In this test, the students must select one option from several alternatives. According to Valette, “An objective test item is any item for which there is a single predictable correct answer”.30

Hence, this item type is referred to as an objective test item because it can be scored objectively. That is, equally competent scorers can score the items independently and obtain the same result. Therefore, whether the item is scored by one teacher or another, today or last week, it will yield the same score. The advantage of objective test items is thus objective scoring, which is quick, easy, and consistent.

The objective test items commonly used in classroom testing are true-false, multiple-choice, and matching. “These test item include all of the selection-type items: multiple choice, true false, and matching.”31

1) True-False

A true-false item is simply a declarative statement which the students must judge as true or false. As J. Stanley Ahmann explains, “true-false item is referred to alternative response item; the

29

J.B. Heaton, Writing English..., p. 25

30

31


item asks the students to answer with the “true” if it conforms to the truth or “false” if it is essentially incorrect”.32

Thus, the item provides the students with a choice of two alternatives, so the students have the possibility of guessing the answer, and sometimes the guess will be right. In other words, students indicate whether a statement is true or false.

Example:

T F True-false items are classified as supply-type items.

2) Multiple-choice item

The multiple-choice item consists of a stem, which presents a problem situation, and several alternatives, which provide possible solutions to the problem. The stem may be a question or an incomplete statement. The alternatives include the correct answer and several plausible wrong answers, called distracters. Their function is to distract those students who are uncertain of the answer. “A multiple-choice item consists of one or more introductory sentences followed by a list of two or more suggested responses from which the examinee chooses one as the correct answer”.33

Example:

In objective testing, the term objective refers to the method of …
a. identifying the learning outcomes
b. selecting the test content
c. presenting the problem
d. scoring the answers

3) Matching

The matching test item consists of two parallel columns, with each word, number, or symbol in one column being matched to a word, sentence, or phrase in the other column. This type

32

J. Stanley Ahman and Marvin D. Glock, Evaluating Pupil Growth..., p. 17

33


of item is employed widely in situations where relationships between more or less similar ideas, facts, and principles are to be examined or judged. In this type, students indicate the relationship between a set of premises and a set of responses.

Example:
1. The …. drives a car          a. doctor
2. The …. checks the patient    b. driver

This kind of test is an effective way to test students’ recognition of the relationships between words, definitions, events, dates, categories, examples, and so on.

b. Subjective Test item

A subjective test is a test whose scoring requires judgment and evaluation on the part of the scorer. Valette states that “Subjective item is one that does not have a single right answer”.34 It means that the scoring may be inconsistent and that the answer takes the form of a composition or statement in which the students are given a chance to express their ideas or arguments in their own words. “Subjective tests, like translation and essay, have the advantage of measuring language skill naturally, almost the way English used in a real life”.35

The subjective tests that are commonly used in classroom are completion, short-answer, and essay item.

1) Completion

The completion item is a written statement that requires the examinee to supply the correct word or short phrase in response to an incomplete sentence, a question or a word association.

34

Rebecca M. Valette. Modern Language..., .p. 10

35


Completion test can be used effectively to measure the recall of terms, dates, and names.36

The completion item and the short-answer item are both supply-type test items, but in the short-answer type the blank is nearly always at the end, whereas in the completion type the blank may occur anywhere in the statement.37

2) Short- answer Item

The short answer item consists of a question, which can be answered with a word or short phrase.38 A student provides a short response to a direct question or direction.

Generally, teachers prefer to use the short-answer type of question, probably because they think it has some advantages. It is relatively easy to construct, it gives the teacher some opportunity to see how well students can express their thoughts, and it is less difficult to score or mark than the essay question.39 However, it is difficult to phrase a short-answer question so that only one answer is correct. This type of question is therefore most useful in testing knowledge of facts and quite specific information.

3) Essay test.

The most notable characteristic of the essay test is freedom of response it provides. The student is asked a question which requires him to produce his own answer. He is relatively free to decide how to approach the problem, what factual information to use, how to organize his reply, and what degree of emphasis to give each aspect of the answer. Thus, the essay question places a

36

Wilmar Tinambunan, Evaluation of Students..., p. 61

37

Victor H. Noll, Introduction to Educational..., p. 140

38

Victor H. Noll, Introduction to Educational..., p. 138

39


premium on the ability to produce, integrate, and express ideas. As Norman E. Gronlund states:

“Essay tests are inefficient for measuring knowledge outcomes . . . but they provide a freedom of response which is needed for measuring certain complex outcomes . . . . These include the ability to create . . . . to organize . . . . to integrate . . . . to express . . . and similar behaviors that call for the production and synthesis of ideas”.40

Finally, from the explanation above of both objective and subjective tests, and of the essay test in particular, the writer concludes that for the measurement of most knowledge outcomes we would use objective test items to take advantage of their more extensive sampling and greater reliability. For the measurement of such complex learning outcomes as the ability to create, organize, and evaluate ideas, however, the teacher would use essay questions despite their limitations.

Of the types of test item above, the writer is concerned only with the multiple-choice items in the English summative test for the second year students of MTs. Darul Ma’arif, administered at the end of the second semester of the 2008/2009 academic year.

D. Criteria Of A Good Test

Many considerations enter into the evaluation of a test. A good test provides the information needed for a sound evaluation of the students’ comprehension of the instructional objectives. The writer considers these criteria under three main headings: validity, reliability, and practicality. Validity refers to the extent to which a test measures what we actually wish to measure. According to Brown, “Validity is the degree to which the test actually measures what is intended to measure … Reliability is

40


consistent and dependable … And practicality is within the means of financial limitations, time constraints, ease of administration, and scoring and interpretation”.41

1. Validity

The single most important characteristic of a good test is its ability to help the teacher make a correct decision of what is intended to measure. This characteristic is called validity. “Validity is concerned with whether the information being gathered is relevant to the decision that needs to be made”.42

A test has validity if it measures appropriately what it is supposed to measure. According to Heaton, “The validity of a test is the extent to which it measures what is to measure and nothing else”.43 Finocchiaro and Sako also state, “A test is valid when it measures effectively what it is intended to measure”.44 In the same sense, Wilmar Tinambunan states that “The validity of a test is the extent to which the test measures what is intended to measure”.45 Also, Norman E. Gronlund states that “test scores are valid to the extent to which they serve the use for which they are intended”.46 J. Stanley Ahmann and Marvin D. Glock point out, “In educational measurement, validity is often defined as the degree to which a measuring actually serves the purposes for which it is intended”.47

Based on the definition, the writer can conclude that validity of test is important to know whether a test has a good quality in testing someone’s capacity.

41

H. Douglas Brown, Teaching by Principles: An Interactive Approach to Language Pedagogy, (San Francisco: Longman, 2nd edition), p. 386-387

42

Peter W. Airasian, Classroom Assessment..., p. 16

43

J.B Heaton. Writing English... , p. 159

44

Mary Finocchiaro and Sydney Sako, Foreign Language Testing a Practical Approach, (New York: Regent publishing company, 1983), p. 24

45

Wilmar Tinambunan, Evaluation of student..., p. 11

46

Norman E. Gronlund, Constructing Achievement..., p. 105

47


As validity is one of the most important characteristics of test scores, the constructor of the test should know the various aspects of validity and the various procedures by which they are determined.

“The two most important characteristics of test scores are validity and reliability…Anyone working with tests-whether constructing them or using published tests-should understand the meaning of these concepts…and should know the various procedures by which they are determined”.48

According to Heaton, the validity of a test can be seen from the aspects mentioned below.

a. Face validity

A test has face validity if the test has a good “face”, that is, if the way the test looks is right. According to Heaton, “if a test item looks right to other testers, teachers, moderators, and testees, it can be described as having at least face validity”.49 Mary Finocchiaro and Sydney Sako define it as “A judgment about a test based on the way the test looks to educators, students, and the general public. The test should not only ‘be right’, it should also ‘look right’”.50

b. Content Validity

A test has content validity if the test contains material that the students have been taught. To fulfill this, the teacher should also refer to the instructional objectives of the teaching-learning process. Finocchiaro and Sako state, “Content validity is assured by checking all items in the test to make certain that they correspond to the instructional objective of the course”.51 In the same sense, Victor H. Noll explains, “when a teacher gives a test which deals with the

48

Norman E. Gronlund, Constructing Achievement..., p. 105

49

J.B Heaton, Writing English..., p. 159

50

Mary Finocchiaro and Sydney Sako, Foreign Language..., p. 28

51


material and with the objectives of instruction in particular class, his test is said to have curricular (content) validity”.52

c. Construct Validity

A test is said to have construct validity if it can demonstrate that it measures just the ability which it is supposed to measure. According to Heaton, “if a test has construct validity, it is capable of measuring certain specific characteristics in accordance with a theory of language behavior and learning”.53

d. Empirical Validity

A fourth type of validity is usually referred to as statistical or empirical validity. This validity is obtained as a result of comparing the result of the test with the result of some criterion measure.54

2. Reliability

The second criterion of a good test is reliability. Reliability has to do with the accuracy and precision of a measurement procedure. Indices of reliability give an indication of the extent to which a particular measurement is consistent and reproducible.55 A test should be reliable as a measuring instrument.

According to Finocchiaro and Sako, “the reliability or stability of a language test is concerned with the degree to which it can be trusted to produce the same result upon repeated administration to the same individual, or to give consistent information about the value of a learning variable being measured”.56 J. Stanley Ahmann and Marvin D. Glock state that “Reliability means consistency of results. This is equivalent to saying that a highly reliable instrument can be used

52

Victor H. Noll, Introduction to Educational..., p. 79

53

J.B. Heaton. Writing English..., p. 161

54

J.B. Heaton, Writing English..., p. 161

55

Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation in Psychology and Education, (London: John Wiley and Sons, Inc., 1961), p. 127

56


repeatedly in an unchanging situation and produce constant or near constant results.”57

Based on the statements above, a test is reliable if it consistently yields the same or nearly the same ranks over repeated administrations.

3. Practicality

The third criterion of a good test is practicality. “Practicality is concerned with a wide range of factors (economy, convenience, and interpretability) that determine whether a test is practical for widespread use”.58

A test may be a highly reliable and valid instrument and still be beyond our means or facilities. The teacher or whoever makes the test should keep in mind a number of very practical considerations. The main factors of practicality are economy, scorability, and administrability.

Finocchiaro and Sako state that “the criteria for practicality normally will be based upon such factors as economy, scorability, and administrability”.59 Harrison states that “tests should be as economical as possible in time (preparation, sitting, and marking) and in cost (material and hidden costs of time spent)”.60

In short, the criteria of a good test are validity, reliability, and practicality. Besides those three criteria, however, a good test as a whole is also determined by the quality of each item that makes up the test. If the quality of each item is good, it strengthens the accuracy of the scores obtained from the test. The quality of each item can be examined individually by doing item analysis. According to Robert Lado, “item analysis is the study of validity, reliability, and difficulty of test item taken

57

J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil..., p. 311

58

Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation..., p. 127

59

Mary Finocchiaro, Foreign Language Testing..., p. 30

60


individually as if they were separate tests”.61 Through this analysis, the evaluator can find out which items are good enough for future use.

D. Item Analysis

After a test has been administered and scored it is usually desirable to evaluate the effectiveness of the items. This is done by studying the students’ responses to each item. When formalized, the procedure is called item analysis. Anthony J. Nitko states, “item analysis refers to the process of collecting, summarizing, and using information about pupils’ responses to items”.62

Meanwhile Harold S. Madsen explained that:

“The selection of appropriate language items is not enough by itself to ensure a good test. Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individual items. This procedure is called ‘item analysis’.”63

An item analysis also is a systematic procedure which provides some information about the quality of the test item, concerning each of the following points:

1. The difficulty of the item

2. The discriminating power of the item

3. The effectiveness of each alternative or distracter.


test items for further use, and to obtain a brief overview of the state of what we have constructed”.64

Item analysis data also aid in detecting specific technical flaws and thus provide further information for improving test items. As J. Stanley Ahmann and Marvin D. Glock state, “item analysis is re-examining each test to discover its strength and flaws”.65

Item analysis has several benefits. First, it provides useful information for class discussion of test. Second, it provides data for helping the students improve their learning. Third, it provides insights and skills which lead to the preparation of better tests on future occasions.66

Finally, the writer concludes that item analysis is very important to do in order to get information of the quality of the test item, whether it is good item or poor item.

1. Difficulty Level of The Item

The difficulty level of an item means the percentage of pupils who answer the item correctly. “The item difficulty is fraction of the persons taking an item who answer it correctly”.67 Heaton states that “The index of difficulty (or facility value) of an item simply shows how easy or difficult the particular item proved in the test. The index of difficulty (facility value) is generally expressed as the fraction (percentage) of the students who answered the item correctly”.68

A good test item should have a certain degree of difficulty. It should be neither too easy nor too difficult, because a test that is too easy or too difficult will yield a score distribution that makes it hard to identify

64

Suharsimi Arikunto, Dasar-dasar Evaluasi Pendidikan, (Jakarta; Bina Aksara, 1987), p. 205

65

J. Stanley Ahmann and Marvin D. Glock, Evaluating Pupil Growth..., p. 184

66

Norman E. Gronlund, Constructing Achievement..., p. 85-86.

67

Anthony J. Nitko, Educational Test..., p. 288

68


reliable differences in achievement between the pupils who have done well and those who have done poorly. Suharsimi Arikunto says:

“A good item is one that is neither too easy nor too difficult. An item that is too easy does not stimulate students to increase their effort to solve it. An item that is too difficult will cause students to become discouraged and lose the motivation to try again, because it is beyond their reach”.69

By analyzing the students’ responses to the items, the level of difficulty of each item can be known, and this information helps the teacher identify concepts that need to be re-taught. In addition, by analyzing the facility value, the teacher will know whether the item is easy, moderate, or difficult. M. Chobib Thoha states:

“A good item is one whose difficulty level is known to be neither too difficult nor too easy, because the difficulty level correlates with discriminating power. When an item has maximal difficulty, its discriminating power will be low; likewise, an item that is too easy will also have no discriminating power”.70

To measure the difficulty level of each item, the writer uses Heaton’s formula, which is as follows:71

$$FV = \frac{\text{Correct U} + \text{Correct L}}{2n}$$

FV : Facility value or index of difficulty that we are looking for
CU (Correct U) : Number of students from the upper group who answer correctly
CL (Correct L) : Number of students from the lower group who answer correctly
2n : Total number of students in the upper and lower groups
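The formula can be illustrated with a minimal Python sketch. It is not part of the original study; the function name `facility_value` and the sample figures are hypothetical and serve only to show the arithmetic.

```python
def facility_value(correct_upper: int, correct_lower: int, group_size: int) -> float:
    """Return FV = (Correct U + Correct L) / 2n for a single item.

    group_size is n, the number of students in EACH group (upper and lower).
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (correct_upper + correct_lower) / (2 * group_size)


# Hypothetical item: 9 of 10 upper-group students and 6 of 10 lower-group
# students answer correctly.
print(facility_value(9, 6, 10))  # 0.75
```

A value close to 1.00 means that nearly everyone in both groups answered correctly, and a value close to 0.00 means that almost no one did.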

69

Suharsimi Arikunto, Dasar – dasar..., p. 207

70

M. Chobib Thoha, Teknik Evaluasi Pendidikan, (Jakarta; PT. Raja Gafindo Persada, 2003), p. 145

71


After calculating the difficulty level of each item, the writer calculates the index of difficulty of all items with this formula:

$$P = \frac{\sum b}{N}$$

P : Difficulty level of all items
b : Difficulty level of each item
∑ : Sigma (total)
N : Total number of test items
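As a hedged illustration, the overall index is simply the mean of the per-item values. The function name `overall_difficulty` and the four sample values below are hypothetical and are not data from the study.

```python
def overall_difficulty(item_values: list[float]) -> float:
    """Return P = (sum of b) / N, the mean difficulty level over all items."""
    if not item_values:
        raise ValueError("at least one item is required")
    return sum(item_values) / len(item_values)


# Hypothetical facility values for a four-item test.
print(overall_difficulty([0.25, 0.5, 0.75, 1.0]))  # 0.625
```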

To determine the category of the difficulty level of each item and of all items, the writer uses the scale given in Suharsimi Arikunto’s book.72 If the FV is:

Difficult : 0.00 – 0.30
Moderate : 0.31 – 0.70
Easy : 0.71 – 1.00
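The three ranges can be expressed as a small classification rule. The sketch below is an illustration only; the function name `classify_difficulty` is hypothetical, and it assumes that facility values are rounded to two decimals, so a value falling between two printed ranges (for example 0.305) is assigned to the higher category.

```python
def classify_difficulty(fv: float) -> str:
    """Map a facility value to a category using the ranges listed above."""
    if not 0.0 <= fv <= 1.0:
        raise ValueError("facility value must lie between 0.00 and 1.00")
    if fv <= 0.30:
        return "difficult"
    if fv <= 0.70:
        return "moderate"
    return "easy"


for value in (0.20, 0.55, 0.75):
    print(value, classify_difficulty(value))
```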

The facility value shows how easy or difficult the test items are for that particular group. The facility value is therefore influenced by the students’ competence, and the result will be different if the test is given to another group of learners.

E. The Importance of Item Analysis

Item analysis is very important for teachers in preparing better test items, and it helps teachers in the teaching-learning process. “Item analysis is an important and necessary step in the preparation of good multiple-choice tests”.73

72

Suharsimi Arikunto, Dasar – dasar..., p. 210

73


“For teacher made test, the following are among the important uses of item analysis: determining whether an item functions as teacher intends, feedback to the teacher about pupil difficulties, area for curriculum improvement, revising the item and improving item writing skills”.74

1. Determining whether an item functions as teacher intends.

An item functions properly if it is able to distinguish those who have mastered the learning objectives from those who have not. To differentiate between them, the test item should have a suitable level of difficulty, discriminating power, and effective distracters. Therefore, item analysis should be done.

2. Feedback to students’ performance and as a basis for class discussion.

Once the students’ responses to the items are known, the students’ performance can be assessed, their errors can be corrected, and the test items that most of them found difficult can be discussed in class.

3. Feedback to the teacher about pupils’ difficulties

The result of item analysis is useful for teachers in identifying the major types of pupils’ difficulties in learning, so that they know which material needs to be reviewed in subsequent lessons.

4. Area for curriculum improvement.

Item analysis shows which kinds of items the students find difficult and which errors occur often. This may indicate that the material is not suitable to be taught in the school program, so the curriculum may need to be revised.

74


CHAPTER III

RESEARCH METHODOLOGY

1. The Objective of The Research

The research is done to find out the difficulty level of the English summative test items in the second year of MTs. Darul Ma’arif Jakarta in the second term of the 2008/2009 academic year, using the calculation described in J.B. Heaton’s book “Writing English Language Tests”.

2. The Method of Study

The method used in this study can be categorized as descriptive analysis with a quantitative approach. Quantitative analysis is applied to the score data, using simple statistical tabulation, to determine whether each test item is good or not.

3. The Time and Place

The research was held during teaching practice from March to June 2009 at MTs Darul Ma’arif, which is located at Jl. Rs. Fatmawati No. 45 Cipete, South Jakarta.

4. The Respondents

The writer took the results of the English summative test of the second grade at MTs. Darul Ma’arif Cipete, South Jakarta, which consists of 50 English multiple-choice items. The respondents of this research are the second year students of MTs. Darul Ma’arif Jakarta, a group of 36 students.

5. The Instrument of the Research

The research instrument is the English summative test paper for the second year students of MTs. Darul Ma’arif Jakarta.


CHAPTER IV

RESEARCH FINDINGS

A. The Data Description

The English summative test consists of 50 multiple-choice items. As noted in the procedure of the research, the analysis starts by ranking the students by score. After correcting the students’ answer sheets, the writer listed the students’ scores from the highest to the lowest. The scores make it easier to divide the students into three groups. Each score is obtained by multiplying the number of correct answers by two points, because there are 50 items in the test. The following table shows the scores and the groups.

Table 4.1

Group position in the English summative test of the 36 second year students of MTs. Darul Ma’arif Jakarta in the second term of the 2008/2009 academic year


Table 4.1 lists the students from those who got the highest score to those who got the lowest score. The scores make it easier to divide the students into three groups: upper, middle, and lower. The top 27% of the scores form the UPPER group and the bottom 27% form the LOWER group; for the analysis, the MIDDLE group is set aside.
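The grouping step can be sketched as follows. This is an illustration under stated assumptions, not the writer's own procedure: the function name `split_groups` and the scores are hypothetical, and the group size is assumed to be 27% of the class rounded to the nearest whole student, which for 36 students gives the 10 students per group reported in the analysis.

```python
def split_groups(scores: dict[str, int], fraction: float = 0.27):
    """Rank students by score and return (upper, middle, lower) groups."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    k = round(len(ranked) * fraction)      # students per extreme group
    upper = ranked[:k]                     # highest scores
    lower = ranked[len(ranked) - k:]       # lowest scores
    middle = ranked[k:len(ranked) - k]     # set aside for the analysis
    return upper, middle, lower


# Hypothetical scores for 36 students (not the scores of Table 4.1).
scores = {f"S{i:02d}": 100 - 2 * i for i in range(1, 37)}
upper, middle, lower = split_groups(scores)
print(len(upper), len(middle), len(lower))  # 10 16 10
```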

B. The Data Analysis


1. The answers from the Upper Group (10 students)

No student answered all items correctly. Only one student answered 35 items correctly; 1 student answered 34 items correctly; 2 students, 29 items; 1 student, 27 items; 3 students, 26 items; 1 student, 25 items; and 1 student, 24 items. The responses are as follows:

a. 10 students answer correctly numbers: 5, 6, 7, 8, 11, 17, 19, 20, 32, 37
b. 9 students answer correctly number: 43
c. 8 students answer correctly numbers: 18, 25, 35, 36, 46
d. 7 students answer correctly numbers: 3, 22, 33
e. 6 students answer correctly numbers: 2, 10, 15, 27, 32
f. 5 students answer correctly numbers: 1, 4, 21, 34, 45
g. 4 students answer correctly numbers: 23, 26, 38, 40, 47
h. 3 students answer correctly numbers: 14, 16, 24, 30, 39, 44, 48
i. 2 students answer correctly numbers: 9, 12, 13, 28, 41, 49
j. 1 student answers correctly numbers: 29, 42, 50

2. The answer from the Lower Group (10 students)

Meanwhile, in the lower group, only one student answered 20 items correctly; 3 students answered 19 items correctly; 2 students, 17 items; 3 students, 16 items; and 1 student, 11 items. The responses are as follows:

a. 10 students answer correctly number: 17
b. 9 students answer correctly number: 8
c. 8 students answer correctly numbers: 5, 44
d. 7 students answer correctly numbers: 11, 20, 25, 32
e. 6 students answer correctly numbers: 19, 43, 46
f. 5 students answer correctly numbers: 18, 22, 26, 31, 35
g. 4 students answer correctly numbers: 4, 36, 37
h. 3 students answer correctly numbers: 6, 7, 10, 15, 16, 27, 30, 34, 41, 50
i. 2 students answer correctly numbers: 21, 42, 45


k. 0 students answer correctly numbers: 38, 39, 48

Then, as noted earlier, the data for upper and lower group are calculated by using Heaton’s formula to get the difficulty level (FV) of each item.

$$FV = \frac{\text{Correct U} + \text{Correct L}}{2n}$$

Explanation:

FV : Facility value or index of difficulty that we are looking for
CU (Correct U) : Number of students from the upper group who answer correctly
CL (Correct L) : Number of students from the lower group who answer correctly
2n : Total number of students in the upper and lower groups

Afterwards, the result of the calculation is interpreted by using Arikunto’s criteria. If the FV is:

Difficult : 0.00 – 0.30
Moderate : 0.31 – 0.70
Easy : 0.71 – 1.00
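As a worked illustration, the calculation can be applied to four items whose upper-group and lower-group tallies are listed above (items 17, 43, 44, and 38, with n = 10 students per group). The Python sketch below is not the writer's own computation; the helper names are hypothetical, and the category boundaries assume facility values rounded to two decimals.

```python
GROUP_SIZE = 10  # n: students in each of the upper and lower groups

# item number -> (correct in upper group, correct in lower group),
# taken from the tallies listed above
tallies = {17: (10, 10), 43: (9, 6), 44: (3, 8), 38: (4, 0)}


def facility_value(correct_upper: int, correct_lower: int, n: int) -> float:
    """FV = (Correct U + Correct L) / 2n."""
    return (correct_upper + correct_lower) / (2 * n)


def classify(fv: float) -> str:
    """Apply the difficult / moderate / easy ranges."""
    if fv <= 0.30:
        return "difficult"
    if fv <= 0.70:
        return "moderate"
    return "easy"


results = {}
for item, (cu, cl) in tallies.items():
    fv = facility_value(cu, cl, GROUP_SIZE)
    results[item] = (fv, classify(fv))
    print(f"item {item}: FV = {fv:.2f} ({results[item][1]})")
```

Under these assumptions the sketch gives 1.00 and 0.75 (easy) for items 17 and 43, 0.55 (moderate) for item 44, and 0.20 (difficult) for item 38, which agrees with the categories reported for these items.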


Chart 4.1

The result of difficulty level of each item

Chart 4.1 above shows the distribution of the difficulty-level categories of the English summative test items. The detailed distribution is as follows:

a. There are 20 items which are categorized as difficult, because they are in the range from 0.00 to 0.30. Those are numbers 1, 9, 12, 13, 14, 16, 23, 24, 28, 29, 30, 38, 39, 40, 41, 42, 47, 48, 49, 50.

b. There are 21 items which are categorized as moderate, because they are in the range from 0.31 to 0.70. Those are numbers 2, 3, 4, 6, 7, 10, 15, 18, 21, 22, 26, 27, 31, 33, 34, 35, 36, 37, 44, 45, 46.

c. There are 9 items which are categorized as easy, because they are in the range from 0.71 to 1.00. Those are numbers 5, 8, 11, 17, 19, 20, 25, 32, 43.

From the difficulty level (FV) of each item, the writer calculates the percentage of items in each category and presents it in table form as follows:

Table 4.2

The category of difficulty level of the English summative test items.

No.  Range of Difficulty Level  Category   Frequency  Percentage
1.   0.00 – 0.30                Difficult  20         40%
2.   0.31 – 0.70                Moderate   21         42%
3.   0.71 – 1.00                Easy       9          18%
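The percentages in Table 4.2 follow directly from the frequencies, since each percentage is the category frequency divided by the 50 items. A minimal Python check, with hypothetical variable names, is:

```python
# Frequencies of items per category, as reported in Table 4.2.
frequencies = {"difficult": 20, "moderate": 21, "easy": 9}

total = sum(frequencies.values())
percentages = {
    category: 100 * count / total for category, count in frequencies.items()
}

print(total)        # 50
print(percentages)  # {'difficult': 40.0, 'moderate': 42.0, 'easy': 18.0}
```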


Based on the facility value (difficulty level) data in the table for the English summative test at MTs. Darul Ma’arif Jakarta, the categories are not balanced: the easy items take the smallest portion. Ideally, the spread of the items should be balanced, that is, forty percent of the items in the moderate category, thirty percent in the easy category, and thirty percent in the difficult category. Sudjana states that “the number of items for the three categories … that is, the easy, moderate, and difficult items are balanced in number … The ratio of easy, moderate, and difficult items can be set at 4-4 …. another ratio of a similar kind to the proportion …. for example 3-5-2”.1

Afterwards, the writer summarizes the percentage distribution or proportion for each category in the chart form.

Chart 4.2

Pie chart of the difficulty level percentage (English summative test items of MTs. Darul Ma’arif)

[Pie chart with three slices: difficult, moderate, easy]

The last step is to calculate the difficulty level of all items by using this formula:

$$P = \frac{\sum b}{N}$$


After calculating the difficulty level of all items with that formula, the writer obtained a result of 0.451. The detailed results for the difficulty level of each item and of all items can be seen in the appendices, in the table labeled table 3. In this table, the result for each item is given in decimal form. As noted earlier, the writer interprets the difficulty level (FV) of all items according to Arikunto’s criteria: 0.451 falls within the range 0.31 – 0.70, so the test as a whole is at the moderate level.


CHAPTER V

CONCLUSION AND SUGGESTION

A. Conclusion

Based on the data analysis and interpretation in the previous chapter, the writer concludes that the difficulty levels of the English summative test items for the second year students of MTs Darul Ma’arif are as follows:

1. There are 21 items (42%) regarded as good test items because they are at the moderate level, ranging from 0.31 to 0.70.

2. There are 20 items (40%) regarded as difficult items because they are at the difficult level, ranging from 0.00 to 0.30.

3. The other 9 items (18%) are regarded as easy items because they are at the easy level, ranging from 0.71 to 1.00.

Overall, this analysis shows that the English summative test for the second grade students at MTs Darul Ma’arif Jakarta has a moderate level of difficulty. This means that, judged by the difficulty level of all items, the test qualifies as a good enough test.

B. Suggestion

Based on the conclusion above, the writer would like to give some suggestions concerning the item analysis result:

1. For further research, an analysis of the discriminating power and the content validity of the English summative test items is necessary in order to find the poor items to be revised.