An Item Analysis on the Difficulty Level of an English Summative Test: A Case Study at Second Grade Students of SMP YMJ Ciputat


A ”Skripsi”

Presented to The Faculty Of Tarbiyah and Teacher Training in a Partial

Fulfillment of Requirements for the Degree of S.Pd (Bachelor of Arts) in

English Language Education

By:

Fifi Maghfiroh

NIM. 204014003208

ENGLISH EDUCATION DEPARTMENT

FACULTY OF TARBIYAH AND TEACHERS TRAINING

STATE ISLAMIC UNIVERSITY

SYARIF HIDAYATULLAH


A ”Skripsi”

Presented to The Faculty Of Tarbiyah and Teacher Training in a Partial

Fulfillment of Requirements for the Degree of S.Pd (Bachelor of Arts) in

English Language Education

By:

Fifi Maghfiroh

NIM. 204014003208

Approved by Advisor:

Dr. Fahriany M.Pd

NIP. 1970 0611 1991 01 2001

ENGLISH EDUCATION DEPARTMENT

FACULTY OF TARBIYAH AND TEACHERS TRAINING

STATE ISLAMIC UNIVERSITY

SYARIF HIDAYATULLAH


ABSTRACT

Fifi Maghfiroh, 2010, An Item Analysis on the Difficulty Level of an English

Summative Test (A Case Study at Second Grade Students of SMP YMJ Ciputat), Skripsi, English Education Department, Faculty of Tarbiyah and Teachers Training, State Islamic University.

Advisor: Dr. Fahriany, M.Pd

Keywords: Summative Test, Difficulty Level, Good Test

This research aims to measure the difficulty level of each item of the English summative test (focusing on the objective test) administered to the second grade students of SMP YMJ Ciputat in the odd semester of the 2009-2010 academic year. The writer uses a quantitative approach. The method of the study is field research, in which the writer visited the school to collect the data. The study is also categorized as descriptive analysis because it is intended to describe the difficulty level objectively. By analyzing the students’ summative test papers, the writer finds that the English summative test items administered to the second grade students of SMP YMJ Ciputat qualify as good test items in terms of their level of difficulty.

The findings of the study show that the English summative test items administered in the odd semester to the second grade students of SMP YMJ Ciputat qualify as a good test. Based on the calculation of the difficulty level of the items, the test belongs to the category of tests with a moderate level of difficulty,


ABSTRAK

Fifi Maghfiroh, 2010, An Item Analysis on the Difficulty Level of an English Summative Test (A Case Study at Second Grade Students of SMP YMJ Ciputat), Skripsi, English Education Department, Faculty of Tarbiyah and Teachers Training, Syarif Hidayatullah State Islamic University.

Advisor: Dr. Fahriany, M.Pd

Keywords: Summative Test, Difficulty Level, Good Test Item

This research aims to measure the difficulty level of the items of the English summative test (focusing on the objective test items) at SMP YMJ Ciputat in the odd semester of the 2009-2010 academic year. The writer uses a quantitative approach. The method of the study is field research. The study is also categorized as descriptive analysis because it describes the difficulty level of the summative test objectively. By analyzing the summative test answer sheets, the writer finds that the summative test items for the second grade students of SMP YMJ Ciputat qualify as good items in terms of their difficulty level.

Based on the findings above, the English summative test items in the odd semester for the second grade of SMP YMJ Ciputat qualify as good items. Based on the calculation of the difficulty level of the items, the test has a moderate level of difficulty,


ACKNOWLEDGMENT

بسم الله الرحمن الرحيم

In the name of Allah, the Beneficent, the Merciful

All praise be to Allah, the Lord, who has given mercy and blessing, guidance, help and love so that the writer could complete this ‘skripsi’. Peace and blessing be upon our prophet Muhammad S.A.W., his descendants, his companions and his followers.

The primary aim of this ‘skripsi’ is to fulfill part of the requirements for the degree of Strata 1 (S1) at State Islamic University Jakarta. It is entitled “AN ITEM ANALYSIS ON THE DIFFICULTY LEVEL OF AN ENGLISH SUMMATIVE TEST (A Case Study at Second Grade Students of SMP YMJ Ciputat)”.

On this occasion the writer would like to express her gratitude and respect to all the people who helped her finish this ‘skripsi’. She sincerely acknowledges their help in completing it.

First of all, the writer would like to express her greatest gratitude to her beloved mother (Mamah) and father (Sulaiman Fauzi), who have given their love, guidance, sacrifice and support for the writer’s studies, and who have prayed day and night for her success, and also to her brothers and sister, who have kept motivating her.

The writer also would like to give her great appreciation, honor and

gratitude to Dr. Fahriany M.Pd., as her advisor, for her time, guidance, kindness,


Then the writer would like to give her special thanks to all the lecturers in the English Department, whose names cannot be mentioned one by one, who have taught and shared their knowledge with the writer.

The writer realizes that she would not complete writing this ‘skripsi’ without the

help of people around her. Therefore, she would like to give her gratitude, and

appreciation to:

1. Drs. Syauki, M.Pd., the Head of English Department, Mrs. Neneng

Sunengsih, S.Pd., the Secretary of English Department, Ms. Aida and all

staff of English Department who helped the writer.

2. Prof. Dr. Dede Rosyada, MA., as the Dean of Faculty of Tarbiyah and

Teacher’s Training.

3. The Headmaster and all the teachers, staff and employees of SMP YMJ Ciputat, especially Ms. Suryani, S.Pd, the English teacher, and all the students of YMJ, who permitted the writer to do the research.

4. The staff and officers of the libraries whose books she used as references for this research: the main library of Syarif Hidayatullah State Islamic University, the library of the Faculty of Tarbiyah and Teachers Training, and the Unika Atma Jaya Library.

5. All of her friends at UIN, especially the 2004 students of classes A and B of the English Department. She thanks them for their friendship, especially Yumi, Aini, Nana, Ajiz and Fitri, the best friends the writer has ever had, and hopes that all of them can make their dreams come true.

6. Also the writer’s beloved aunt family, her cousin Reno Yose Rizal and all

the writer’s friends who care and always give support also help her


Finally, the writer realizes that this ‘skripsi’ is far from being perfect; therefore, she would be pleased to receive suggestions and criticism from everyone for better writing.

Jakarta, October 5th 2010


LIST OF TABLES

Table 3.1 Level of Difficulty


LIST OF APPENDICES

1. The Group Position of the English Summative Test Result

2. Students’ Answers in the Upper Group

3. Students’ Answers in the Lower Group

4. Item Questions

5. Syllabus

6. Table of Conformity between the Summative Test’s Items and English Syllabus


speaker and because of the large number of non-native speakers who use it for at least part of their international contact.1

Because English is so widely spoken, it has often been referred to as a

"world language", the lingua franca of the modern era. While English is not an

official language in most countries, it is currently the language most often taught

as a second language around the world. Some linguists (such as David Graddol)

believe that it is no longer the exclusive cultural property of "native English

speakers", but is rather a language that is absorbing aspects of cultures worldwide

as it continues to grow. It is, by international treaty, the official language for aerial

and maritime communications. English is an official language of the United

Nations and many other international organizations, including the International

Olympic Committee.2

Based on the facts explained above, English has an important position. For that reason, English has become the first foreign language that should be taught to students at every level of education in Indonesia. Government and

1

Christopher Brumfit, English for International Communication (London: Pergamon Press, 1982), p. 1.

2


private institutions are striving to enhance the teaching and learning of English in Indonesia. As the compulsory foreign language subject, it must be learnt by students at school in Indonesia. It is taught from a very early age (preschool) up to university level.

Evaluation is an integral part of the instructional program. In education, it is one of the most important aspects of the teaching and learning process. It contributes directly to the teaching and learning process used in classroom instruction. The main focus of classroom evaluation is the pupils and their learning.

Evaluation is the continuous inspection of all available information

concerning the student, teacher, educational program and the teaching-learning

process to ascertain the degree of change in students and form valid judgments

about the students and the effectiveness of the program.3

Through evaluation, a teacher will be able to know his or her students’ achievement on the materials that have been taught in a certain period of time. The teacher can also measure the effectiveness of the teaching that has been applied in the classroom.

There are many methods for collecting information in the evaluation process. One of them is by using a test. “A test is a systematic and objective instrument or procedure for obtaining the desired data or information about a person, in a way that can be said to be accurate and quick.”4

Schools usually use two kinds of test: the formative test and the summative test. The formative test is usually made by the teacher of each class and given at the end of a lesson unit. The summative test is usually made

3

Charles D. Hopkins and Richard L. Antes, Classroom Measurement and Evaluation,

Third Edition, (Itasca: F. E. Peacock Publishers, Inc, 1990), p. 29

4


by a team, given at the end of each term or at the end of the school year, and held in every school at the same time.

As a means of measuring the students’ achievement in the learning process, a test should be constructed well so that it is able to distinguish between the students who have studied well and those who have not.

In constructing the test, the teachers have to consider a number of criteria. Each test, especially an achievement test, has its own principles and approaches. The teachers are expected to apply them as appropriately as they can.

After the teacher has administered and scored the test, it is usually desirable to evaluate the effectiveness of the test, especially of the test items, because the teachers need to form their own judgment as to how well each item works. This is done by studying the students’ responses to each item.

When formalized, the procedure is called item analysis. Nitko stated that

“Item analysis refers to the process of collecting, summarizing and using

information about individual test item, especially information about pupils’

response to the item.”5 “Item analysis usually concentrates on two vital features:

level of difficulties and discriminating power. The former means the percentage of

pupils who answer correctly each test item: the latter the ability of the test item to

differentiate between pupils who have done well and those who have done

poorly.”6
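The two features named in the quotation can be illustrated with a minimal sketch in Python. It is not the procedure used in this study. It assumes that each answer is coded 1 (correct) or 0 (incorrect), and it takes discriminating power as the difference between the proportions of correct answers in an upper and a lower group, which is only one common formulation. The function names and the sample answers are hypothetical.

```python
from typing import List


def difficulty_index(responses: List[int]) -> float:
    """Proportion of pupils who answered the item correctly (1 = correct, 0 = incorrect)."""
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(responses) / len(responses)


def discrimination_index(upper: List[int], lower: List[int]) -> float:
    """Proportion correct in the upper group minus proportion correct in the lower group."""
    return difficulty_index(upper) - difficulty_index(lower)


# Hypothetical answers to one item (assumed data, not taken from the study).
upper_group = [1, 1, 1, 0, 1]  # pupils with high total scores
lower_group = [1, 0, 0, 0, 1]  # pupils with low total scores

p = difficulty_index(upper_group + lower_group)
d = discrimination_index(upper_group, lower_group)
print(f"difficulty = {p:.2f}, discrimination = {d:.2f}")
```

In this sketch six of the ten pupils answer correctly, so the difficulty value is 0.60, and the upper group outperforms the lower group by 0.40.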

Based on the statements above, the writer is interested in analyzing the English summative test items administered to the second year students of SMP YMJ Ciputat in terms of their level of difficulty.

5

Anthony J. Nitko, Educational Tests and Measurement: An Introduction (New York: Harcourt Brace Jovanovich Inc, 1983), p. 8.

6


B. The Limitation of Problem

To make this writing easier to understand, the writer limits the study as follows:

a. This writing is limited to the difficulty level of the English summative test items administered to the second grade students of SMP YMJ Ciputat.

b. The research is focused on the summative English at the second grade

effectiveness of distracter.7 In this discussion, the writer intends to assess the quality of the test items only through an item analysis that focuses on the level of difficulty. The test to be analyzed is the objective part of the English summative test used for the second grade students of SMP YMJ Ciputat, and she formulates the problem as follows:

“Are the English summative test items administered at the second grade students

of SMP YMJ Ciputat qualified as a good test item seen from the level of

difficulty?”

D. The Significance of Study

The results of this study are expected to give useful information about the level of difficulty of the English summative test items administered to the second grade students of SMP YMJ Ciputat.


limitation and formulation of the problem of this research, the writer’s assumption is that the test which has been administered to the second grade students at SMP YMJ


A. Evaluation and Test

Evaluation helps teachers to know their students’ achievement on the materials that have been taught in a certain period of time, so that the teachers can measure the effectiveness of the teaching that has been applied in the classroom. A test is one of the methods of doing evaluation.

1. Evaluation

a. The Definition of Evaluation

Evaluation is an integral part of the instructional program. In education, it is one of the most important aspects of the teaching and learning process. It contributes directly to the teaching and learning process used in classroom instruction. The main focus of classroom evaluation is the pupils and their learning.

There are several definitions of evaluation. One of them is: “Evaluation is the continuous inspection of all available information concerning the student, teacher, educational program and the teaching-learning process to ascertain the degree of change in students and form valid judgments about the students and the effectiveness of the program.”1

Based on the definition above, evaluation is an important part of every teaching and learning experience. It cannot be separated from the world of education and teaching in general. All educational activities should be followed or accompanied by an evaluation. Teaching and evaluation are like two sides of a coin that cannot be separated. Obviously, evaluation contributes information to the teaching and learning process, especially for a teacher. It seems

1


awkward if a teaching process in the class never ends with evaluation. Without evaluation, the teacher cannot report the students’ outcomes objectively.

Evaluation is defined as a systematic process of determining the extent to which instructional objectives are achieved by pupils. There are two important aspects of this definition. First, note that evaluation implies a systematic process, which omits casual uncontrolled observation of pupils. Second, evaluation assumes that instructional objectives have been

previously identified.2

Through evaluation, a teacher will be able to know his or her students’ achievement on the materials that have been taught in a certain period of time. The teacher can also measure the effectiveness of the teaching that has been applied in the classroom.

“There are many methods of collecting information in the evaluation process. One of them is by using a test. A test is a systematic and objective procedure to obtain data or information about the learner by an appropriate technique.”3 Testing and evaluation are integral parts that stand together and cannot be separated from each other. A test, as one of the methods of evaluation, helps the teacher to evaluate how well the students comprehend all the material that has been taught.

Evaluation means an activity of gathering information to be used in making decisions about students and instruction. It must be done through systematic and routine assessment, so that the data can help the teacher understand the learners, plan learning experiences for them and determine the extent to which the instructional objectives are being achieved.

b. The Evaluation Planning

Basically, an evaluation requires planning for each lesson or unit. Preparing an evaluation should be an integral part of the teacher’s work. Evaluation needs to be planned because, if the teacher does not plan it, the items in the test will not relate to the lesson which has been learnt by the students.

2

Norman E. Gronlund, Measurement and Evaluation in Teaching (New York: Macmillan Publishing Co., Inc., 1981), pp. 5-6.

3


Moreover, if the evaluation is not planned, it will not be used. According to Genesee, when planning evaluation, the following questions are relevant:

1) Who will use the result of assessment and for what purpose?

2) What will the teachers assess?

3) When will the teachers assess?

4) How will the teachers record the results of their assessment?4

Therefore, evaluation planning is important for the teacher, because this planning relates to the lesson which has been learnt by the students.

c. The Uses of Educational Evaluation

According to Ahmann, there are four uses of educational evaluation:

1) Appraisal of the academic achievement of individual student.

2) Diagnosis of the learning difficulties of an individual student or an

entire class.

3) Appraisal of the educational effectiveness of a curriculum,

instructional materials and procedures and organizational

arrangements.

4) Assessment of the educational progress of large population so as to

help understand educational problems and develop sound public

policy in education.5

Based on the explanation above, evaluation examines each student as a unique individual, since every student differs from the others. Judgments may be made by comparing the earlier and the later data about them. Thus, the result can be obtained concurrently.

d. Types of Evaluation

Evaluation procedures can be classified in terms of their functional role in

classroom instruction. One such classification system follows the sequence in

4

Fred Genesee and John A. Upshur, Classroom-Based Evaluation in Second Language Education (Cambridge University Press, 1996), p. 45.

5


which evaluation procedures are likely to be used in the classroom. These categories classify the evaluation of pupil performance in the following manner:

1. Placement evaluation

Placement evaluation is concerned with the pupil’s entry performance and typically focuses on questions such as the following:

a) Does the pupil possess the knowledge and skills needed to begin the planned instruction?

b) To what extent has the pupil already mastered the objectives of the planned instruction?

c) To what extent do the pupil’s interests, work habits, and personality characteristics indicate that one mode of instruction might be better than another?

2. Formative evaluation

Formative evaluation is used to monitor learning progress during

instruction.

3. Diagnostic evaluation

Diagnostic evaluation is a highly specialized procedure. It is concerned with the persistent or recurring learning difficulties that are left unresolved by the standard corrective prescriptions of formative evaluation.

4. Summative evaluation

Summative evaluation typically comes at the end of a course (unit) of instruction. It is designed to determine the extent to which the instructional objectives have been achieved and is used primarily for assigning course grades.6

2. Test

Evaluation is an activity carried out to obtain information about learning, to be used in making educational decisions, and one of its methods is a test.

6


A test may be defined as an activity whose main purpose is to convey (usually to the tester) how well the testee knows or can do something. This is in contrast to practice, whose main purpose is sheer learning. Learning may, of course, result from a test, just as feedback on knowledge may be one of the spin-offs of a practice activity: the

distinction is in the main goal.7

Based on the statement mentioned above, it can be concluded that a test is a procedure designed to elicit scores from which one can make inferences about a certain characteristic of an individual.

In contrast to the definition above, Genesee and Upshur said: “A test is, first

of all, about something. That is, it is about intelligence, or European history, or

second language proficiency. In educational terms, tests have subject matter or

content. Second, a test is a task or a set of tasks that elicits observable behavior

from the test taker. Third, tests yield scores that represent attributes or

characteristics of individual. In order to be meaningful, test score must have a

frame of reference. Test scores along with the frame of reference used to interpret

them is referred to as measurement. Thus, tests are a form of measurement.”8

Through the test, the teacher can not only measure the students’ ability and motivate the students but also improve the lessons in the teaching and learning process. In order to make a proper decision, the teacher needs accurate data, and to gain such data a good instrument is needed.

a. Kinds of Test

Tests can be categorized according to the types of information they provide. Based on the purpose of administering a test, tests can be divided into four types: proficiency tests, achievement tests, diagnostic tests, and aptitude tests.9

7

Penny Ur, A Course in Language Teaching, Practice and Theory (Cambridge: Cambridge University Press, 1991), p. 33.

8

Fred Genesee and John A. Upshur, Classroom-Based…, p. 141.

9


1) Proficiency Test

Proficiency tests are designed to measure people’s ability in a language regardless of any training they may have had in that language. The content of a proficiency test, therefore, is not based on the content or objectives of language courses that people taking the test may have followed. Rather, it is based on a specification of what candidates have to be able to do in the language in order to be considered proficient. This raises the question of what we mean by the word proficient.

In the case of some proficiency tests, proficient means having sufficient command of the language for a particular purpose. An example of this would be a test designed to discover whether someone can function successfully as a United Nations translator. Another example would be a test used to determine whether a student’s English is good enough to follow a course of study at a British university. Such a test may even attempt to take into account the level and the kind of English needed to follow courses in particular subject areas. It might, for example, have one form of the test for arts subjects, another for sciences, and so on. Whatever the particular purpose to which the language is to be put, this will be reflected in the specification of test content at an early stage of a test’s development.10

The aim of a proficiency test is to assess the student’s ability to apply in actual situations what he has learnt. It seeks to answer the question: ‘Having learnt this much, what can the student do with it?’ This type of test is not usually related to any particular course because it is concerned with the student’s current standing in relation to his future needs. In view of this future orientation, a proficiency


needs of any student will be to some extent specific, even if his intention is no more than to use the language as a tourist.11

2) Achievement Tests

In contrast to proficiency tests, achievement tests are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives.12 The achievement test, which is also called an attainment or summative test, looks back over a longer period of learning than the diagnostic test, for example a year’s work, or a whole course, or even a variety of different courses. It is intended to show the standard which the students have now reached in relation to other students at the same stage.13

Achievement tests are of two kinds: final achievement tests and progress achievement tests. Final achievement tests are those administered at the end of a course of study. They may be written and administered by ministries of education, official examining boards, or by members of teaching institutions. Clearly the content of these tests must be related to the courses with which they are concerned, but the nature of this relationship is a matter of disagreement amongst language testers. Progress achievement tests are intended to measure the progress that students are making.14

3) Diagnostic tests

Diagnostic tests are used to identify learners’ strengths and weaknesses.

They are intended primarily to ascertain what learning still needs to take place. At

the level of broad language skills this is reasonably.

11


The results of evaluation are intended to find the appropriate way to improve learning and instruction. If a pupil fails in a particular subject, a diagnosis is needed.

“A diagnostic test is designed to diagnose a particular aspect of a language. A diagnostic test in pronunciation might have the purpose of determining which phonological features of English are difficult for a learner.”15 Thus, a diagnostic test is much more comprehensive and detailed because it searches for the underlying causes of learning difficulties and then formulates a plan for remedial action.

4) Aptitude Tests

“A language aptitude test is designed to measure a person’s capacity or general ability to learn a foreign language and to be successful in that undertaking.”16 Aptitude tests are often used to measure the suitability of a candidate for a specific program of instruction or a particular kind of employment. For this reason these tests are often treated as synonymous with intelligence tests or screening tests.17 Thus, these tests are given before the students begin to study, in order to place them in sections appropriate to their ability.

b. Types of Test Items

Based on the manner of scoring, test items are divided into two general types: subjective and objective tests.

1) Subjective Test

A subjective test is a test whose scoring requires the judgment and valuation of the scorer. Hughes stated that: “if no judgment is required on the part

15

H. Douglas Brown, Teaching by Principles: An Interactive Approach to Language Pedagogy (New York: Addison Wesley Longman, 2001), p. 390.

16

H. Douglas Brown, Teaching by …, p. 391.

17


of the scorer, the scoring is objective... if judgment is called for, the scoring is said

to be subjective.”18

In this type of test, the answer is usually in the form of a composition in which the students are given the freedom to express their ideas in their own words. The subjective tests that are usually used in the classroom are essay, short answer and completion.

a) Essay

“The essay item is the most complex of supply type item. It demands that the student compose a response, often extensive, to a question for which no single response or pattern of response can be cited as correct to the exclusion of all other answers.”19 Thus, the distinctive feature of the essay question is the freedom of response it provides. In answering the question, the students are given freedom to select, relate, and present ideas in their own words. Because of this feature, an essay test may be scored differently by the same person on different occasions.

b) Short Answer Question

The short answer item is a short essay item..., and is best suited for questions requiring a brief response: a word, a phrase, or a sentence. While short answer items are typically used for knowledge objectives, and essay items are most appropriate for synthesis and evaluation outcomes, short answer items can easily be used for higher-order outcomes.20 Thus, when teachers want a broader description of something, it is better for them to use the essay form.

c) Completion

The completion item is a written statement that requires the examinee to

supply the correct word or short phrase in response to an incomplete sentence, a


question, or a word association. Completion test can be used effectively to

measure the recall of term, dates and names.

This type of test can be used at almost all levels, but it is extremely difficult to phrase the question or incomplete statement so that only one answer is correct. In making the question, if too many clues are given, the item will be too easy, and if an insufficient number of clues is presented, the item will be ambiguous and may yield several possible correct answers.21

2) Objective test

“Objective tests are frequently criticized on the grounds that they are simpler to answer than subjective tests. Items in an objective test, however, can be made just as easy or as difficult as the test constructor wishes.”22 Gay, meanwhile, said, “Objective tests are sometimes criticized on the basis that they are appropriate for measuring knowledge-level outcome only.”23 In any case, whether one teacher or another scores the item, today or last week, it will yield the same score.

Based on the description above, an objective test is a test that has right or

wrong answers and so can be marked objectively. It can be compared with a

subjective test, which is evaluated by giving an opinion, usually based on agreed

criteria. Objective tests are popular because they are easy to prepare and take,

quick to mark, and provide a quantifiable and concrete result.

The objective test items commonly used in classroom testing are true false, matching, and multiple choice.

21

Wilmar Tinambunan, Evaluation of…, p. 61.

22

J.B. Heaton, Writing English Language Tests, New Edition (London and New York: Longman,1988), p. 26.

23


a) True False

The true false item is commonly used in measuring the ability to identify the correctness of statements of fact, definitions of terms, statements of principle and the like.24

True false item doesn’t directly test writing or speaking abilities: only

listening and reading. It may be used to test aspects of language such as

vocabulary, grammar, content of reading or listening passage, it is fairly easy to

design, it is also easy to administer, whether orally or in writing, and to mark.25

Thus, the item provides the students with a choice of two alternatives, so the students are able to guess the answer, and random guessing will sometimes produce the right answer and sometimes the wrong one. This type of test is usually constructed from statements which the students have to judge as true or false. If the statement is true, the students should mark it with ‘T’, and if it is false, they should mark it with ‘F’.
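The effect of guessing on a two-alternative item can be illustrated with a short sketch in Python. It assumes that a student guesses blindly on every item and that each alternative is equally likely to be chosen; the function name and the item counts are hypothetical.

```python
def expected_guess_score(n_items: int, n_alternatives: int) -> float:
    """Expected number of correct answers when every item is answered by blind guessing."""
    if n_items < 0 or n_alternatives < 1:
        raise ValueError("n_items must be >= 0 and n_alternatives must be >= 1")
    # Each guess is correct with probability 1 / n_alternatives.
    return n_items / n_alternatives


# Hypothetical 20-item tests (assumed numbers, not taken from the study).
true_false_score = expected_guess_score(20, 2)   # two alternatives per item
four_option_score = expected_guess_score(20, 4)  # four alternatives per item
print(true_false_score, four_option_score)
```

Under these assumptions, blind guessing is expected to yield half of the marks on a true false test, which is why the scores on such items have to be interpreted with care.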

b) Matching

The matching exercise consists of two parallel columns of phrases, words, numbers, or symbols that must be matched. Examples of items included in matching exercises are persons and achievements, dates and historical events. The nature of the matching exercise limits it to measuring the ability to identify the

Norman E. Gronlund and Robert L. Linn, Measurement and Assessment in Teaching (New Jersey: Prentice Hall, Inc, 1995), p. 150

25

Penny Ur, A Course …, p. 39.

26


c) Multiple Choice

A multiple choice item consists of one or more introductory sentences

followed by a list of two or more suggested responses from which the examinee

chooses one as the correct answer.27

The multiple choice item can measure a variety of learning outcomes

from simple to complex, and it is adaptable to most types of subject matter

content. The learning outcomes in the knowledge area that can be measured by the

multiple choice items are:28

i. Knowledge of terminology

For this purpose, pupils are requested to show their knowledge

of a particular term by selecting a word that has the same

meaning as the given term or by choosing a definition of the

term. Special uses of a term can also be measured by having

pupils identify the meaning of the term when used in context.

ii. Knowledge of specific facts.

It is important in its own right, and it provides a necessary basis

for developing understanding, thinking skills, and other

complex learning outcomes. Multiple choice items designed to

measure specific facts can take many different forms, but

more difficult, this is because principles are more complex than

isolated facts.

iv. Knowledge of method and procedure

27

Anthony J. Nitko, Educational Tests and Measurement: An Introduction (New York: Harcourt Brace Jovanovich, Inc., 1983), p. 190.


The multiple choice form is also able to measure the

knowledge of method and procedure, such as knowledge of

laboratory procedure, knowledge of methods used in problem

solving, computational and performance skill.

Multiple choice items have several advantages: they are fast, easy, and economical to score, and they can be scored objectively, so they are fairer and more reliable than subjectively scored tests.

Besides those advantages, multiple choice items also have disadvantages: the technique tests only recognition knowledge, so the students have little or no opportunity to express their own ideas about a problem; pupils have ample opportunity to guess the answer, which may affect their scores; it is difficult to write successful items; and cheating may be facilitated.29

In short, this type of test has both advantages and disadvantages. The advantages relate to the teacher, who can score the students objectively. The disadvantages relate to the students: the test measures only their recognition knowledge and makes it easy for them to cheat from each other.

B. Item Analysis

Selecting appropriate language items is not enough by itself to ensure a good test. Each question needs to function properly; otherwise, it can weaken the exam. Fortunately, there are some rather simple statistical ways of checking individual items. This is done by studying the students’ responses to each item.

According to Nitko, “item analysis refers to the process of collecting, summarizing and using information about individual test items, especially information about pupils’ responses to items.”30 The analysis of students’ responses

29

Kathleen M. Bailey, Learning About Language Assessment (Boston: ITP An International Thomson Publishing Company, 1988), p. 131.


of objective test items is a powerful tool for test improvement. Ahmann and Glock said, “Item analysis is reexamining each test to discover its strength and flaw.”31

From those opinions, it can be concluded that item analysis is the process of collecting information about students’ responses to the items in order to see the quality of the test items. More specifically, item analysis information can tell us whether an item was too easy or too hard. Item analysis data also aid in detecting specific technical flaws and thus provide further information for improving test items.

According to James Dean Brown and Thom Hudson, “Item analysis is the systematic statistical evaluation of the effectiveness of individual test item.”32

Item analysis as a whole is defined here as the systematic statistical evaluation of the effectiveness of individual test items. Item analysis is usually done to select which items will remain on future revised and improved versions of a test. Sometimes, however, item analysis is performed simply to investigate how well the items on a test are working with a particular group of students, or to study which items match the language domain of interest.

C. Kinds of Item Analysis

Three characteristics are usually considered in testing and measurement:

1. Difficulty Level

The level of difficulty of an item is identified from the percentage of students who answer it correctly. According to Harrison: “Level of difficulty means the percentage of students who give the right answer.”33

“A good test should have certain degree of difficulty. It may not too easy

or too difficult, because the test that is too easy or too difficult for the group tested

yield score distribution that makes it hard to identify reliable differences in

achievement levels between members of the group.”34 The level of difficulty is a

31

J. Stanley Ahmann and Marvin D. Glock, Evaluating Student…, p. 184.

32

James Dean Brown and Thom Hudson, Criterion-Referenced Language testing, (New York: Cambridge University Press, 2002), p. 113.

33

Andrew Harrison, A Language …, p. 128.


percentage of students who answer the item correctly. A good test must have an appropriate degree of difficulty. By analyzing the students’ responses to the items, the level of difficulty of each item can be known, and this information helps the teacher identify concepts that need to be retaught and give the students feedback about their learning.

Item difficulty goes by many other names: item facility, item easiness, p-value, or simply the abbreviation IF.35 To make the level of difficulty easier to compute, the writer divides the students into three groups: upper, middle, and lower. The analysis focuses on the upper and lower groups, and the middle group is set aside.

The formula for computing item difficulty is as follows:

\[ FV = \frac{U + L}{2n} \]

Where:

FV : Facility value or item difficulty that we are looking for

U : Sum of students from the upper group who answer

correctly

L : Sum of students from the lower group who answer

correctly

2n : Total sum of students in upper and lower group.36
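As a rough illustration of the facility value defined above, the short Python sketch below computes FV = (U + L) / 2n from the counts of correct answers. The function name `facility_value` and the sample counts are hypothetical and are not taken from the study's data; n is assumed to be the number of students in each of the two groups, so that 2n is their combined size.

```python
def facility_value(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Return FV = (U + L) / 2n for one item.

    group_size is n, assumed here to be the number of students in EACH
    group, so 2n is the combined size of the upper and lower groups.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (upper_correct + lower_correct) / (2 * group_size)


# Hypothetical counts: 12 of 15 upper-group students and 6 of 15
# lower-group students answer the item correctly.
fv = facility_value(upper_correct=12, lower_correct=6, group_size=15)
print(f"FV = {fv:.2f}")  # prints "FV = 0.60"
```

With these invented counts, 18 of the 30 students in the two groups answer correctly, which gives a facility value of 0.60.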

Based on the technique above, the difficulty level of all the items in the test is found with the following formula:

\[ P = \frac{\sum b}{N} \]

Where:

35

James Dean Brown and Thom Hudson, Criterion-Referenced…, p. 114.


P : Difficulty level of all items.

b : Difficulty level of each item

∑ : Sigma (Total)

N : Total number of test items.37
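The difficulty level of all items, P, is the sum of the difficulty levels of the individual items (b) divided by the number of items (N), which makes it a simple mean. A minimal Python sketch is given below; the function name `overall_difficulty` and the four sample values are hypothetical and serve only to show the arithmetic.

```python
def overall_difficulty(item_values: list[float]) -> float:
    """Return P = (sum of b) / N, the mean difficulty level of all items."""
    if not item_values:
        raise ValueError("at least one item is required")
    return sum(item_values) / len(item_values)


# Hypothetical difficulty levels (b) for a four-item test.
b_values = [0.25, 0.60, 0.80, 0.55]
print(f"P = {overall_difficulty(b_values):.2f}")  # prints "P = 0.55"
```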

The values of FV (facility value or item difficulty) and P (difficulty level of all items) range from 0.00 to 1.00. If FV or P is less than 0.30, most of the students in the upper and lower groups cannot answer the item correctly (such items are classified as difficult). If FV or P is 0.30 - 0.70, the proportion of students answering correctly is about halfway between a chance value and the point where no student misses the item (such items are classified as moderate). If FV or P is more than 0.70, most of the students in the upper and lower groups can answer the item correctly (such items are classified as very easy).
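The three ranges described above can be expressed as a small classification rule. The Python sketch below is only an illustration: the function name `classify_difficulty` is hypothetical, and the boundary values 0.30 and 0.70 are assumed to belong to the moderate category, following the range "0.30 - 0.70" in the text.

```python
def classify_difficulty(value: float) -> str:
    """Classify an FV or P value using the cut-offs given in the text."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie between 0.00 and 1.00")
    if value < 0.30:
        return "difficult"
    if value <= 0.70:  # 0.30 and 0.70 themselves are treated as moderate
        return "moderate"
    return "very easy"


# Hypothetical facility values for three items.
for fv in (0.25, 0.60, 0.85):
    print(fv, classify_difficulty(fv))
```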

The level of difficulty shows how easy or difficult a test item is for a particular group. It is therefore influenced by the students’ competence, and it will differ if the test is given to another group.

2. Discriminating Power

The discriminating power of a test item is its ability to differentiate between pupils who have achieved well (the upper group) and those who have achieved poorly (the lower group).38 On a discriminating item, students with high scores on the test (the upper group) answer the item correctly more frequently than students with low scores on the test (the lower group). If the test items are given to students who have studied well, the score will be high, and if they are given to those who have not, the score will be low. On the contrary, if the test items yield the same score

37

Asmawi Zainul and Noehi Nasution, Penilaian Hasil Belajar, (Jakarta: PAU-PPAI, UT, 1993), p. 153.


when they are given to the two groups, or even yield a low score for the upper group and a high score for the lower group, they are not good test items.

Effective and ineffective distracters can be identified from the analysis, and those which are not working as planned can be rewritten or replaced. A change in the alternatives of a multiple choice item can increase its discrimination.

The formula is as follows:

\[ DP = \frac{U - L}{\frac{1}{2}T} \]

Where:

DP : the index of item discriminating power.

U : the number of students in the upper group who answered

the item correctly

L : the number of students in the lower group who answered

the item correctly

T : total number of students in upper and lower group.39

The item discrimination statistic is calculated by subtracting the number of students in the lower group who answered the item correctly from the number of students in the upper group who answered the item correctly, and then dividing the result by half of the total number of students in the upper and lower groups.
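The calculation of the discriminating power can be sketched in Python as follows. The sketch assumes the conventional order of subtraction, U − L, divided by half of T, so that a positive value means the upper group answered the item correctly more often than the lower group. The function name `discriminating_power` and the sample counts are hypothetical.

```python
def discriminating_power(upper_correct: int, lower_correct: int,
                         total_students: int) -> float:
    """Return DP = (U - L) / (T / 2) for one item.

    total_students is T, the combined size of the upper and lower groups.
    """
    if total_students <= 0:
        raise ValueError("total_students must be positive")
    return (upper_correct - lower_correct) / (0.5 * total_students)


# Hypothetical item on which the upper group does better (T = 30).
print(f"{discriminating_power(12, 6, 30):.2f}")  # prints "0.40"

# Hypothetical item on which the lower group does better: DP is negative.
print(f"{discriminating_power(4, 9, 30):.2f}")   # prints "-0.33"
```

A negative value corresponds to the case described earlier in which the lower group outperforms the upper group, so the item does not discriminate as intended.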

3. Distracter Effectiveness

A good distracter will attract more students who have not studied well (the

lower group) than the upper group. On the contrary, a weak distracter will not be

selected by any of the lower achieving students.

39

Ngalim Purwanto, Prinsip-prinsip dan Teknik Evaluasi Pengajaran, (Bandung: Remaja Rosdakarya, 1986), p.120.


One important aspect affecting the difficulty of multiple choice test items

is the quality of distracters. Some distracters, in fact, might not be distracting at

all, and therefore serve no purpose.40 The parts of a multiple choice item include the item stem, or the main part of the item at the top; the options, which are the alternative choices presented to the student; the correct answer, which is the option that will be counted as correct; and the distracters, which are the options that will be counted as incorrect.41

In a good test item, the distracters must function effectively; distracters that do not function should be rewritten or discarded. Distracter analysis is done by comparing the number of students in the upper group and the lower group who select each incorrect alternative.
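The comparison described above can be sketched in Python by counting, for each incorrect option, how many students in each group chose it. This is only an illustration under stated assumptions: the function names, the four-option format "ABCD", and the sample answers are hypothetical, and a distracter is treated as effective when more lower-group than upper-group students select it, following the description of a good distracter given earlier.

```python
from collections import Counter


def distracter_counts(upper_answers, lower_answers, key, options="ABCD"):
    """Map each incorrect option to (upper_count, lower_count)."""
    upper = Counter(upper_answers)
    lower = Counter(lower_answers)
    # Counter returns 0 for options that nobody selected.
    return {opt: (upper[opt], lower[opt]) for opt in options if opt != key}


def is_effective(upper_count: int, lower_count: int) -> bool:
    """A distracter works if it attracts more lower-group students."""
    return lower_count > upper_count


# Hypothetical answers to one item whose correct answer is "B".
upper_group = ["B", "B", "B", "A", "B"]
lower_group = ["A", "C", "B", "A", "C"]

for option, (u, l) in distracter_counts(upper_group, lower_group, "B").items():
    print(option, u, l, "effective" if is_effective(u, l) else "revise")
```

In this invented example, option D is chosen by nobody in either group, so it would be flagged for rewriting or replacement.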

D. The Importance of Item Analysis

The result of item analysis can be used to select items of desired difficulty

that best discriminate between high and low achieving students. In addition, the results of an item analysis can be useful in identifying faulty items and can provide information about student misconceptions and topics that need additional work.42

The benefits of item analysis are not limited to the improvement of individual test items; there are also a number of fringe benefits of special value to classroom teachers. The most important of these are the following:

1. Item analysis data provide a basis for efficient class discussion of

the test result.

2. Item analysis data provide a basis for remedial work.

3. Item analysis data provide a basis for the general improvement of

classroom instruction.

40

Nana Sujana, Penilaian Hasil Proses Belajar Mengajar, (Bandung: Remaja Rosdakarya, 2001), p. 141.

41

James Dean Brown, Testing in Language Programs, (New Jersey: Prentice Hall Regents, 1996), p.70.


4. Item analysis procedures provide a basis for increased skill in test

construction.43

Based on the statements above, teachers can gain many benefits from analyzing test items. All of these benefits relate to the students’ achievement scores.

Nitko, meanwhile, states that the important uses of item analysis are: determining whether an item functions as the teacher intends; giving feedback to students about their performance and a basis for class discussion; giving feedback to the teacher about pupil difficulties; identifying areas for curriculum improvement; revising the items; and improving item writing skills.44

Item analysis should therefore be done by the teacher, because it helps the teacher measure the students, motivates the students to study well and to be active in every class, and makes it easy for the teacher to judge whether a test item is good or not.

43

Norman E. Gronlund and Robert L. Linn, Measurement and…, p. 316.


A. The Objective of The Study

The objective of this study is to measure the quality of the English summative test items at the odd semester for the second grade students at SMP YMJ Ciputat, specifically the difficulty level of each item. The research is also a means of widening the writer’s knowledge, both theoretically and practically, about testing, specifically about the difficulty level of test items.

B. Place and Time of The Study

The research was conducted at SMP Yayasan Miftahul Jannah (YMJ) Ciputat. The school is located at Jl. Limun No. 27 Ciputat Tangerang Selatan. The writer did the research from January 5 to February 10, 2010. The writer collected the English summative test question paper and the students’ answer sheets from the odd semester of the second grade at SMP YMJ Ciputat.

C. Research Method

In this research the writer used a quantitative analysis technique. The data are calculated using a simple percentage formula, which is used to find the difficulty level of each item of the English summative test at the odd semester for the second grade students at SMP YMJ Ciputat, academic year 2009-2010.

D. Research Instrument

1) The students’ answer sheet

The students’ answer sheets are the papers on which the students give their answers.


2) English summative test question paper at the odd semester for the second grade students at SMP YMJ Ciputat, 2009-2010 academic year, which was administered on Wednesday, December 9, 2009, from 7:30 to 9:30 a.m.

E. The Techniques of Data Analysis

In this research the writer used a quantitative method to analyze the level of difficulty of each item in the English summative test at the odd semester for the second grade students at SMP YMJ Ciputat.

To compute the difficulty level, the writer uses the formula from J.B. Heaton as follows:

\[ FV = \frac{\text{Correct } U + \text{Correct } L}{N} \]

Where:

FV : Facility value or item difficulty that we are looking for

U : Sum of students from the upper group who answer correctly

L : Sum of students from the lower group who answer correctly

N : Total sum of students in upper and lower group.1
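Starting from the research instruments (the answer key and the students' answer sheets), the computation could be sketched in Python as shown below. Several details are assumptions made for illustration only: each answer sheet is represented as a string of chosen options, the students are ranked by total score, and the hypothetical parameter `group_size` sets how many students are placed in each of the upper and lower groups, since this section does not state the group sizes. The sample sheets are invented.

```python
def item_facility_values(answer_sheets, answer_key, group_size):
    """Return the FV of every item, computed from raw answer sheets.

    answer_sheets: one string of chosen options per student, e.g. "ABCD".
    answer_key:    the string of correct options, e.g. "ABCD".
    group_size:    students in EACH of the upper and lower groups (assumed).
    """
    if group_size <= 0 or 2 * group_size > len(answer_sheets):
        raise ValueError("group_size does not fit the number of students")
    # Mark each sheet: True where the student's answer matches the key.
    marked = [[given == correct for given, correct in zip(sheet, answer_key)]
              for sheet in answer_sheets]
    # Rank the students by total score, highest first.
    ranked = sorted(marked, key=sum, reverse=True)
    upper, lower = ranked[:group_size], ranked[-group_size:]
    n_total = 2 * group_size  # N: students in the upper and lower groups
    return [
        (sum(s[i] for s in upper) + sum(s[i] for s in lower)) / n_total
        for i in range(len(answer_key))
    ]


# Hypothetical data: eight students, four items, three students per group.
sheets = ["ABCD", "ABCA", "ABDD", "ACCD", "BBCA", "ACDA", "BBDA", "CDAB"]
values = item_facility_values(sheets, answer_key="ABCD", group_size=3)
print([round(v, 2) for v in values])  # prints "[0.67, 0.67, 0.33, 0.33]"
```

The middle group (here the fourth and fifth ranked students) is left out of the calculation, in line with the grouping procedure described in the theoretical framework.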

Based on the technique above, the writer finds the difficulty level of all the items in the English summative test at the odd semester for second year students of SMP YMJ Ciputat, 2009-2010 academic year, with the following formula:

\[ P = \frac{\sum b}{N} \]

Where:

P : Difficulty level of all items

b : Difficulty level of each item

∑ : Sigma (Total)

N : Total number of test items.2

1

J.B. Heaton, Writing English…, p.182