A “Skripsi”
Presented to the Faculty of Tarbiyah and Teachers' Training in Partial
Fulfillment of the Requirements for the Degree of S.Pd. (Bachelor of Arts) in
English Language Education
By:
Fifi Maghfiroh
NIM. 204014003208
ENGLISH EDUCATION DEPARTMENT
FACULTY OF TARBIYAH AND TEACHERS TRAINING
STATE ISLAMIC UNIVERSITY
SYARIF HIDAYATULLAH
Approved by Advisor:
Dr. Fahriany M.Pd
NIP. 1970 0611 1991 01 2001
ABSTRACT
Fifi Maghfiroh, 2010, An Item Analysis on the Difficulty Level of an English Summative Test (A Case Study at Second Grade Students of SMP YMJ Ciputat), Skripsi, English Education Department, Faculty of Tarbiyah and Teachers' Training, State Islamic University.
Advisor: Dr. Fahriany, M.Pd.
Keywords: Summative Test, Difficulty Level, Good Test
This research aims to measure the difficulty level of each item of the English summative test (focusing on the objective items) given to the second grade students of SMP YMJ Ciputat in the odd semester of the 2009-2010 academic year. The writer uses a quantitative approach. The study is field research, carried out by visiting the school. It is also categorized as descriptive analysis because it is intended to describe the difficulty level objectively. By analyzing the students' summative test papers, the writer finds that the English summative test items administered to the second grade students of SMP YMJ Ciputat qualify as good test items in terms of their level of difficulty.
The findings of the study show that the English summative test items administered in the odd semester to the second grade students of SMP YMJ Ciputat qualify as a good test. Based on the calculation of the difficulty level of the items, the test consists of items that have a moderate level of difficulty.
ABSTRAK
Fifi Maghfiroh, 2010, An Item Analysis on the Difficulty Level of an English Summative Test (A Case Study at Second Grade Students of SMP YMJ Ciputat), Skripsi, English Education Department, Faculty of Tarbiyah and Teachers' Training, Syarif Hidayatullah State Islamic University.
Advisor: Dr. Fahriany, M.Pd.
Keywords: Summative Test, Difficulty Level, Good Test Item
This research aims to measure the difficulty level of the English summative test items (focusing on the objective test items) at SMP YMJ Ciputat in the odd semester of the 2009-2010 academic year. The writer uses a quantitative approach. The method of this study is field research. The study is also categorized as descriptive analysis because it describes the difficulty level of the summative test objectively. By analyzing the summative test answer sheets, the writer finds that the summative test items for the second grade students of SMP YMJ Ciputat qualify as good items in terms of their level of difficulty.
Based on the findings above, the English summative test items in the odd semester for the second grade of SMP YMJ Ciputat qualify as good items. Based on the calculation of the difficulty level of the test items, this test has a moderate level of difficulty.
ACKNOWLEDGMENT
بسم الله الرحمن الرحيم
In the name of Allah, the Beneficent, the Merciful
All praise be to Allah, the Lord who has given mercy and blessing,
guidance, help and love so that the writer could complete this ‘skripsi’. Peace
and blessings be upon our prophet Muhammad S.A.W., his descendants, his
companions and his followers.
The primary aim of this ‘skripsi’ is to fulfill part of the requirements
for the degree of Strata 1 (S1) at State Islamic University Jakarta. It is
entitled “AN ITEM ANALYSIS ON THE DIFFICULTY LEVEL OF AN ENGLISH
SUMMATIVE TEST (A Case Study at Second Grade Students of SMP YMJ
Ciputat)”.
On this occasion the writer would like to express her gratitude and
respect to all the people who helped her in finishing this ‘skripsi’. She
sincerely acknowledges their help in completing it.
First of all, the writer would like to express her greatest gratitude to her
beloved mother (Mamah) and father (Sulaiman Fauzi), who have given her their
love, guidance, sacrifice and support for her studies, and who have prayed
day and night for her success; and also to her brothers and sister, who have
given her motivation.
The writer would also like to express her great appreciation, respect and
gratitude to Dr. Fahriany, M.Pd., her advisor, for her time, guidance and kindness.
Then the writer would like to give her special thanks to all the lecturers in the
English Department, who have taught and given knowledge to the writer, and whose
names cannot be mentioned one by one.
The writer realizes that she could not have completed this ‘skripsi’ without the
help of the people around her. Therefore, she would like to express her gratitude
and appreciation to:
1. Drs. Syauki, M.Pd., the Head of the English Department, Mrs. Neneng
Sunengsih, S.Pd., the Secretary of the English Department, Ms. Aida and all
the staff of the English Department who helped the writer.
2. Prof. Dr. Dede Rosyada, MA., the Dean of the Faculty of Tarbiyah and
Teachers' Training.
3. The Headmaster and all the teachers, staff and employees of SMP YMJ
Ciputat, especially Ms. Suryani, S.Pd., the English teacher, and all the
students at YMJ, who permitted the writer to do the research.
4. The staff and officers of the libraries whose books she used as
references for this research: the main library of Syarif Hidayatullah State
Islamic University, the library of the Faculty of Tarbiyah and Teachers'
Training, and the Unika Atma Jaya Library.
5. All of her friends at UIN, especially the 2004 students in classes A and B of
the English Department. She thanks them for their friendship, especially Yumi,
Aini, Nana, Ajiz and Fitri, the best friends the writer has ever had; thanks for
the wonderful friendship, and may all your dreams come true.
6. The family of the writer's beloved aunt, her cousin Reno Yose Rizal and all
the writer's friends who care about her and always support and help her.
Finally, the writer realizes that this ‘skripsi’ is far from being perfect;
therefore, she would be pleased to receive suggestions and criticism from
everyone for the improvement of her writing.
Jakarta, October 5th, 2010
TABLE OF CONTENTS
ABSTRACT
ABSTRAK
ACKNOWLEDGMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF APPENDIXES
CHAPTER I : INTRODUCTION
A. The Background of Study
B. The Limitation of Problem
C. The Formulation of Problem
D. The Significance of Study
E. The Assumptions
CHAPTER II : THEORETICAL FRAMEWORK
A. Evaluation and Test
1. Evaluation
2. Test
a. Kinds of Test
b. Types of Test Items
B. Item Analysis
C. Kinds of Item Analysis
D. The Importance of Item Analysis
CHAPTER III
B. Place and Time of the Study
C. Research Method
D. Research Instrument
E. The Techniques of Data Analysis
CHAPTER IV : RESEARCH FINDING AND DISCUSSION
A. Description of Data
B. Analysis of Data
C. Interpretation of Data
CHAPTER V : CONCLUSIONS AND SUGGESTIONS
A. Conclusions
B. Suggestions
BIBLIOGRAPHY
LIST OF TABLES
Table 3.1 Level of Difficulty
LIST OF APPENDIXES
1. The Group Position of the English Summative Test Result
2. Students’ Answers in the Upper Group
3. Students’ Answers in the Lower Group
4. Item Questions
5. Syllabus
6. Table of Conformity between the Summative Test’s Items and English Syllabus
speakers and because of the large number of non-native speakers who use it for part
at least of their international contact.1
Because English is so widely spoken, it has often been referred to as a
"world language", the lingua franca of the modern era. While English is not an
official language in most countries, it is currently the language most often taught
as a second language around the world. Some linguists (such as David Graddol)
believe that it is no longer the exclusive cultural property of "native English
speakers", but is rather a language that is absorbing aspects of cultures worldwide
as it continues to grow. It is, by international treaty, the official language for aerial
and maritime communications. English is an official language of the United
Nations and many other international organizations, including the International
Olympic Committee.2
Based on the facts explained above, English holds an important position. For
that reason, it has become the first foreign language that should be taught to
students at every level of education in Indonesia. The government and private
institutions are striving to improve the teaching and learning of English in
Indonesia. As the compulsory foreign language subject, it must be learnt by
students at school in Indonesia, and it is taught from a very early age
(preschool) up to university level.
1 Christopher Brumfit, English for International Communication (London: Pergamon Press, 1982), p. 1.
Evaluation is an integral part of the instructional program. In education,
one of the most important aspects of the teaching and learning process is
evaluation. It contributes directly to the teaching and learning that take place
in classroom instruction. The main focus of classroom evaluation is the pupils
and their learning.
Evaluation is the continuous inspection of all available information
concerning the student, teacher, educational program and the teaching-learning
process to ascertain the degree of change in students and form valid judgments
about the students and the effectiveness of the program.3
Through evaluation a teacher will be able to know his or her students'
achievement on the materials that have been taught in a certain period of time,
and can measure the effectiveness of the teaching that has been applied in the
classroom.
There are many methods of collecting information in the evaluation process.
One of them is the test. “A test is a systematic and objective tool or procedure
for obtaining the desired data or information about a person, in a way that can be considered accurate and quick.”4
Schools usually use two kinds of test: the formative test and the summative
test. The formative test is usually made by the teacher of each class and given
at the end of a lesson unit. The summative test is usually made by a team, given
at the end of each term or at the end of the school year, and held in every
school at the same time.
3 Charles D. Hopkins and Richard L. Antes, Classroom Measurement and Evaluation, Third Edition (Itasca: F. E. Peacock Publishers, Inc., 1990), p. 29.
As a means of measuring the students’ achievement in the learning process, a
test should be constructed well, so that it is able to distinguish between the
students who have studied well and those who have not.
In constructing a test, teachers have to consider a number of criteria.
Each test, especially an achievement test, has its own principles and approaches,
and the teacher is expected to apply them as appropriately as possible.
After the teacher has administered and scored the test, it is usually desirable
to evaluate the effectiveness of the test, especially of its items, since
teachers need to judge how well each item works. This is done by studying the
students’ responses to each item.
When formalized, the procedure is called item analysis. Nitko stated that
“Item analysis refers to the process of collecting, summarizing and using
information about individual test item, especially information about pupils’
response to the item.”5 “Item analysis usually concentrates on two vital features:
level of difficulties and discriminating power. The former means the percentage of
pupils who answer correctly each test item: the latter the ability of the test item to
differentiate between pupils who have done well and those who have done
poorly.”6
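The two features named in the quotation can be computed directly from scored answer sheets. The sketch below is illustrative only: the function names, the toy answer data and the simple upper-group/lower-group comparison are assumptions made for this example, not the data or the exact procedure of this study.

```python
def difficulty_index(item_scores):
    """Proportion of pupils who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(upper_scores, lower_scores):
    """Difference between the proportions correct in the upper and lower groups."""
    return difficulty_index(upper_scores) - difficulty_index(lower_scores)


# Hypothetical responses to a single item: five pupils in each group.
upper_group = [1, 1, 1, 0, 1]   # 4 of 5 correct
lower_group = [1, 0, 0, 1, 0]   # 2 of 5 correct

p = difficulty_index(upper_group + lower_group)      # 6 of 10 correct
d = discrimination_index(upper_group, lower_group)   # 4/5 minus 2/5

print(f"difficulty index P = {p:.2f}")
print(f"discrimination index D = {d:.2f}")
```

A higher difficulty index means an easier item, because more pupils answered it correctly; a positive discrimination index means the upper group outperformed the lower group on that item.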
Based on the statements above, the writer is interested in analyzing the
English summative test items administered to the second grade students of SMP
YMJ Ciputat in terms of their level of difficulty.
5 Anthony J. Nitko, Educational Tests and Measurement: An Introduction (New York: Harcourt Brace Jovanovich Inc., 1983), p. 8.
B. The Limitation of Problem
To make this writing easier to understand, the writer limits the study as
follows:
a. This writing is limited to the difficulty level of the English summative test
items given to the second grade students of SMP YMJ Ciputat.
b. The research is focused on the English summative test at the second grade
effectiveness of distracter.7 In this discussion, the writer intends to assess
the quality of the test items only through item analysis that focuses on the level of
difficulty. The items to be analyzed are the objective items of the English summative
test used with the second grade students of SMP YMJ Ciputat, and she formulates the
problem as follows:
“Do the English summative test items administered to the second grade students
of SMP YMJ Ciputat qualify as good test items in terms of their level of
difficulty?”
D. The Significance of Study
The results of this study are expected to give useful information about the
level of difficulty of the English summative test items given to the second grade
students of SMP YMJ Ciputat.
limitation and formulation of the problem of this research, the writer assumes
that the test which has been administered to the second grade students at SMP YMJ
A. Evaluation and Test
Evaluation helps teachers to know their students' achievement on the
materials that have been taught in a certain period of time, so that they can
measure the effectiveness of the teaching that has been applied in the
classroom. The test is one of the methods of evaluation.
1. Evaluation
a. The Definition of Evaluation
Evaluation is an integral part of the instructional program. In education,
one of the most important aspects of the teaching and learning process is
evaluation. It contributes directly to the teaching and learning that take place
in classroom instruction. The main focus of classroom evaluation is the pupils
and their learning.
There are several definitions of evaluation. One of them is: “Evaluation is
the continuous inspection of all available information concerning the student,
teacher, educational program and the teaching-learning process to ascertain the
degree of change in students and form valid judgments about the students and the
effectiveness of the program.”1
Based on the definition above, evaluation is an important part of every
teaching and learning experience. It cannot be separated from the world of
education and teaching in general. All educational activities should be followed
or accompanied by evaluation. Teaching and evaluation are like two sides of a
coin that cannot be separated. Evaluation clearly contributes information to the
teaching and learning process, especially for the teacher. It would be awkward
if a teaching process in the class never ended with evaluation. Without
evaluation, the teacher cannot report students’ outcomes objectively.
Evaluation is defined as a systematic process of determining the extent to which instructional objectives are achieved by pupils. There are two important aspects of this definition. First, note that evaluation implies a systematic process, which omits casual uncontrolled observation of pupils. Second, evaluation assumes that instructional objectives have been
previously identified.2
Through evaluation a teacher will be able to know his or her students'
achievement on the materials that have been taught in a certain period of time,
and can measure the effectiveness of the teaching that has been applied in the
classroom.
“There are many methods to collect information or evaluation process.
One of them is by using a test. Test is a systematic and objective procedure to
obtain the data or information about the learner by an appropriate technique.”3
Tests and evaluation form an integral whole and cannot be separated from
each other. The test, as one of the methods of evaluation, helps the teacher
evaluate how well students have comprehended the material that has been taught.
Evaluation means the activity of gathering information to be used in
making decisions about students and instruction. It must be done through
systematic and routine assessment, so that the data can help the teacher
understand the learners, plan learning experiences for them and determine the
extent to which the instructional objectives are being achieved.
b. The Evaluation Planning
Basically, evaluation requires planning, just as each lesson or unit does.
Preparing an evaluation should be an integral part of the teacher's work.
Evaluation needs to be planned because, if the teacher does not plan it, the
items in the test will not relate to the lesson that the students have learnt.
2 Norman E. Gronlund, Measurement and Evaluation in Teaching (New York: Macmillan Publishing Co., Inc., 1981), pp. 5-6.
Moreover, if the evaluation is not planned, its results will not be used. According
to Genesee, the following questions are relevant when planning evaluation:
1) Who will use the result of assessment and for what purpose?
2) What will the teachers assess?
3) When will the teachers assess?
4) How will the teachers record the results of their assessment?4
Thus, evaluation planning is an important task for the teacher, because the
planning ties the evaluation to the lesson that the students have learnt.
c. The Uses of Educational Evaluation
According to Ahmann, there are four uses of educational evaluation:
1) Appraisal of the academic achievement of individual student.
2) Diagnosis of the learning difficulties of an individual student or an
entire class.
3) Appraisal of the educational effectiveness of a curriculum,
instructional materials and procedures and organizational
arrangements.
4) Assessment of the educational progress of large population so as to
help understand educational problems and develop sound public
policy in education.5
Based on the explanation above, evaluation examines each student as a unique
individual, since every student differs from the others.
Judgments may be made by comparing earlier and later data about the students.
Thus, the results can be obtained concurrently.
d. Types of Evaluation
Evaluation procedures can be classified in terms of their functional role in
classroom instruction. One such classification system follows the sequence in
which evaluation procedures are likely to be used in the classroom. These
categories classify the evaluation of pupil performance in the following manner:
4 Fred Genesee and John A. Upshur, Classroom-Based Evaluation in Second Language Education (Cambridge: Cambridge University Press, 1996), p. 45.
1. Placement evaluation
Placement evaluation is concerned with the pupil’s entry performance and
typically focuses on questions such as the following:
a) Does the pupil possess the knowledge and skills needed to begin
the planned instruction?
b) To what extent has the pupil already mastered the objectives of
the planned instruction?
c) To what extent do the pupil’s interests, work habits, and
personality characteristics indicate that one mode of instruction
might be better than another?
2. Formative evaluation
Formative evaluation is used to monitor learning progress during
instruction.
3. Diagnostic evaluation
Diagnostic evaluation is a highly specialized procedure. It is concerned with
the persistent or recurring learning difficulties that are left unresolved by the
standard corrective prescriptions of formative evaluation.
4. Summative evaluation
Summative evaluation typically comes at the end of a course (or unit) of instruction.
It is designed to determine the extent to which the instructional objectives have
been achieved and is used primarily for assigning grades.6
2. Test
Evaluation is an activity carried out to obtain information about learning
for use in making educational decisions, and one of its methods is the test.
A test may be defined as an activity whose main purpose is to convey (usually to the tester) how well the testee knows or can do something. This is in contrast to practice, whose main purpose is sheer learning. Learning may, of course, result from a test, just as feedback on knowledge may be one of the spin-offs of a practice activity: the
distinction is in the main goal.7
Based on the statement mentioned above, it can be concluded that
a test is a procedure designed to elicit scores from which one can make inferences
about a certain characteristic of an individual.
In contrast to the definition above, Genesee and Upshur said: “A test is, first
of all, about something. That is, it is about intelligence, or European history, or
second language proficiency. In educational terms, tests have subject matter or
content. Second, a test is a task or a set of tasks that elicits observable behavior
from the test taker. Third, tests yield scores that represent attributes or
characteristics of individual. In order to be meaningful, test score must have a
frame of reference. Test scores along with the frame of reference used to interpret
them is referred to as measurement. Thus, tests are a form of measurement.”8
Through the test, the teacher can not only measure the students’ ability and
motivate them but also improve the lessons in the teaching and learning process.
In order to make proper decisions, the teacher needs accurate data, and to
obtain such data a good instrument is needed.
a. Kinds of Test
Tests can be categorized according to the types of information they provide.
Based on the purpose of administering them, tests can be divided into four
types: proficiency tests, achievement tests, diagnostic tests, and aptitude tests.9
7 Penny Ur, A Course in Language Teaching: Practice and Theory (Cambridge: Cambridge University Press, 1991), p. 33.
8 Fred Genesee and John A. Upshur, Classroom-Based…, p. 141.
1) Proficiency Test
Proficiency tests are designed to measure people’s ability in a language
regardless of any training they may have had in that language. The content of a
proficiency test, therefore, is not based on the content or objectives of language
courses that people taking the test may have followed. Rather, it is based on a
specification of what candidates have to be able to do in the language in order to
be considered proficient. This raises the question of what we mean by the word
proficient.
In the case of some proficiency tests, proficient means having sufficient
command of the language for a particular purpose. An example of this would be a
test designed to discover whether someone can function successfully as a United
Nations translator. Another example would be a test used to determine whether a
student's English is good enough to follow a course of study at a British university.
Such a test may even attempt to take into account the level and the kind of English
needed to follow courses in particular subject areas. It might, for example, have
one form of the test for arts subjects, another for sciences, and so on. Whatever the
particular purpose to which the language is to be put, this will be reflected in the
specification of test content at an early stage of a test’s development.10
The aim of a proficiency test is to assess the student’s ability to apply in
actual situations what he has learnt. It seeks to answer the question: ‘Having learnt
this much, what can the student do with it?’ This type of test is not usually related
to any particular course because it is concerned with the student’s current standing
in relation to his future needs. In view of this future orientation, the proficiency
needs of any student will be to some extent specific, even if his intention is no
more than to use the language as a tourist.11
2) Achievement Tests
In contrast to proficiency tests, achievement tests are directly related to
language courses, their purpose being to establish how successful individual students,
groups of students, or the courses themselves have been in achieving objectives.12 The
achievement test, also called an attainment or summative test, looks back over
a longer period of learning than the diagnostic test, for example a year’s work, or
a whole course, or even a variety of different courses. It is intended to show the
standard which the students have now reached in relation to other students at the
same stage.13
Achievement tests are of two kinds: final achievement tests and progress achievement
tests. Final achievement tests are those administered at the end of a course of study.
They may be written and administered by ministries of education, official
examining boards, or by members of teaching institutions. Clearly the content of
these tests must be related to the courses with which they are concerned, but the
nature of this relationship is a matter of disagreement amongst language testers.
Progress achievement tests are intended to measure the progress that students are
making.14
3) Diagnostic tests
Diagnostic tests are used to identify learners’ strengths and weaknesses.
They are intended primarily to ascertain what learning still needs to take place. At
the level of broad language skills this is reasonably.
The results of evaluation are intended to find the appropriate way to
improve learning and instruction. If a pupil fails in a particular subject, a
diagnosis is needed.
“A diagnostic test is designed to diagnose a particular aspect of a language. A
diagnostic test in pronunciation might have the purpose of determining which
phonological features of English are difficult for a learner.”15 Thus, a diagnostic
test is more comprehensive and detailed because it searches for the underlying
causes of learning difficulties and then formulates a plan for remedial action.
4) Aptitude Tests
“A language aptitude test is designed to measure a person’s capacity or
general ability to learn a foreign language and to be successful in that
undertaking.”16 Aptitude tests are often used to measure the suitability of a
candidate for a specific program of instruction or a particular kind of employment.
For this reason these tests are often treated as synonymous with intelligence tests or
screening tests.17 Thus, these tests are given before the students begin to study,
in order to place them in sections appropriate to their ability.
b. Types of Test Items
Based on the manner of scoring, test items are divided into two
general types: subjective and objective.
1) Subjective Test
A subjective test is a test whose scoring requires the judgment and
evaluation of the scorer. Hughes stated that: “if no judgment is required on the part
of the scorer, the scoring is objective... if judgment is called for, the scoring is said
to be subjective.”18
In this type of test, the answer is usually in the form of a composition in which
the students are given freedom to express their ideas in their own words. The
subjective tests usually used in the classroom are essay, short answer and completion.
15 H. Douglas Brown, Teaching by Principles: An Interactive Approach to Language Pedagogy (New York: Addison Wesley Longman, 2001), p. 390.
16 H. Douglas Brown, Teaching by …, p. 391.
a) Essay
“The essay item is the most complex of supply type item. It demands that
the student compose a response, often extensive, to a question for which no single
response or pattern of response can be cited as correct to the exclusion of all other
answers.”19 Thus, the distinctive feature of the essay question is the freedom of
response it provides. In answering the question, the students are free to select,
relate, and present ideas in their own words. Because of this feature, the same essay
may be scored differently by the same person on different occasions.
b) Short Answer Question
The short answer item is a short essay item..., and is best suited for
questions requiring a brief response: a word, a phrase, or a sentence. While short
answer items are typically used for knowledge objectives, and essay items are
most appropriate for synthesis and evaluation outcomes, short answer items can
easily be used for higher-order outcomes.20 Thus, when teachers want a broader
description of something, they would do better to use the essay
form.
c) Completion
The completion item is a written statement that requires the examinee to
supply the correct word or short phrase in response to an incomplete sentence, a
question, or a word association. Completion tests can be used effectively to
measure the recall of terms, dates and names.
This type of test can be used at almost all levels. But it is extremely
difficult to phrase the question or incomplete statement so that only one answer is
correct. If too many clues are given, the items will be too easy, and if an
insufficient number of clues are presented, the item will be ambiguous and may
yield several possible correct answers.21
2) Objective test
“Objective tests are frequently criticized on the grounds that they are
simpler to answer than subjective tests. Items in an objective test, however, can be
made just as easy or as difficult as the test constructor wishes.”22 Gay, meanwhile,
said, “Objective tests are sometimes criticized on the basis that they are appropriate for
measuring knowledge-level outcome only.”23 In an objective test, whether one teacher
or another scores the item, today or last week, it will yield the same score.
Based on the description above, an objective test is a test that has right or
wrong answers and so can be marked objectively. It can be compared with a
subjective test, which is evaluated by giving an opinion, usually based on agreed
criteria. Objective tests are popular because they are easy to prepare and take,
quick to mark, and provide a quantifiable and concrete result.
The objective test items commonly used in classroom testing are true false,
matching, and multiple choice.
21 Wilmar Tinambunan, Evaluation of…, p. 61.
22 J.B. Heaton, Writing English Language Tests, New Edition (London and New York: Longman, 1988), p. 26.
a) True False
The true false item is commonly used in measuring the ability to identify the
correctness of statements of fact, definitions of terms, statements of principle and
the like.24
The true false item does not directly test writing or speaking abilities, only
listening and reading. It may be used to test aspects of language such as
vocabulary, grammar, and the content of a reading or listening passage. It is fairly
easy to design, and it is also easy to administer, whether orally or in writing, and to mark.25
Thus, the item provides the students with a choice of two alternatives, so
the students are able to guess the answer; sometimes the guess will be right and
sometimes it will be wrong, because random guessing can produce the correct
answer. This type of test is usually constructed from statements that the students
have to judge as true or false. If the statement is true, the students should mark
it with ‘T’, and if it is false, they should mark it with ‘F’.
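The effect of guessing described above can be made concrete. The following sketch, whose function name and test length are hypothetical and chosen only for illustration, computes the score a pupil would be expected to reach by blind guessing alone, assuming every alternative is equally likely to be picked.

```python
def expected_chance_score(n_items, n_alternatives):
    """Expected number of correct answers when every item is answered by blind guessing."""
    return n_items / n_alternatives


# Hypothetical 40-item tests: true false items have 2 alternatives,
# while a four-option multiple choice item has 4.
true_false = expected_chance_score(40, 2)
four_option = expected_chance_score(40, 4)

print("true false:", true_false)
print("four-option multiple choice:", four_option)
```

Under this assumption, half of the true false items can be answered correctly by chance, which is why scores on such items need to be interpreted with care.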
b) Matching
The matching exercise consists of two parallel columns of phrases, words,
numbers, or symbols that must be matched. Examples of items included in
matching exercises are persons and achievements, dates and historical events. The
nature of the matching exercise limits it to measuring the ability to identify the
24 Norman E. Gronlund and Robert L. Linn, Measurement and Assessment in Teaching (New Jersey: Prentice Hall, Inc., 1995), p. 150.
25 Penny Ur, A Course …, p. 39.
c) Multiple Choice
A multiple choice item consists of one or more introductory sentences
followed by a list of two or more suggested responses from which the examinee
chooses one as the correct answer.27
The multiple choice item can measure a variety of learning outcomes,
from simple to complex, and it is adaptable to most types of subject matter
content. The learning outcomes in the knowledge area that can be measured by
multiple choice items are:28
i. Knowledge of terminology
For this purpose, pupils are requested to show their knowledge
of a particular term by selecting a word that has the same
meaning as the given term or by choosing a definition of the
term. Special uses of a term can also be measured by having
pupils identify the meaning of the term when it is used in context.
ii. Knowledge of specific facts.
It is important in its own right, and it provides a necessary basis
for developing understanding, thinking skills, and other
complex learning outcomes. Multiple choice items designed to
measure specific facts can take many different forms, but
more difficult, this is because principles are more complex than
isolated facts.
iv. Knowledge of method and procedure
27
Anthony J. Nitko, Educational Tests and Measurement: An Introduction (New York: Harcourt Brace Jovanovich Inc, 1983), p. 190.
28
The multiple choice form is also able to measure the
knowledge of method and procedure, such as knowledge of
laboratory procedure, knowledge of methods used in problem
solving, computational and performance skill.
Multiple choice items have several advantages: they are fast, easy, and
economical to score, and they can be scored objectively, so they tend to be
fairer and more reliable than subjectively scored tests.
Besides those advantages, multiple choice items also have disadvantages:
the technique tests only recognition knowledge, so the students have
little or no opportunity to express their own ideas about a problem; pupils may
guess the answer, which may affect their scores; it is difficult to write
successful items; and cheating may be facilitated.29
In short, this type of test has both advantages and disadvantages. The advantages
relate to the teacher, who can score the students
objectively. The disadvantages relate to the students: the test
measures only their recognition knowledge and makes it easier for them to cheat.
B. Item Analysis
Selecting appropriate language items is not enough by itself to ensure a
good test. Each question needs to function properly; otherwise, it can weaken the
exam. Fortunately, there are some rather simple statistical ways of checking
individual items. This is done by studying the students’ responses to each item.
According to Nitko, “item analysis refers to the process of collecting,
summarizing and using information about individual test items, especially
information about pupils’ responses to items.”30 The analysis of students’ responses
29
Kathleen M. Bailey, Learning About Language Assessment (Boston: ITP An International Thomson Publishing Company, 1988), p. 131.
30
of objective test items is a powerful tool for test improvement. Ahmann and Glock
said, “Item analysis is reexamining each test to discover its strength and flaw.”31
From those opinions, it can be concluded that item analysis is the process of
collecting information about students’ responses to the items in order to judge the quality of
the test items. More specifically, item analysis information can tell us whether an item was
too easy or too hard. Item analysis data also aid in detecting specific technical
flaws and thus provide further information for improving test items.
According to James Dean Brown and Thom Hudson, “Item analysis is the
systematic statistical evaluation of the effectiveness of individual test items.”32
Item analysis as a whole is therefore defined here as the systematic
statistical evaluation of the effectiveness of individual test items. Item analysis is
usually done to select which items will remain on future revised
and improved versions of a test. Sometimes, however, item analysis is performed
simply to investigate how well the items on a test are working with a particular
group of students, or to study which items match the language domain of interest.
C. Kinds of Item Analysis
Three characteristics of test items are usually considered in testing and
measurement:
1. Difficulty Level
The level of difficulty can be identified from the percentage
of students who answer an item correctly. According to Harrison, “Level of difficulty means the
percentage of students who give the right answer.”33
“A good test should have a certain degree of difficulty. It may not be too easy
or too difficult, because a test that is too easy or too difficult for the group tested
yields a score distribution that makes it hard to identify reliable differences in
achievement levels between members of the group.”34 The level of difficulty is a
31
Ahmann, J. Stanley, and D. Glock, Marving, Evaluating Student…, p. 184.
32
James Dean Brown and Thom Hudson, Criterion-Referenced Language testing, (New York: Cambridge University Press, 2002), p. 113.
33
Andrew Harrison, A Language …, p. 128.
34
percentage of students who answer the test item correctly. A good test must
have an appropriate degree of difficulty. By analyzing the students’
responses to the items, the level of difficulty of each item can be known, and this
information helps the teacher identify concepts that need to be retaught
and give the students feedback about their learning.
Item difficulty goes by many other names: item facility, item easiness,
p-value, or simply the abbreviation IF.35 To make the level of difficulty easier
to compute, the writer divides the students into three groups: upper,
middle, and lower. The analysis focuses on the upper and lower groups, and
the middle group is set aside.
The formula for computing item difficulty is as follows:
\[ FV = \frac{\text{Correct } U + \text{Correct } L}{2n} \]
Where:
FV : Facility value, or item difficulty
U : Number of students in the upper group who answer
correctly
L : Number of students in the lower group who answer
correctly
2n : Total number of students in the upper and lower groups.36
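The facility value formula can be illustrated with a minimal Python sketch. The function name `facility_value` and the sample counts are hypothetical and chosen only for illustration; `group_size` stands for n, the number of students in each of the two groups.

```python
def facility_value(correct_upper: int, correct_lower: int, group_size: int) -> float:
    """Return FV = (Correct U + Correct L) / 2n for one test item.

    group_size is n, the number of students in each group (assumed equal
    for the upper and lower groups).
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not (0 <= correct_upper <= group_size and 0 <= correct_lower <= group_size):
        raise ValueError("correct counts must lie between 0 and group_size")
    return (correct_upper + correct_lower) / (2 * group_size)


# Hypothetical item: 30 of 40 upper-group students and 10 of 40 lower-group
# students answer correctly, so FV = 40 / 80.
fv = facility_value(30, 10, 40)
print(fv)  # 0.5
```

A value near 1.00 means nearly everyone in the two groups answered correctly; a value near 0.00 means nearly no one did.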
Based on the technique above, the difficulty level of all the
items in the test can then be found with the following formula:
\[ P = \frac{\sum b}{N} \]
Where:
35
James Dean Brown and Thom Hudson, Criterion-Referenced…, p. 114.
P : Difficulty level of all items.
b : Difficulty level of each item
∑ : Sigma (Total)
N : Total number of test items.37
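As a small illustration, the overall difficulty P is the arithmetic mean of the item difficulties. The sketch below assumes the per-item values b are already available as a list; the function name and the three sample values are hypothetical.

```python
def overall_difficulty(item_difficulties):
    """Return P = (sum of b) / N, the mean difficulty over all N items."""
    if not item_difficulties:
        raise ValueError("at least one item difficulty is required")
    return sum(item_difficulties) / len(item_difficulties)


# Hypothetical three-item test with difficulties 0.25, 0.50 and 0.75.
p = overall_difficulty([0.25, 0.5, 0.75])
print(p)  # 0.5
```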
The values of “FV” (facility value, or item difficulty) and
“P” (difficulty level of all items) range from 0.00 to 1.00. If “FV” or “P”
is less than 0.30, most of the students in the upper and lower groups cannot
answer the item correctly (such items are difficult). If “FV” or “P”
is between 0.30 and 0.70, the proportion of students answering correctly is about
halfway between a chance value and the point where no student misses the item
(such items are moderate). And if “FV” or “P” is more than 0.70,
most of the students in the upper and lower groups can answer the item
correctly (such items are easy).
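The three bands described above can be expressed as a short classification function. This is a sketch only: the function name is hypothetical, and the boundary values 0.30 and 0.70 are treated as moderate, following the stated range 0.30 - 0.70.

```python
def classify_difficulty(value: float) -> str:
    """Map an FV or P value to the band described in the text."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("difficulty must lie between 0.00 and 1.00")
    if value < 0.30:
        return "difficult"
    if value <= 0.70:
        return "moderate"
    return "easy"


for value in (0.23, 0.30, 0.54, 0.70, 0.86):
    print(value, classify_difficulty(value))
```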
The level of difficulty shows how easy or difficult a test item is for
that group. The level of difficulty is therefore influenced by the students’ competence, and it
will be different if the test is given to another group.
2. Discriminating Power
The discriminating power of a test item is its ability to differentiate
between pupils who have achieved well (the upper group) and those who have
achieved poorly (the lower group).38 For a discriminating item, students with high scores on the test (the
upper group) answer the item correctly more frequently than students with low
scores on the test (the lower group). If the test items are given to students who
have studied well, the scores will be high, and if they are given to those who have
not, the scores will be low. On the contrary, if the test items yield the same score
37
Asmawi zainul dan Noehi Nasution, Penilaian Hasil Belajar, (Jakarta: PAU-PPAI, UT, 1993), p. 153.
38
when they are given to the two groups, or even yield a low score for the upper group
and a high score for the lower group, they are not good test items.
Effective and ineffective distracters can be identified from analysis, and
those which are not working as planned can be rewritten or replaced. A change in
alternatives for a multiple choice item can increase discrimination.
The formula is as follows:
\[ DP = \frac{U - L}{\frac{1}{2}T} \]
Where:
DP : the index of item discriminating power.
U : the number of students in the upper group who answered
the item correctly
L : the number of students in the lower group who answered
the item correctly
T : total number of students in upper and lower group.39
The item discrimination statistic is calculated by subtracting the number of
students in the lower group who answered the item correctly from the number of
students in the upper group who answered the item correctly, and then dividing the result by
half of the total number of students in the upper and lower groups.
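The calculation just described can be sketched in Python as follows. The function name and the sample counts are hypothetical; U, L and T have the meanings given in the list above.

```python
def discriminating_power(correct_upper: int, correct_lower: int, total_students: int) -> float:
    """Return DP = (U - L) / (T / 2).

    total_students is T, the combined size of the upper and lower groups.
    """
    if total_students <= 0:
        raise ValueError("total_students must be positive")
    return (correct_upper - correct_lower) / (total_students / 2)


# Hypothetical item: 30 upper-group and 10 lower-group students answer
# correctly out of 80 students in the two groups, so DP = 20 / 40.
print(discriminating_power(30, 10, 80))  # 0.5
```

A positive value means the upper group outperformed the lower group on the item; zero or a negative value corresponds to the items described above as not good.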
3. Distracter Effectiveness
A good distracter will attract more students who have not studied well (the
lower group) than the upper group. On the contrary, a weak distracter will not be
selected by any of the lower achieving students.
39
Ngalim Purwanto, Prinsip-prinsip dan Teknik Evaluasi Pengajaran, (Bandung: Remaja Rosdakarya, 1986), p.120.
One important aspect affecting the difficulty of multiple choice test items
is the quality of the distracters. Some distracters, in fact, might not be distracting at
all, and therefore serve no purpose.40 The parts of a multiple choice item
include the item stem, or the main part of the item at the top; the options, which
are the alternative choices presented to the student; the correct answer, which is
the option that will be counted as correct; and the distracters, which are the
options that will be counted as incorrect.41
In a good test item, the distracters must function effectively; if the
distracters do not function, they should be rewritten or discarded. Distracter
analysis is done by comparing the number of students in the upper group and the
lower group who select each incorrect alternative.
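The comparison of the two groups can be sketched as a simple tally. This is an illustrative sketch rather than the procedure used in this study: the function name, the option labels and the sample responses are hypothetical, and a distracter is marked as effective here when it attracts more lower-group than upper-group students, following the description of a good distracter above.

```python
from collections import Counter


def distracter_analysis(upper_choices, lower_choices, options, key):
    """Count how often each incorrect option is chosen in each group."""
    upper = Counter(upper_choices)
    lower = Counter(lower_choices)
    report = {}
    for option in options:
        if option == key:
            continue  # the correct answer is not a distracter
        report[option] = {
            "upper": upper[option],
            "lower": lower[option],
            "effective": lower[option] > upper[option],
        }
    return report


# Hypothetical responses to one item whose correct answer is "a".
upper_choices = ["a"] * 6 + ["b", "c"]
lower_choices = ["a"] * 3 + ["b"] * 3 + ["c", "d"]
report = distracter_analysis(upper_choices, lower_choices, "abcd", key="a")
for option, row in report.items():
    print(option, row)
```

In this made-up example, option "c" attracts the two groups equally and would be a candidate for rewriting.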
D. The Importance of Item Analysis
The results of item analysis can be used to select items of the desired difficulty
that best discriminate between high and low achieving students. The
results of an item analysis can also be useful in identifying faulty items and can
provide information about student misconceptions and topics that need additional
work.42
The benefits of item analysis are not limited to the improvement of
individual test items; there are also a number of fringe benefits of special
value to classroom teachers. The most important of these are the following:
1. Item analysis data provide a basis for efficient class discussion of
the test result.
2. Item analysis data provide a basis for remedial work.
3. Item analysis data provide a basis for the general improvement of
classroom instruction.
40
Nana Sujana, Penilaian Hasil Proses Belajar Mengajar, (Bandung: Remaja Rosdakarya, 2001), p. 141.
41
James Dean Brown, Testing in Language Programs, (New Jersey: Prentice Hall Regents, 1996), p.70.
42
4. Item analysis procedures provide a basis for increased skill in test
construction.43
Based on the statement above, there are many benefits that a teacher can gain from
analyzing items. All of the benefits are related to the students’
achievement.
Nitko, in his book, states that the uses of item analysis are:
determining whether an item functions as the teacher intends, giving feedback to
students about their performance and a basis for class discussion, giving feedback to the
teacher about pupil difficulties, identifying areas for curriculum improvement, revising the
items, and improving item writing skills.44
Therefore, item analysis should be done by the teacher, because it
helps the teacher measure the students, motivates the students to study well and
be active in class, and makes it easy for the teacher to judge whether a test
item is good or not.
43
Norman E. Gronlund and Robert L. Linn, Measurement and…, p. 316.
44
A. The Objective of The Study
The objective of this study is to measure the quality of the
English Summative test items at the odd semester for the second grade students at
SMP YMJ Ciputat, specifically the difficulty level of each item. This research
also serves as a field in which the writer wishes to widen her knowledge, both
theoretically and practically, about testing, specifically about the difficulty level of
test items.
B. Place and Time of The Study
The research was conducted at SMP Yayasan Miftahul Jannah (YMJ)
Ciputat, which is located at Jl. Limun No. 27 Ciputat, Tangerang Selatan. The
writer did the research from January 5 to February 10, 2010. The writer took
the English Summative test question paper and the students’ answer sheets from the odd
semester of the second grade at SMP YMJ Ciputat.
C. Research Method
In this research the writer used a quantitative analysis technique. The data
are calculated using a simple percentage formula, which is used to find out the difficulty
level of each item of the English Summative test at the odd semester for the second grade
students at SMP YMJ Ciputat, academic year 2009-2010.
D. Research Instrument
1) The students’ answer sheets
The students’ answer sheets are the papers on which the students give their answers.
2) The English summative test question paper for the odd semester of the second
grade at SMP YMJ Ciputat, 2009-2010 academic year. The test
was conducted on Wednesday, December 9, 2009, from 7.30 to
9.30 a.m.
E. The Techniques of Data Analysis
In this research the writer used a quantitative method to analyze the level
of difficulty of each item in the English summative test at the odd semester for the
second grade students at SMP YMJ Ciputat.
To compute the difficulty level, the writer uses the formula from J.B. Heaton as
follows:
\[ FV = \frac{\text{Correct } U + \text{Correct } L}{N} \]
Where:
FV : Facility value, or item difficulty
U : Number of students in the upper group who answer correctly
L : Number of students in the lower group who answer correctly
N : Total number of students in the upper and lower groups.1
Based on the technique above, the writer then finds the difficulty
level of all the items in the English Summative test at the odd semester, given to the second
grade students of SMP YMJ Ciputat, 2009-2010 academic year, with the following
formula:
\[ P = \frac{\sum b}{N} \]
Where:
P : Difficulty level of all items.
b : Difficulty level of each item
∑ : Sigma (Total)
N : Total number of test items.2
1
J.B. Heaton, Writing English…, p.182
The values of “FV” (facility value, or item difficulty) and
“P” (difficulty level of all items) range from 0.00 to 1.00. If “FV” or “P”
is less than 0.30, most of the students in the upper and lower groups cannot
answer the item correctly (such items are difficult). If “FV” or “P”
is between 0.30 and 0.70, the proportion of students answering correctly is about
halfway between a chance value and the point where no student misses the item
(such items are moderate). And if “FV” or “P” is more than 0.70,
most of the students in the upper and lower groups can answer the item
correctly (such items are easy).
To make this clear, the writer gives the table of difficulty level ranges as
follows:3
Table 3.1
Level of Difficulty
P              Interpretation
< 0.30         Difficult
0.30 – 0.70    Moderate
> 0.70         Easy
The level of difficulty shows how easy or difficult a test item is for
that group. The level of difficulty is therefore influenced by the students’ competence, and it
will be different if the test is given to another group.
2
Asmawi Zainul dan Noehi Nasution, Penilaian Hasil Belajar, (Jakarta: PAU-PPAI, UT, 1993), p. 153
3
A. Description of Data
The data that the writer used in her research are the English Summative test
items given at the odd semester to the second grade students of SMP YMJ Ciputat.
The time allotted was 120 minutes. The total number of test items is 50.
There are 121 students, who were classified into three groups: 40 students
are in the upper group (33%), 41 students are in the middle group (33.8%), and
40 students are in the lower group (33%). Not all of the students’ answer sheets
are analyzed. The writer took only two of the groups: the 33% in
the upper group and the 33% in the lower group. The middle group is set
aside. (See appendix table 1 page 51).
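The grouping can be sketched as follows. The text does not state how the students were ranked, so this sketch assumes they are ordered by total test score, which is the usual practice; the function name, the 0.33 fraction as a parameter, and the generated scores are all hypothetical.

```python
def split_groups(scores, fraction=0.33):
    """Split students into upper, middle and lower groups by total score.

    scores maps a student identifier to a total score. Ties keep their
    original order because Python's sort is stable.
    """
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    size = round(len(ranked) * fraction)
    upper = ranked[:size]
    middle = ranked[size:len(ranked) - size]
    lower = ranked[len(ranked) - size:]
    return upper, middle, lower


# Made-up scores for 121 students (not the data of this study).
scores = {f"student_{i:03d}": i % 51 for i in range(121)}
upper, middle, lower = split_groups(scores)
print(len(upper), len(middle), len(lower))  # 40 41 40
```

With 121 students and a fraction of 0.33, the sketch yields the same group sizes as reported above.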
B. Analysis of Data
Number 1 is an easy item, because 38 students in the upper
group and 31 students in the lower group answered it correctly, that is,
69 of the 80 students. The difficulty level of this item is 0.86, which
falls in the range 0.71 to 1.00, so it is an easy item. This question asks students
about ‘who invites to have a dinner party’. The question
tries to measure the students’ ability to understand factual information from
the invitation text. Thus, this item conforms to the recommended indicator, namely
“Mengidentifikasi berbagai informasi dalam teks fungsional pendek berbentuk
undangan” (identifying various information in a short functional text in the form of an invitation).
An alternative revision is to change part of the question: the word ‘invites’ becomes
‘organizes’.
Number 2 is a good item, because 27 students in the upper group
and 16 students in the lower group answered it correctly, that is, 43
of the 80 students. The difficulty level of this item is 0.54, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks students
about ‘the dinner party will be held’. The question tries
to measure the students’ ability to understand factual information from the
invitation text. Thus, this item conforms to the recommended indicator, namely
“Mengidentifikasi berbagai informasi dalam teks fungsional pendek berbentuk
undangan” (identifying various information in a short functional text in the form of an invitation).
Number 3 is a good item, because 35 students in the upper group
and 11 students in the lower group answered it correctly, that is, 46
of the 80 students. The difficulty level of this item is 0.58, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks students
about ‘who will come to the dinner party’. The question
tries to measure the students’ ability to understand factual information from
the invitation text. Thus, this item conforms to the recommended indicator, namely
“Mengidentifikasi berbagai informasi dalam teks fungsional pendek berbentuk
undangan” (identifying various information in a short functional text in the form of an invitation).
Number 4 is a difficult item, because only 8 students in the
upper group and 10 students in the lower group answered it correctly, that is,
only 18 of the 80 students. The difficulty level of this item is 0.23, which
falls in the range 0.00 to 0.29, so it is a difficult item. This question
asks students about ‘funniest means’. The question is intended to measure
students’ ability to identify a synonym. Thus, this item conforms to the
recommended indicator, namely “mengidentifikasi makna kata, frase dan
kalimat” (identifying the meaning of words, phrases, and sentences).
An alternative revision is to change the answer alternatives to: a.
the beautifulest, b. the most beautiful, c. the more beautiful, d. the beautifuler
Number 5 is a good item, because 13 students in the upper group
and 17 students in the lower group answered it correctly, that is, 30
of the 80 students. The difficulty level of this item is 0.38, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks students
about ‘which one of the following statement is true based on the text’. The
question is a “stated detail question” about the invitation
text. Thus, this item conforms to the recommended indicator.
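The figures for items 1 to 5 can be rechecked from the counts stated above. The sketch below is only an arithmetic check: the counts are those given in the text, the 80 students are the 40 upper-group and 40 lower-group students, and the names `counts`, `classify` and `results` are illustrative.

```python
# (upper-group correct, lower-group correct) as stated for items 1 to 5.
counts = {1: (38, 31), 2: (27, 16), 3: (35, 11), 4: (8, 10), 5: (13, 17)}
GROUP_TOTAL = 80  # 40 upper-group + 40 lower-group students


def classify(fv):
    """Apply the difficulty bands: below 0.30, 0.30 to 0.70, above 0.70."""
    if fv < 0.30:
        return "difficult"
    if fv <= 0.70:
        return "moderate"
    return "easy"


results = {}
for item, (upper, lower) in counts.items():
    fv = (upper + lower) / GROUP_TOTAL
    results[item] = (fv, classify(fv))
    print(f"Item {item}: FV = {fv:.4f} ({classify(fv)})")
```

For item 5, for example, the counts give 30/80 = 0.375, which lies inside the moderate band.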
Number 6 is a good item, because 35 students in the upper group
and 19 students in the lower group answered it correctly, that is, 54
of the 80 students. The difficulty level of this item is 0.68, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks students
about ‘who is the champion of the English Poetry Reading Comprehension’. The
question tries to measure the students’ ability to
understand factual information from the recount text. Thus, this item
conforms to the recommended indicator, namely “Makna tekstual dalam teks
recount” (textual meaning in recount texts).
Number 7 is a good item, because 28 students in the upper group
and 16 students in the lower group answered it correctly, that is, 44
of the 80 students. The difficulty level of this item is 0.55, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks students
about ‘whom did the writer meet’. The question tries to
measure the students’ ability to understand factual information from the
recount text. Thus, this item conforms to the recommended indicator, namely
“Makna tekstual dalam teks recount” (textual meaning in recount texts).
Number 8 is a good item, because 31 students in the upper group
and 14 students in the lower group answered it correctly, that is, 45
of the 80 students. The difficulty level of this item is 0.56, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks students
about ‘Where was the writer born’. The question tries to
measure the students’ ability to understand factual information from the
recount text. Thus, this item conforms to the recommended indicator, namely
“Makna tekstual dalam teks recount” (textual meaning in recount texts).
Number 9 is a good item, because 31 students in the upper group
and 21 students in the lower group answered it correctly, that is, 52
of the 80 students. The difficulty level of this item is 0.65, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks students
about ‘“I hope I meet them again someday.” The underlined word refers to’. The
question is intended to measure students’ ability to identify the pronoun referent.
Thus, this item conforms to the recommended indicator, namely “mengidentifikasi
makna kata, frase dan kalimat” (identifying the meaning of words, phrases, and sentences).
Number 10 is a good item, because 28 students in the upper
group and 5 students in the lower group answered it correctly, that is, 33
of the 80 students. The difficulty level of this item is 0.41, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks students
about ‘Which one of the statement is TRUE based on the text’. The
question is a “stated detail question” about the recount
text. Thus, this item conforms to the recommended indicator.
Number 11 is a good item, because 32 students in the upper
group and 12 students in the lower group answered it correctly, that is,
44 of the 80 students. The difficulty level of this item is 0.55, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks
students about ‘Arrange the jumbled words bellow into the correct order’. Thus,
this item conforms to the recommended indicator, namely “menyusun kata menjadi
teks fungsional yang bermakna” (arranging words into a meaningful functional text).
Number 12 is a good item, because 26 students in the upper
group and 13 students in the lower group answered it correctly, that is,
39 of the 80 students. The difficulty level of this item is 0.49, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks
students about ‘Arrange the sentences into a good paragraph’. Thus, this item
conforms to the recommended indicator, namely “menyusun kalimat menjadi teks
fungsional yang bermakna dalam bentuk descriptive” (arranging sentences into a
meaningful functional text in descriptive form).
Number 13 is a good item, because 32 students in the upper
group and 11 students in the lower group answered it correctly, that is,
43 of the 80 students. The difficulty level of this item is 0.54, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks
students about ‘Look at the picture. What short of hair does he have’. The
question measures the students’ vocabulary. Thus, this item
conforms to the “Health theme”.
Number 14 is a good item, because 32 students in the upper
group and 18 students in the lower group answered it correctly, that is,
50 of the 80 students. The difficulty level of this item is 0.63, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks
students about ‘The boy in the picture of no. 13 is holding’. The
question measures the students’ vocabulary. Thus, this item conforms to
the “Health theme”.
Number 15 is a good item, because 27 students in the upper
group and 9 students in the lower group answered it correctly, that is, 36
of the 80 students. The difficulty level of this item is 0.45, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks students
about ‘The girl has got a special birthday present from her parents. She is’. The
question measures the students’ vocabulary. However, this
question does not conform to the curriculum.
Number 16 is a good item, because 24 students in the upper
group and 16 students in the lower group answered it correctly, that is,
40 of the 80 students. The difficulty level of this item is 0.50, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks
students about ‘Who is the leader of the Student Organization’. The
question tries to measure the students’ ability to
understand factual information from the invitation text. Thus, this item
conforms to the recommended indicator, namely “Mengidentifikasi berbagai
informasi dalam teks fungsional pendek berbentuk undangan” (identifying various
information in a short functional text in the form of an invitation).
Number 17 is a good item, because 25 students in the upper
group and 18 students in the lower group answered it correctly, that is,
43 of the 80 students. The difficulty level of this item is 0.54, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks
students about ‘When will Niken come to the meeting’. The
question tries to measure the students’ ability to understand factual
information from the invitation text. Thus, this item conforms to the recommended
indicator, namely “Mengidentifikasi berbagai informasi dalam teks fungsional
pendek berbentuk undangan” (identifying various information in a short functional
text in the form of an invitation).
Number 18 is a good item, because 26 students in the upper
group and 8 students in the lower group answered it correctly, that is, 34
of the 80 students. The difficulty level of this item is 0.43, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks students
about ‘“The agenda of the meeting is final preparation ….” The synonym of the
the synonym word. Thus, this item conforms to the recommended indicator,
namely “mengidentifikasi makna kata, frase dan kalimat” (identifying the meaning of
words, phrases, and sentences).
Number 19 is a good item, because 32 students in the upper
group and 12 students in the lower group answered it correctly, that is,
44 of the 80 students. The difficulty level of this item is 0.55, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks
students about ‘The following statements are true based on the text above,
EXCEPT’. The question is an “unstated detail
question” about the invitation text. Thus, this item conforms to the recommended
indicator.
Number 20 is a good item, because 32 students in the upper
group and 11 students in the lower group answered it correctly, that is,
43 of the 80 students. The difficulty level of this item is 0.54, which
falls in the range 0.30 to 0.70, so it is a moderate item. This question asks
students about ‘My mother and I … the cake for my birthday tomorrow. We have
bought some eggs, some sugar, and some flour’. The
question measures the students’ comprehension of grammar. However, this question
does not conform to the curriculum.
Number 21 is a difficult item, because only 8 students in the
upper group and 9 students in the lower group answered it correctly, that is,
only 17 of the 80 students. The difficulty level of this item is 0.21, which
falls in the range 0.00 to 0.29, so it is a difficult item. This question
asks students about ‘Read the sentence “then he left me a book”. What does the
word in italic type refer to’. The question is intended to measure students’ ability
to identify the pronoun referent. Thus, this item conforms to the recommended