Content analysis and authenticity of the 2012 english test in the senior high school national examination


Academic year: 2017






Presented as Partial Fulfillment of the Requirements to Obtain the Sarjana Pendidikan Degree

in English Language Education


Frisca Ayu Desi Widyaningrum Student Number: 091214136














“Only those who dare to fail greatly

can ever achieve greatly.”

Robert F. Kennedy

“It is part of the job of life to figure out who you

are and what you have got.”

- Happy Feet Two (2011)

“Tell me and I forget.

Teach me and I remember.

Involve me and I learn.”

Benjamin Franklin

I dedicate my thesis to:

Sanata Dharma University as my love to this campus in which I got friends, lessons, happiness,

wisdom and life values

also to my family as my love to them.




I honestly declare that this thesis, which I have written, does not contain the work or parts of the work of other people, except those cited in the quotations and the references, as a scientific paper should.

Yogyakarta, 16 January 2014 The Writer





I, the undersigned, a student of Sanata Dharma University: Name: Frisca Ayu Desi Widyaningrum

Student Number: 091214136

For the advancement of knowledge, I grant the Sanata Dharma University Library my scientific work entitled:


together with any necessary supporting materials (if any). I thereby grant the Sanata Dharma Library the right to store it, convert it into other media formats, manage it in a database, distribute it on a limited basis, and publish it on the internet or other media for academic purposes, without needing to ask my permission or pay me royalties, as long as my name remains cited as the author.

I make this statement truthfully.

Made in Yogyakarta

On: 16 January 2014 Declared by



Widyaningrum, Frisca Ayu Desi. 2014. Content validity and authenticity of the 2012 English test in the senior high school national examination. Yogyakarta: English Language Education Study Program, Sanata Dharma University.

The National Examination (UN) was the most important standardized test employed to assess Indonesian students’ competence, including English ability. Moreover, the final scores of the National Examination were prepared as a tool for selecting state university candidates. Given this significance, analyzing the validity and authenticity of the National Examination was important as well. Language test validity was categorized into face, content, construct, consequential, and criterion-related validity. Due to time constraints, only the content validity of the listening section and the five reading test versions of the National Examination was analyzed in this research. In addition, language test authenticity in this research referred to test tasks and test texts. This study mainly employed Brown’s theory (2004) of content validity and authenticity.

This study aimed to answer two questions, namely: 1) How valid is the content of the English test items of National Examination year 2012 for senior high schools in relation to the lesson objectives and test specifications? and 2) How authentic are the English test items of National Examination year 2012 for senior high schools in relation to the criteria of authenticity set by Brown?

The researcher employed qualitative research with document analysis. The research objects were the listening items and the five versions of the reading test items of National Examination year 2012; the multiple versions were intended to prevent cheating by students. However, the five reading test versions shared similarities in both the test tasks and the test texts. The data were obtained by using checklists and were used to answer the research questions. In addition, questionnaires were distributed to four experts for data triangulation.

There were two findings of this research. First, the content of the National Examination year 2012 was 98.8% valid since almost all of the contents were relevant to the test specifications. Three reading test versions failed to represent one kind of reading text, namely explanation text. Second, the National Examination year 2012 met the criteria of authenticity at a percentage of 79.5%, since some listening and reading test items did not satisfy the authenticity criteria. Natural language use, the relevance of the test topics, and real-world representativeness were the problematic aspects in meeting a higher standard of authenticity. This research was expected to be beneficial as a meaningful evaluation of the administration of the National Examination for senior high schools in Yogyakarta as well as useful for English practitioners and future researchers.



Widyaningrum, Frisca Ayu Desi. 2014. An Analysis on content validity and authenticity of the 2012 English test in national examination for senior high schools. Yogyakarta: English Language Education Study Program, Sanata Dharma University.

The National Examination (UN) is the most important standardized test administered to assess the competence of students in Indonesia, including English ability. Moreover, the final UN scores are prepared for selecting state university candidates. Because of the importance of the UN, analyzing its validity and authenticity is also important. Language test validity is categorized into face validity, content validity, construct validity, consequential validity, and criterion validity. Due to limited time for analysis, only the content validity of the listening items and the five versions of the reading test items could be analyzed in this research. In addition, language test authenticity in this research refers to test tasks and test texts. This research essentially uses Brown’s theory (2004) on content validity and authenticity.

This research aims to answer two research questions, namely: 1) How do the 2012 UN test papers for senior high schools meet the criteria of content validity in relation to the learning objectives and the test specifications? 2) How do the 2012 UN test papers for senior high schools meet the criteria of authenticity in relation to Brown’s theory?

The researcher used qualitative research with document analysis. The objects of this research were the listening test items and the five types of reading test papers in the 2012 National Examination, which were intended to prevent cheating by students. However, the five reading test papers had many similarities in terms of test tasks and test texts. The analysis data were obtained using checklists, and the data were used to answer the research questions. In addition, the researcher distributed questionnaires to four experts as data triangulation.

There were two findings in this research. First, the 2012 UN test papers for senior high schools were 98.8% valid because almost all of the items matched the test specifications. Three types of the reading test did not represent one kind of text, namely explanation text. Second, the 2012 UN items for senior high schools met the authenticity criteria at a percentage of 79.5% because some listening and reading items did not meet the authenticity criteria. Naturalness of language use, text relevance, and the representation of everyday life were the authenticity problems. The researcher hopes this research is useful as a means of evaluating the administration of the UN for senior high schools in Yogyakarta and useful for English teaching practitioners and other researchers in the future.




First of all, I would like to extend my great gratitude to my savior, Jesus Christ, for His endless love and uncountable blessings. He always enlightens my ways every time I am hopeless. Besides, He always helps me to stand strong.

I would like to address my gratitude to my thesis advisor, Carla Sih Prabandari, S.Pd., M.Hum, for her greatest patience, guidance, suggestions and encouragement. Therefore, I can finish my thesis well. Besides that, I would like to render thanks to the lecturers of English Language Education Study Program of Sanata Dharma University especially those who have contributed to this thesis accomplishment: Adesti Komalasari, S.Pd., M.A., Ag. Hardi Prasetyo, S.Pd., M.A., Drs. Barli Bram, M.Ed., Ph.D., Markus Budiraharjo, B.Ed., M.Ed., Ed.D., Sr. Margareth, FCJ, Veronica Triprihatmini, S.Pd., M.Hum., M.A., and my academic advisor: Christina Kristiyani, S.Pd., M.Pd.

I would like to express my gratitude to the English teacher at SMA Negeri 7 Yogyakarta, Dra. Dorothea Sri Ismayawati, for her help, support, and care. Then, I would like to thank the PBI secretariat staff, Mbak Dhanniek and Mbak Linda, who have helped me to manage all of the things related to administration. I also address my thankfulness to the Sanata Dharma Library officers for their friendly and warm services.



tough woman. I would also give my special gratitude to my siblings: Anindya Marthasari, Bagus Rilo Pambudhi, and Annisa Dela Widhiastuti for their cheerfulness and support.

I would personally give thanks to my best friends: Efrem Justitia Suksma, Yut Liyut, Vicky, and Agustina for their support; Helen, Bruder Markus, Sekar, Ita, Niken, Pipiet, Bertha, and Denny for their assistance and encouragement. My thankfulness also goes to my play fellows: Unggul, Erik, Momon, Dovi, Yudha, Veri, and Yulius Nico. I also give my gratitude to all my friends in English Language Education Study Program of Sanata Dharma University for their cheerfulness, encouragement, and help, especially for my play performance partners in A Fortiori, for my classmates in class F, my classmates in TW class and my SPD group mates (Niko, Linda, Ika, Budi, Vita, and Tunggul). Last but not least, I sincerely address my sincere gratitude for those who I could not mention one by one for their help and support. May God always bless them all with endless happiness!











ABSTRAK

CHAPTER I. INTRODUCTION
F. Definition of Terms

3. Test Purposes
a. Language Aptitude Tests
b. Proficiency Tests
D. Data Gathering Techniques
E. Data Analysis Techniques
F. Research Procedures

A. Content Validity of 2012 English Test Items in National Examination for Senior High Schools
1. Validity of the Test Specifications
2. Content Validity of the Listening Test Items according to Competence Standard and Basic Competence
3. Content Validity of the Listening Test Items according to Graduate Competence Standard
4. Content Validity of the Reading Test Items according to Competence Standard and Basic Competence
5. Content Validity of the Reading Test Items according to Graduate Competence Standard
B. Authenticity of 2012 English Test Items in National Examination for Senior High Schools
1. Authenticity of the Listening Test Items
2. Authenticity of the Reading Test Items
a. Authenticity of the Test Tasks
b. Authenticity of the Test Texts
C. Other Findings

A. Conclusions
B. Recommendations

Table

Table 3.1 The Sample of Test Specification Validation Checklist
Table 3.2 The Sample of Authenticity Checklist
Table 3.3 The Sample of Authenticity of the Test Text Checklist
Table 4.1 The Percentages of Content Validity and Authenticity of the Test Items
Table 4.2 The Dissemination of the Reading Test Items according to Competence Standard and Basic Competence
Table 4.3 The Dissemination of the Reading Test Items according to Graduate Competence Standard
Table 4.4 The Dissemination of the Similar Kinds of the Reading Test Items
Table 4.5 The Percentages of the Authenticity of the Test

Appendix

Appendix 1 Transcription of the Listening Section
Appendix 2 The Document of Competence Standard and Basic


This chapter describes the background of the study, research problems, problem limitation, research objectives, and research benefits. In addition, this chapter explains the definitions of the terms used in this research. Each part is described as follows.

A. Research Background

Viewed from its functions, National Examination in Indonesia is the highest standardized test employed to assess and measure Indonesian students’ competence. By passing National Examination, Indonesian students are able to graduate from a certain education level and to continue their study to the further education level. This is stated in Education Ministry Regulation (No. 59/2011) on National Examination, “National Examination, abbreviated as UN (Ujian Nasional), is a national standardized test which is administered nationally in order to test the students’ competency achievement on a particular subject in a group of


As mentioned in Education Ministry Regulation (No. 22/2006), National Examination materials are generally based on the Competence Standard and Basic Competence of each level of educational units included in the Content Standards (Standard Isi). Competence Standard and Basic Competence comprise lesson materials from the first level up to the last level of classes. Furthermore, the Competence Standard and Basic Competence of each level of educational units become the reference for creating the Graduate Competence Standard (Standard Kompetensi Lulusan), which consists of test specifications. The policy regulates the subjects which are examined in National Examination. In senior high schools, for example, the subjects which are examined depend on students’ majors. However, some subjects such as Indonesian and English are examined in National Examination for all majors, namely natural science, social science, and linguistics.

The test-makers (Department of Education) need to pay attention to the test’s content validity and authenticity in order to make good test items, particularly in National Examination. Content validity is one of the validity facets, and it is important since it helps the test reflect the measured skills which should be performed by students. The American Psychological Association (1985) states that the validity of a test concerns how meaningful, appropriate, and useful the test scores are (as cited in Rudner and Shafer, 2002, p.12).


content validity, there is no possible logical outcome that the test-examiners are not able to determine that students achieve the set of learning objectives in a particular level of education (as cited in Jandhagi and Shaterian, 2008, p.2).

Later, the National Examination test-makers need to pay attention to the authenticity of the test as well as the content validity. Bachman and Palmer (1996) define authenticity as “the degree of correspondence of the characteristics of a given language test task to the features of a target language task” (p.23). In relation to National Examination, authenticity involves two important parts, namely test task characteristics and test text characteristics, as in a reading test. The test tasks refer to the test instructions and the answer options of the items, while the test texts refer to the passages used in the test.


authenticity. Moreover, it is important that the materials used in the test are relevant to students’ majors. If the materials used in the language test are not relevant to the majors, it would be difficult for students to understand the content. As a result, the texts’ content is not communicated well to the students.


The researcher intends to analyze the content validity and authenticity of English items of National Examination year 2012 for senior high schools since National Examination is the highest standardized test employed to assess Indonesian students’ competence. Moreover, students’ scores of National Examination are prepared to be possibly used as a selection tool. For these reasons, the validity and authenticity of the National Examination are essential to consider.

B. Research Problems

The problems of this research are formulated as follows:

1. How valid is the content of English test items of National Examination year 2012 for senior high schools related to the lesson objectives and the test specifications?

2. How authentic are the English test items of National Examination year 2012 for senior high schools related to the criteria of authenticity set by Brown?

C. Problem Limitation


senior high schools consist of two types, namely the listening section and the reading section. Both sections are required in National Examination according to the Competence Standard and Basic Competence. Therefore, the researcher focuses the analysis on both the listening and reading sections of the English test items of National Examination year 2012 for senior high schools.

Due to the various test types of National Examination in Indonesia and the impracticality of researching the English test items of National Examination year 2012 administered throughout Indonesia, the analysis was focused on the test items administered in Daerah Istimewa Yogyakarta. The researcher took sets of English items of National Examination for natural science and social science students, which was held in SMA Bopkri 2 Yogyakarta. This school was chosen as a representative of the English test items administered in Yogyakarta.


According to Brown (2004), validity of a test, particularly a language test like an English test, is determined by some aspects, namely face validity, content-related evidence (content validity), criterion-related evidence, consequential validity, and construct validity. However, the research is focused on the content-related evidence (content validity) in order to analyze the validity of English items of National Examination year 2012. The authenticity of test items is determined by the authenticity indicators claimed by Brown (2004). The indicators are the natural language which is used in the test, the contextualization of the items, the relevance of the test topics to the learners, the presence of thematic test item organization, and the representation of the real-world task or sources (p.28).

D. Research Objectives


1. To obtain information on whether the test items of the listening and reading sections of the 2012 English National Examination for senior high schools meet content validity.

2. To obtain information on whether the test items of the listening and reading sections of the 2012 English National Examination for senior high schools meet authenticity.

E. Research Benefits

There are benefits of this study for the English teachers, the test-makers of National Examination, and the future researchers. The benefits derive from the research data and findings revealed in the analysis. They are described in detail as follows.

1. For English Teachers and English Practitioners

This study gives an analysis of the content validity and the authenticity of English items of National Examination year 2012 for senior high schools. Content validity and authenticity take part in producing a good test, and this research reveals information about both of these principles of language assessment. Therefore, from this research, the English teachers are able to be more aware of English items of National Examination in the following years.

2. For National Examination Test-designers


National Examination that the test-makers have made in case of its content validity and authenticity. Furthermore, the findings of this research could be significant considerations for the National Examination test-designers in designing the English language tests, particularly in the future National Examination.

3. For Future Researchers

This study provides meaningful data related to the content validity and authenticity of English test items of National Examination year 2012, which are useful for future researchers as references in conducting research on English language tests. Moreover, the future researchers are able to explore this research further and reveal other findings by applying other, more technical research instruments to obtain data. Therefore, more detailed information related to the content validity and authenticity of English tests could be discovered.

F. Definition of Terms

There are definitions of the terms used in this study which relate to content validity, authenticity, National Examination, and senior high school. The definitions are taken from several experts and certain education regulations. The terms are defined as follows.


Gronlund (1998) says, “validity is the extent to which inferences made from assessment result are appropriate, meaningful, and useful in terms of the purpose of the assessment” (as cited in Brown, 2004, p.22). It simply means that in order to make a test valid, appropriate and meaningful, the test should reflect the lesson objectives. Besides reflecting the lesson objectives, the test result should be appropriately connected to the purpose of the test and one of the validity facets is content validity.

Brown (2004) adds if a test has content validity, the items of the test represent the measured subject-matter or behavior in order to evaluate achievement or proficiency tests (p.23). In addition, the test content should reflect the target tasks which are organized in the test specifications and the lesson objectives. In this study, the researcher dealt with the content of English test items of National Examination year 2012 for senior high schools in order to analyze the validity of its content by comparing the test content with the relevant test specifications incorporated in Graduate Competence Standard and the lesson objectives which are incorporated in Competence Standard and Basic Competence.

2. Authenticity

Authenticity is a matter of appropriateness which is referred to the test items’ content and construction. Authenticity of test items is able to be analyzed


the target language (as cited in Brown, 2004, p.28). It indicates that in order for a test task to meet authenticity, the test task should simulate a real-world task and its aim should not be to test a grammatical form of language. This is also related to a statement about authentic texts from Williams (1984), who states that authentic texts are written to convey a message (as cited in Day, 2003, p.4).

It means that the authentic texts aim mainly for communication and not for teaching grammar or lexis. However, the language used in both test tasks and test texts should be as natural as the target language. The target language used in National Examination is American English and British English. Therefore, the authenticity contains five visible and important indicators, as Brown (2004) claims. The important indicators are the natural language which is used in the test, contextualization of the items, relevance of the test topics to the learners, presence of thematic test item organization, and representation of the real-world task or sources (p.28).

3. National Examination

According to Education Ministry Regulation (No. 59/2011) on National Examination, “National Examination, abbreviated as UN, is a national standardized test administered nationally in order to test students’ competence achievement on a particular subject in a group of science and technology”. National Examination is administered to each educational level starting from elementary school level to senior high school level. The


subjects of study which are examined. National Examination is annually administered to students in the highest grade of an educational level in order to pass a certain level of education and then to continue their study to the higher level of education. In senior high schools, National Examination is administered to each major: natural science, social science, and linguistics. National Examination is prepared by Department of Education, which instructs groups of teachers (MGMP) to make the test items. The test items for each region are different in order to prevent cheating by the students. Therefore, there are five types of National Examination in 2012 administered to senior high schools in Yogyakarta, and the types will be explained in chapter four.

This study deals with the five versions of English items of National Examination year 2012 for senior high schools which were used in Yogyakarta. The test versions are A57, B69, C71, D32, and E45. The five test versions refer to the reading section since the listening section of the National Examination year 2012 is available only in one version. This study is focused on the test items which are administered to natural science and social science students.

4. Senior High Schools





This chapter presents review of theoretical writings and research related to the study matter. Furthermore, this chapter helps the researcher to answer two research questions. This part contains two major parts of the review of related literature namely theoretical description and theoretical framework.

A. Theoretical Description

1. Language Testing

Testing refers to the activity of testing individuals or things in order to reveal certain information. Besides revealing certain information, testing is conducted to measure one’s capability, knowledge or performance in a certain


2. Language Test

Tests refer to an examination of one’s knowledge or ability and they commonly consist of questions to be answered or activities to be presented. In other words, Brown (2004) defines tests as instruments of measuring ability and knowledge (p.3). In practice, language tests are differentiated by the way they are composed or designed (method) and by the purpose of the test itself. In terms of method, language tests are differentiated into two types, namely traditional paper-and-pencil language tests and performance tests (McNamara, 2000).

In order to make a language test effective, the language test items should meet the principles of language assessment. Two of the principles are content validity and authenticity. Brown (2004) explains there are five criteria for testing a test, namely practicality, reliability, validity, authenticity, and washback (p.19).

a. Paper-and-pencil language tests


An example of paper-and-pencil language tests is National Examination since the test items are in multiple-choice format. Besides, the aim of administering National Examination is to assess the students’ receptive skills. Receptive skills refer to the ability to receive information and understand it by listening or reading.

b. Performance tests

Nowadays, performance tests are well known as oral tests. Performance tests are different from paper-and-pencil language tests since they assess the act of communication (McNamara, 2000). Therefore, they are specifically used to assess speaking and writing skills.

3. Test Purposes

In terms of test purposes, language tests are differentiated into five types, namely language aptitude tests, proficiency tests, placement tests, diagnostic tests, and achievement tests. Brown (2004) explains that by defining the purpose of testing, the test-designers will focus more on the specific objectives of the tests (p.42). Besides, it helps the test-designers to compose the items.

a. Language Aptitude Tests

According to Brown (2004), language aptitude test is “a test which is designed to measure capacity or general ability to learn a foreign language and ultimate success in that undertaking” (p.43). Language aptitude test is not commonly employed since this test is used to predict one’s achievement in


reason is doubtful since the factors like appropriate self-knowledge, active strategic involvement in learning, and strategies-based instruction influence somebody’s success. The examples of language aptitude tests are the Modern Language Aptitude Test (MLAT) and Pimsleur Language Aptitude Battery


b. Proficiency Tests

Proficiency tests are not limited to one course, curriculum, or certain language skill. Instead, they test all skills. Brown (2004) explains that proficiency tests consist of standardized multiple-choice items on grammar, vocabulary, reading comprehension, listening comprehension, writing skill, and sometimes oral production performance (p.44). In addition, McNamara (2000) states that proficiency tests are not correlated to the process of teaching since they refer instead to the future ‘real-life’ language use as the criterion. Therefore, the score of this kind of test has a gate-keeping function, especially in the educational field and in the workplace. An example of standardized proficiency tests is the Test of English as a Foreign Language, well known as TOEFL.

c. Placement Tests


program. An example of standardized placement tests is the English as a Second Language Placement Test (ESLPT) at San Francisco State University.

d. Diagnostic Tests

Brown (2004) explains that diagnostic tests are used to diagnose certain aspects of a language (p.46). In practice, the test administrators have a checklist of features in order to point toward difficulties. The diagnostic test results help the teachers to decide which aspects they have to focus on. Besides, they provide information that makes the students aware of their errors.

e. Achievement Tests

According to Brown (2004), an achievement test is limited to certain materials related to a curriculum within a particular time frame. An achievement test is used to determine whether the objectives of the course have been met by the end of an instruction period (p.47). Therefore, an achievement test contributes to the teaching-learning process since it is related to classroom lessons, units, or curriculum. An example of achievement tests is National Examination.

4. Validity

Bachman (1990) explains that in order to make a test score a meaningful indicator of the individual’s ability, the test should concern only the ability which is expected to be tested (p.238). Bachman (1990) adds that the validity of a test shows the quality of the test itself. When a test meets validity, consequently the test score effectively reflects the true condition of


indicators. However, in order to meet the validity, the test should reflect the skills or behavior which would be assessed. There are five types of validity to determine whether or not a test is valid namely face validity, content-related evidence (content validity), criterion-related evidence, consequential validity, and construct validity.

a. Face Validity

Gronlund (1998) states that a test is considered to have face validity if the students view the test as fair, relevant, and useful for improving learning (as cited in Brown, 2004, p.26). Face validity itself refers to whether the test looks right and obviously appears to measure the skills which are going to be measured. Furthermore, according to Brown (2004), the criteria of a test which has face validity are that the test is well-constructed, the test is doable within the time allotment, the items are obvious and simple, the directions are clear, the tasks meet content validity, and the difficulty level presents a reasonable challenge (p.27).

b. Content Validity


Therefore, the scores of the test are effectively used as meaningful indicators of students’ competence. For instance, a test of reading skills would be considered a valid reading test if it measures reading skill and nothing else. The test is not a valid test of speaking or vocabulary because it does not test speaking or vocabulary. However, Seif (2004) claims it does not mean all educational objectives of a particular course are included in the test. Due to test practicality, the test designers should compose several questions which are representative of the set educational goals. Seif (2004) claims content validity is one of the essential parts of composing a test (as cited in Jandhagi and Shaterian, 2008, p.2). If a test does not meet validity in its content, there will be two possible outcomes. First, students are not able to perform the needed skills which are not included in the test. Second, there may be some inappropriate questions which students are not able to answer. Therefore, the test tasks should be appropriate to the test specifications on the blueprints. This is similar to what Seif (2004) says: evaluating the content validity of a test can be carried out by matching the sample of the test questions to the test instructions (as cited in Jandhagi and Shaterian, 2008, p.2). Crocker and Algina (1986) add that the ‘matching method’ effectively ensures validity (as cited in Miller, 2003, p.12).


that test specifications include the general outlines of the test and the test tasks (p.50). The test specifications refer to a certain curriculum and consist of only the general outlines of the materials and skills to be tested, since the test designers should consider test practicality.

c. Criterion-Related Evidence

According to Brown (2004), criterion-related evidence refers to the criterion of the test which is expected to be achieved. Criterion-related evidence is commonly categorized into two types, namely concurrent validity and predictive validity (p.24).

d. Construct Validity

Brown (2004) states that construct validity plays a large role in test design. Furthermore, it is a main concern in validating large-scale standardized tests of aptitude (p.25). It means that in making a test or testing a person, the test designers or the examiners should adhere to practical procedures and principles. For example, in determining the scoring criteria of a speaking test, the examiner should consider factors such as pronunciation, accuracy, vocabulary use, and sociolinguistic appropriateness.

e. Consequential Validity

Consequential validity refers to the consequences of a test. According to Brown (2004), a test raises various consequences, including considerations such as its accuracy in measuring intended criteria, its impact on the preparation of test-takers, its effect on the


interpretation and use (p.26). Besides, the effects of test preparation courses and manuals also fall under consequential validity.

5. Authenticity

There are many different views of authenticity since experts define it in various ways. Scarcella and Oxford (1992) state that authenticity refers to unedited and unabridged text (as cited in Day, 2003, p.4), while Widdowson (1976) emphasizes that authenticity is not a quality of a text at all; rather, authenticity is reached when the readers understand the writer’s intention (p.264). Williams (1984) explains that authentic texts are written to convey a message (as cited in Day, 2003, p.4). It means that the purpose of authentic texts is mainly communication.

According to Richards (2001), authentic materials are important in language teaching and learning. There are several benefits:

1.Since authentic materials are used for communication and exist in the real world, they are considered more interesting and motivating than created materials.

2.Since they are authentic, the materials are considered to contain a great deal of appropriate information about the target language.

3.Authentic materials are not composed to illustrate grammatical rules or discourse types. They resemble true language (pp. 252-253).


1.Authentic materials contain difficult language and vocabulary, which potentially distracts learners and teachers.

2.Using authentic materials burdens teachers since the teachers have to find suitable ones for teaching. The teachers cannot easily simplify the materials because the simplified versions would be considered unauthentic (p.253).

In judging whether test items are authentic, Bachman and Palmer (1996) claim that people cannot determine that test items are authentic just by looking at them (pp.28-29). There are two kinds of authenticity in the case of National Examination test items: task characteristics and text characteristics. Task characteristics represent the authenticity of test instructions; therefore, the focus is on the test tasks (a set of test instructions and the provided options). In addition, text characteristics play an essential role in making test items authentic. They represent the appropriateness of the passages used as the test materials.

There are several indicators to measure test items’ authenticity since authenticity cannot be defined just by looking at it. According to Brown (2004), for a test to meet the criteria of authenticity, it should show five qualities: the test language is as natural as possible, the items are contextualized, the topics are relevant to the learners, the items have some thematic organization, and the tasks represent real-world tasks (p.28). These criteria are used to analyze the authenticity of National Examination test items with regard to the test tasks.

Natural language use refers to “the language of ordinary speaking and


linguistic facets such as typographical mistakes (in reading materials), lexis, morphemes, word order and grammar (syntactic matters), diction, and meaning (semantic matters). The test tasks and test texts should resemble the way language is naturally used in reality (Brown, 2004, p.28).

The second indicator is the contextualization of the test items, which means that the test items are organized in order under the same topic, for example, in a story line. The third indicator is the relevance of the test topics to the learners, which means that the materials should be appropriate to the learners’ ability. In some cases, authentic passages have a difficult level of language which may burden language learners at a lower level. This point is emphasized by Brown (2004), who explains that one criterion of authenticity is that the topics used in the test should be relevant to the learners (p.28), and supported by Nuttall (1996), who says that texts with a high level of language are not suitable for improving or developing reading skills (p.177).


cited in Day, 2003, p.4). The statement indicates that the purpose of authentic texts is mainly communication. They are not aimed at teaching grammatical forms.

Brown (2004) adds that in listening items, authenticity emphasizes dialogues or monologues spoken by native speakers which represent conversations that happen in real life (p.28). It indicates that natural language use is important for achieving authenticity; in a listening test, for example, there should be hesitations, white noise, and interruptions.

B.Theoretical Framework

A language test is a systematic method of measuring one’s capability, knowledge, or performance in a certain domain in relation to language use. In order to be useful, a language test should meet the criteria of a good test, for instance reliability, validity, practicality, and authenticity (Brown, 2004). Therefore, the language test should be of high quality since it is a measurement of students’ capability. One type of language test is the English test of National Examination. In terms of McNamara’s (2000) classification of test methods, National Examination is a kind of paper-and-pencil language test (written test). Paper-and-pencil language tests belong to receptive tests because they test somebody’s receptive skills such as listening and reading skills. In terms of test


Competence, and test specifications which are incorporated in Graduate Competence Standard. In order to be useful as an assessment tool, a language test such as National Examination should meet the principles of language assessment. The researcher focuses on two criteria, namely content validity and authenticity.

A valid listening test is a test whose content is composed based on the blueprints. If the topics are relevant to the test specifications, the listening test is valid (Brown, 2004). Likewise, a valid reading test is a test whose content is composed based on the blueprints. If the topics are relevant to the test specifications, the reading test is valid. Content validity is important to consider because it affects the effectiveness of the test. If a language test does not meet content validity, it probably affects the students’ capability to perform the intended skill, and the students are probably not capable of answering the test questions (Seif, 2004). Therefore, it is important to check the content validity of language tests. The test designers or teachers can check the validity of a language test by matching the test items with the relevant test specifications and lesson objectives.


assessment, National Examination test-designers should consider two important parts of authenticity namely test task characteristics and test text characteristics (Bachman and Palmer, 1996).

Task characteristics include five aspects, namely the naturalness of test language, the contextualization of the items in the test, the relevance of the test topics to the learners, the existence of some thematic organization of the items, and the representativeness of real-world tasks (Brown, 2004). Those five aspects refer to the quality of the test tasks in reading and listening tests. The naturalness of test language in reading test items consists of linguistic aspects, namely typography, lexis, morphology, syntax, and semantics. The naturalness of test language shows the appropriateness of the test language to the target language.


Besides the test tasks, the test text characteristics are important for achieving authenticity, and the text characteristics adapt the five indicators of test authenticity. Three indicators are used to check the authenticity of reading texts, namely the naturalness of test language, the relevance of the test topics to the learners, and the representativeness of real-world tasks.



In this chapter, the researcher discusses the research methodology. It consists of six parts: research method, research participants, research instruments, data gathering techniques, data analysis techniques, and research procedures. Each part is explained in detail as follows.

A. Research Method

This research belongs to qualitative research. McRoy et al. (1988) define qualitative research as a kind of research which focuses on non-statistical methods and analysis of social phenomena. Qualitative research uses detailed descriptions from the perspective of the research participants as a means to examine specific issues and problems under study. It means that through qualitative inquiry, the researcher analyzes the research participants in their natural setting without any manipulation of the data variables. According to Myers (1997), the data of qualitative research take the form of descriptive data, not numbers (as cited in Hunt, n.d., p.2).


intended to find out the validity of the content of the English items of National Examination year 2012 for senior high schools and the authenticity of the English items of National Examination year 2012 for senior high schools. The data analyzed are not statistics or numbers but descriptive data. Numbers are employed only to describe the data in detail.

One of the qualitative research methods is document analysis. According to Berelson (1947), document analysis is a systematic research technique to observe evidence of concepts from instructional documents (p.74). The types of documents vary, such as written documents (public records, private papers, and biographies), photographs, posters, maps, artifacts, motion pictures, and sound recordings. Since this research deals with documentation rather than examination and entails in-depth analysis of a set of collected data, the researcher applies the document analysis method. In this research, the primary documents analyzed are the five versions of the English test of National Examination year 2012 for senior high schools.


authentic test.

B.Research Objects

The objects of this study are five sets of English test items of National Examination year 2012 for senior high schools, which were intended to be administered in Yogyakarta. The researcher obtained the copy of the test from an English teacher in SMA Negeri 7 Yogyakarta who had taught in SMA Bopkri 2 Yogyakarta. The samples of English test items of the National Examination year 2012 are for the natural science and social science majors. Since both majors have the same lesson objectives, which are incorporated in Competence Standard and Basic Competence, the test is the same for both majors.

There are five versions of the test, namely A57, B69, C71, D43, and E45. Furthermore, the English test of National Examination is divided into two sections: listening and reading. The listening section, which was composed in only one version, consists of fifteen questions numbered 1 to 15, while the reading section of each test version consists of 35 questions numbered 16 to 50. Therefore, each version of the English test of National Examination year 2012 for senior high schools has 50 items in total.

C.Research Instruments


this research are the blueprints (Competence Standard and Basic Competence of Senior High School grades 10th up to 12th, and the Graduate Competence Standard) and five sets of English items of National Examination year 2012 for senior high schools which were intended to be administered in Yogyakarta. Furthermore, the test specifications which are elaborated in the Competence Standard-Basic Competence and Graduate Competence Standard are developed into checklists. The checklists are employed as the instruments of this research to obtain the intended information. The checklists are later used to check the validity and authenticity of the English test. The researcher is a research instrument as well, since the data are compiled, analyzed, and processed by the researcher, who constructs conclusions about what is regarded as data.

D.Data Gathering Techniques


Graduate Competence Standard year 2012 for senior high schools into checklists in order to obtain information on the content validity of the English test items of National Examination year 2012 for senior high schools. The authenticity criteria were elaborated into checklists as well, in order to obtain information on the authenticity of the English test items of National Examination year 2012 for senior high schools.

The copies of English items of National Examination year 2012 for senior high schools, the document of Competence Standard-Basic Competence, and the document of Graduate Competence Standard year 2012 for senior high schools were obtained from an English teacher in SMA Negeri 7 Yogyakarta while the researcher was conducting teaching practice (Praktek Pengalaman Lapangan) in that school. The criteria of authenticity which were applied to obtain data in this research were taken from Brown’s (2004) principles of language assessment.

E.Data Analysis Techniques


Before the English test content of National Examination year 2012 for senior high schools was compared with the blueprints, the test specifications of the blueprints were compared with each other in order to ensure that the test specifications were valid as the instrument to assess the validity of the English test of National Examination year 2012 for senior high schools. The test specifications which were incorporated in Graduate Competence Standard year 2012 were the elaboration of the lesson objectives in Competence Standard and Basic Competence of English subject for senior high schools grades 10th up to 12th. Graduate Competence Standard contained some specific skills and topics which were provided in the National Examination. The Competence Standard and Basic Competence of English subject for senior high schools grades 10th up to 12th contained the competences and lesson objectives which the students were required to achieve.


checklist was employed to check the validity of the listening test items and reading test items compared with the test specifications in the blueprints. The last checklist was used to check the authenticity of the listening and reading test items. The data analysis techniques are described as follows.

1. A checklist to check the validity of the test specifications

The checklist was employed to determine whether or not the Graduate Competence Standard (Standard Kompetensi Lulusan) is consistent with the Competence Standard and Basic Competence of Senior High School grades 10th up to 12th.

Table 3.1 The Sample of Test Specification Validation Checklist

No. Competence Standard Basic Competence Graduate Competence



1. Transactional and Interpersonal conversations

2. Transactional and Interpersonal conversations in formal and sustained situation

3. Transactional and Interpersonal conversations in formal and sustained situation

4. Short functional texts and simple monologue texts

5. Short functional texts and simple essays


taken from Competence Standard (four material topics for listening skills and one material topic for reading skills). The third column consisted of the lesson objectives and material topics taken from Basic Competence for the listening or reading skills (four material topics for listening skill and one material topic for reading skill). The last column was for the Graduate Competence Standard (Standard Kompetensi Lulusan). For a clearer illustration, the checklist can be seen in appendix four on pages 104 up to 105.

The checklists were filled in by placing a tick (✓) in the last column, which represented the consistency between Graduate Competence Standard and both Competence Standard and Basic Competence. If one of the boxes in the Graduate Competence Standard column was not ticked, it meant that the criteria in Competence Standard and Basic Competence of Senior High School grade XII were not stated in the Graduate Competence Standard (Standard Kompetensi Lulusan). A tick (✓) in a box meant that the criteria in Competence Standard and Basic Competence of Senior High School grade XII were stated in the Graduate Competence Standard (Standard Kompetensi Lulusan). After comparing the blueprints, the researcher calculated the result. The final result was presented as a percentage.
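The calculation described above can be illustrated with a minimal sketch. It assumes that the percentage is simply the number of ticked rows divided by the total number of rows in the checklist; the thesis does not state the formula explicitly, and the function name and the sample ticks below are hypothetical.

```python
def percent_ticked(ticks):
    """Return the share of ticked checklist rows as a percentage.

    `ticks` is a list of booleans, one per checklist row
    (True = the row was ticked). Assumed formula: ticked / total * 100.
    """
    if not ticks:
        raise ValueError("checklist is empty")
    return 100.0 * sum(1 for t in ticks if t) / len(ticks)


# Hypothetical checklist with five rows, four of them ticked.
sample_ticks = [True, True, True, False, True]
print(f"{percent_ticked(sample_ticks):.1f}%")  # prints 80.0%
```

A result below 100% under this assumption would indicate that at least one criterion in Competence Standard and Basic Competence is not stated in the Graduate Competence Standard.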

2. Checklists to check the content validity of English Items on listening and reading section of National Examination


kinds of checklists: the first checklist was to check the listening test and the other was to check the reading items. The checklist for the listening test consisted of eight points taken from the blueprints. Three points were taken from Competence Standard and Basic Competence while the last five were taken from Graduate Competence Standard. Meanwhile, the checklist for the reading test consisted of fifteen points taken from the blueprints; two points were taken from Competence Standard and Basic Competence while the last thirteen were taken from Graduate Competence Standard. Since there were five reading test versions, five copies of the reading checklist were made. For a clearer illustration, see appendix four on pages 107 up to 121. The researcher separated the checklists for the listening section and the reading section in order to facilitate the analysis process. The fifteen items of the listening section and 35 items of the reading section were analyzed. Each item was matched with the criteria provided in the checklists.


the researcher calculated the results. The final results were presented as percentages. The results were divided into two: the first was the result of checking the content validity of both reading and listening items based on Competence Standard and Basic Competence, and the second was the result of checking the content validity of both listening and reading items based on Graduate Competence Standard. Since five reading test versions of the National Examination were used in this research, five sets of results were presented as percentages for the reading section.
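As a rough illustration of how per-version results could be tabulated, the sketch below assumes that each test item is recorded with the checklist criterion it matches (or with no criterion), and that the percentage for a version is the share of its items that match a criterion. This is an assumed reading of the procedure, not the thesis's stated formula; the version labels, item numbers, and criterion codes are hypothetical.

```python
def validity_by_version(matches, criteria):
    """Percentage of items per test version that match a checklist criterion.

    `matches` maps a version label to {item number: criterion code or None}.
    `criteria` is the set of criterion codes taken from the blueprint.
    """
    results = {}
    for version, items in matches.items():
        matched = sum(1 for code in items.values() if code in criteria)
        results[version] = 100.0 * matched / len(items)
    return results


# Hypothetical data: criterion codes and version labels are placeholders.
criteria = {"c1", "c2", "c3"}
matches = {
    "V1": {16: "c1", 17: None, 18: "c2", 19: "c1"},
    "V2": {16: "c1", 17: "c3"},
}
print(validity_by_version(matches, criteria))  # {'V1': 75.0, 'V2': 100.0}
```

Running the same function once with the criteria from Competence Standard and Basic Competence and once with those from Graduate Competence Standard would give the two groups of results described above.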

3. Checklists to check authenticity of English Items on listening and reading section of National Examination

The authenticity indicators utilized in this research were developed from Brown’s theory (2004) about authentic language tests. The researcher used different checklists to assess the listening and reading test items. Since the listening items were provided in only one test version and the items were integrated with the passages, the researcher integrated the assessment of the test tasks and the test texts in one kind of checklist, while the reading items were assessed with two kinds of checklists: one to check the authenticity of the test tasks of the reading items and one to check the authenticity of the test texts of the reading items.


representativeness. Since thematic item organization need not be present in each item, fulfilling this indicator for every item was not obligatory. If the set of listening items showed thematic organization, the set of listening items was considered to fulfill this indicator.

Table 3.2 The Sample of Authenticity Checklist

Columns: Test item number; The Criteria of Authenticity

Criteria listed for each test item: Natural language use; Contextualized Items; The relevant topic; Episodic item organization; Real-world representativeness

This kind of checklist was filled in by placing a tick (✓) in the provided column. The presence of the tick (✓) meant that the listening test materials of National Examination reflected the indicator(s) of authenticity, while the absence of the tick (✓) on the provided criteria meant that the listening test materials of National Examination did not reflect the indicator(s) of authenticity. After filling in the checklists and getting the data, the researcher calculated the results. The final results were presented as percentages.
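The tallying of the authenticity checklist can be sketched as follows. The sketch assumes that each test item is stored with the set of indicators that received a tick and that the percentage for an indicator is the share of items ticked for it; the indicator keys and the sample data are hypothetical. Because thematic (episodic) item organization is judged for the set of items as a whole, its item-level percentage here is only illustrative.

```python
INDICATORS = [
    "natural_language_use",
    "contextualized_items",
    "relevant_topic",
    "episodic_item_organization",
    "real_world_representativeness",
]


def indicator_percentages(checklist):
    """Return, for each indicator, the percentage of items ticked for it.

    `checklist` maps a test item number to the set of ticked indicators.
    """
    total = len(checklist)
    result = {}
    for name in INDICATORS:
        ticked = sum(1 for marks in checklist.values() if name in marks)
        result[name] = 100.0 * ticked / total
    return result


# Hypothetical ticks for four listening items.
sample = {
    1: {"natural_language_use", "contextualized_items", "relevant_topic",
        "real_world_representativeness"},
    2: {"natural_language_use", "relevant_topic"},
    3: set(INDICATORS),
    4: {"natural_language_use", "contextualized_items", "relevant_topic"},
}
for name, pct in indicator_percentages(sample).items():
    print(f"{name}: {pct:.0f}%")
```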


each reading test version since the researcher found some similar passages in some test versions. After selecting the reading passages, the researcher finally found 50 different reading passages with 123 test instructions. The researcher employed two kinds of checklists in order to analyze the authenticity of the English reading test of National Examination: a checklist to check the reading tasks and a checklist to check the reading texts. Both were taken from Bachman and Palmer’s theory.

The first checklist was employed to check the reading test tasks. It referred to the reading test instructions and the answer options. The sample of the checklist was almost the same as the checklist for the listening section (see appendix four on pages 122 up to 137). The authenticity indicators utilized in the checklists were: natural language use, contextualized items, the relevance of the test topics to the learners, thematic item organization, and real-world representativeness. Since thematic item organization need not be present in each item, fulfilling this indicator for every item was not obligatory. If the set of reading test tasks showed thematic organization, the set of reading test tasks was considered to fulfill this indicator.


checklists and getting the data, the researcher calculated the results and the final results were then presented as percentages.

Afterwards, the second checklist was utilized to check the test texts of the reading section. The passages analyzed were the 50 passages with 123 test instructions. The sample of the checklist employed to check the reading texts of the English test items can be seen in Table 3.3 (or see appendix four on pages 138 up to 141).

Table 3.3 The Sample of Authenticity of the Test Text Checklist

Columns: Test passages; The Criteria of Authenticity

Criteria listed for each test passage: Natural language use; The relevant topic; Real-world representativeness


results were presented as percentages.

F. Research Procedures

In conducting the research, the researcher followed the ten steps for conducting document analysis described by Fraenkel and Wallen (2008, pp. 475-479). The procedures are described in detail as follows.

1. Determining Objectives

Since the researcher intended to analyze the documents of English items of National Examination, the researcher determined the problem formulations and the objectives of the research. The objectives of the research were to obtain information on the content validity and the authenticity of the English test items of senior high school National Examination year 2012. The information was obtained in depth and in detail.

2. Defining Terms

The researcher gathered information related to the validity and authenticity of language testing, the research method, and the research instruments. After gathering the information, the researcher made sure that the terms employed in this research were defined, either beforehand or as the study progressed. The definitions of the terms employed in this research were taken from books, journals, e-books, and e-journals.

3. Specifying the units of analysis


the content validity and authenticity of English test items of National Examination year 2012 for senior high schools which were administered in Yogyakarta. The English test items of the National Examination which were analyzed and discussed were listening and reading test items. Specifying the unit of analysis kept the research analysis and discussion from becoming too broad or too general.

4. Locating Relevant Data

After determining the objectives and specifying the units of analysis, the researcher determined the relevant data. Data which were going to be used for this research were the documents of Graduate Competence Standard year 2012 for senior high schools, the document of Competence Standard-Basic Competence of English subject for senior high schools grades 10th up to 12th, and the documents of English items of National Examination year 2012 for senior high schools administered in Yogyakarta. Besides, the researcher used several checklists to analyze the data.

5. Developing a Rationale


6. Developing a Sampling Plan

The researcher employed the English test items of National Examination year 2012 for senior high schools which were administered in Yogyakarta, particularly in SMA Bopkri 2 Yogyakarta. There was one listening test version and there were five reading test versions, namely A57, B69, C71, D32, and E45. In the preliminary data, the reading test items were selected and grouped since several similar test tasks and test texts were found in several test versions. In the analysis, only 123 test tasks and 50 test texts were analyzed.
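The selection of distinct passages across test versions can be sketched as below. The thesis does not say how similarity between passages was judged, so the sketch assumes the simplest rule: two passages count as the same when their text is identical after normalising whitespace and letter case. The version labels and passage texts are placeholders, not data from the test.

```python
def unique_passages(versions):
    """Collect distinct passages across test versions in first-seen order.

    `versions` maps a version label to a list of passage texts.
    Assumed rule: passages are duplicates if they are identical after
    collapsing whitespace and lowercasing.
    """
    seen = set()
    unique = []
    for label in sorted(versions):
        for passage in versions[label]:
            key = " ".join(passage.split()).lower()
            if key not in seen:
                seen.add(key)
                unique.append(passage)
    return unique


# Placeholder passages; "V1" and "V2" are hypothetical version labels.
sample = {
    "V1": ["Passage about floods.", "Job vacancy advertisement."],
    "V2": ["job vacancy  advertisement.", "Letter of complaint."],
}
print(len(unique_passages(sample)))  # prints 3
```

The same approach could be applied to the test instructions to count distinct test tasks.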

7. Formulating Coding Categories

The categories were composed in the form of checklist criteria for content validity and authenticity. The criteria of content validity were based on the blueprints and the criteria of authenticity were based on Brown’s theory. The analysis of reading test items was divided into two parts: the analysis of the test tasks and the analysis of the test texts.

8. Checking Reliability and Validity of Test Specifications


related to researcher’s analysis and the data analysis.

9. Analyzing Data

The researcher analyzed the data in two sections. Initially, the researcher conducted an analysis to measure the validity of English test items of National Examination year 2012 for senior high schools. After that, the researcher conducted analysis to measure the authenticity of English test items of National Examination year 2012 for senior high schools. The authenticity analysis was elaborated into two kinds of analysis namely the analysis of test task authenticity and the analysis of the test text authenticity.

10. Triangulation

In order to ensure that the research data and the findings of this research are valid, the researcher had some samples of the data verified by educational experts. Four experts (lecturers) acted as respondents and verified the research data related to the authenticity of the test tasks and the test texts of the test items on National Examination. The instruments employed to obtain data triangulation were open-ended questionnaires. The questionnaires consisted of nine random test items from each reading test version and the research analysis of the test items included in the questionnaires.

11. Reporting





In this chapter, the results of the research are explained, and the results reveal the findings of this research. This research aimed to evaluate a set of English items of National Examination year 2012 in terms of its content validity and authenticity. This is important to discuss because National Examination is the most important standardized test for evaluating the students’ ability and knowledge. Therefore, the test items should meet the criteria of valid and relatively more authentic tests. The researcher obtained the data by analyzing the set of English items using checklists in order to find out whether the set of English items is valid and relatively more authentic. The checklists contain the criteria of a valid and relatively more authentic test. Moreover, the data triangulation of this research analysis shows that the findings of this research can be considered reliable and valid.

A. Content Validity of 2012 English Test Items in National Examination for Senior High Schools

The research results were divided into two parts. The first part is the analysis results of content validity of listening and reading test items on National Examination year 2012 based on Competence Standard and Basic Competence. The second one is the analysis results of content validity of listening and reading test items on National Examination year 2012 based on



