
Journal of Education for Business

ISSN: 0883-2323 (Print) 1940-3356 (Online) Journal homepage: http://www.tandfonline.com/loi/vjeb20

Acceptance and Accuracy of Multiple Choice, Confidence-Level, and Essay Question Formats for Graduate Students

Stephen M. Swartz, University of North Texas, Denton, Texas

To cite this article: Stephen M. Swartz (2006) Acceptance and Accuracy of Multiple Choice, Confidence-Level, and Essay Question Formats for Graduate Students, Journal of Education for Business, 81:4, 215-220, DOI: 10.3200/JOEB.81.4.215-220

To link to this article: http://dx.doi.org/10.3200/JOEB.81.4.215-220

Published online: 07 Aug 2010.


ABSTRACT. The confidence level (information-referenced testing; IRT) design is an attempt to improve upon the multiple choice format by allowing students to express a level of confidence in the answers they choose. In this study, the author evaluated student perceptions of the ease of use and accuracy of and general preference for traditional multiple choice, confidence-level, and essay format questions. The author estimated the relative accuracy of traditional multiple choice vs. confidence level compared with the essay results and the student self-reported mastery of knowledge domains. Student acceptance of the new format was equal to, and accuracy was better than, that of the traditional format.

Copyright © 2006 Heldref Publications

The assessment of student learning is an important issue for educators. The history of the development of assessment tools and techniques indicates a high level of emphasis on the accuracy and efficiency of testing methods (Madaus & O’Dwyer, 1999). Since the early 1900s, traditional multiple choice (MC) item formats have achieved a position of dominance in learning assessment, mainly due to the prima facie objectivity and the efficiency of administration this format represents. However, the popularity of the MC format has come under scrutiny for some applications where accuracy of assessment, particularly for complex knowledge domains, has greater importance than efficiency (Becker & Johnston, 1999; Bennett, Rock, & Wang, 1991). Traditional MC testing formats offer efficiency, objectivity, simplicity, and ease of use for the assessment of student knowledge, but are subject to many sources of interpretation error. Essay format questions, while inefficient and difficult to grade objectively, offer a potentially higher level of information quality.

Purpose

The purpose of this study was to evaluate graduate student perceptions of the ease of use and accuracy of, and general preference for, traditional MC, confidence-level (CL), and constructed-response (CR; essay or short answer) format questions. I also made a comparison estimating the relative accuracy of traditional MC versus CL against both the CR results and the student self-reported posttest mastery of knowledge domains.

Literature Review and Related Research

Knowledge Assessment: CR Versus MC

Educators have sought a better compromise between the richness and depth of the CR format and the simplicity and efficiency of the MC format. Classroom testing procedures are used to assess a range of student attributes, from simple right or wrong recall of factual material to the demonstration of synthesized knowledge applied correctly to new or unique problems. Testing procedures can be classified roughly into two sets: CR and MC formats (Haladyna, 1999). The CR format includes the assessment of student attributes through critiques, demonstrations, essays, experiments, interviews, oral reports, portfolios, projects, and research papers. CR tools provide students with prompts, and students are required to construct responses. MC format tools generally present students with a prompt, then offer alternatives from which the students choose the correct response. True or false, matching, and the traditional MC (i.e., selecting from among alternative responses) are all forms of the MC category.

It has become generally accepted that trade-offs exist between CR and MC measurement tools. While CR formats are more difficult to administer and evaluate objectively and precisely, they provide the opportunity to assess more complex student attributes and higher levels of attribute achievement (Conderman, 2001; Powell, 1989). By contrast, traditional MC formats are easy to administer and use and provide inherent objectivity in grading, but they measure only superficial binary outcomes and promote rote learning (Miller, Williams, & Haladyna, 1978; Rogers & Ndalichako, 1997). Also, the traditional MC formats are unable to distinguish between right answers resulting from students knowing the answer and those resulting from students guessing the answer (Rogers & Ndalichako, 2000). While a variety of approaches have been tried in an effort to improve MC formats, including the addition of essay questions linked to MC questions (Wood, 1998), many of these compromises could be considered adjunct approaches in addition to the MC format and do not represent direct improvements on the MC format itself.

Issues in MC Formats

All MC questions consist of a stem or prompt (the question) and several alternative responses. The alternative responses generally include a single correct response and multiple plausible, but incorrect, choices (Hansen, 1997). If the purpose of the examination instrument is to measure the level or amount of knowledge in a domain, this format reflects only the binary outcomes of students knowing or guessing the correct answer versus students not knowing the answer or guessing incorrectly.

One way to add additional precision to MC format items is to allow for more than one correct answer and offer a range of credit for more complete versus less complete responses. Pomplun and Omar (1997) reported on the use of such multiple-mark MC format items. With this format, multiple correct responses are offered and students are directed to select every correct response. Full credit is offered for a perfect selection, and varying levels of partial credit are given for less than perfect selections.

This format appears to offer two main advantages over traditional MC formats (Pomplun & Omar, 1997). First, administrators believe that the multiple-response format is more realistic because, for many knowledge domains, more than one right answer naturally exists; only infrequently does a single right answer exist. Second, the method is believed to reduce the bias introduced by guessing: by giving students a more exhaustive list of choices, the likelihood that at least some of the choices fall into the students’ knowledge base is higher. Finally, the multiple-response option is easily accommodated into the existing bubble sheet optical reader technology already in use.

Another critical issue in designing, using, and interpreting traditional MC format instruments is the question of the optimal number of choices offered. While the multiple-response format would require the use of many alternatives (several right and several wrong choices are required), the use of three through five alternatives for binary-outcome measurement has become a de facto standard. However, research suggests that the use of three choices may be superior to any larger number. As early as 1964, Tversky showed that, given a fixed number of choices, the use of three alternatives actually maximized the discriminability and statistical power of the instrument. In 1994, Sidick, Barrett, and Doverspike investigated the use of three- versus five-choice items in public sector employment tests. They concluded that the psychometric properties of the three-alternative MC items were comparable to those of the five-choice items, making the potential development and administration simplicity gains preferable.

In 1995, Bruno and Dirkzwager applied the information theoretic perspective to this problem of the optimal number of choices in MC format. The starting assumption was that the amount of information extracted from a test item will increase with the number of offered choices, but that this is not a perfectly linear relationship because the marginal increase in information extracted tapers as the number of choices increases. Indeed, too many choices begin to introduce a certain amount of distraction, because equally informed students may select different marginal choices from a large number of alternatives (Bruno & Dirkzwager). The researchers derived a formula representing the amount of information per alternative as a function of the number of options and found that the whole number of three choices yielded the maximum amount of information per choice, which was considered to be ideal. Rogers and Harley (1999) reported similar findings. Educators reported that, in many instances, the development of a fourth alternative often resulted in writing a throwaway choice that added no value to the item. Also, the information gained from three-choice items was at least equivalent to that from four-choice items, and the bias induced by guessing (test-wiseness) was reduced (Rogers & Harley).
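Bruno and Dirkzwager's formula is not reproduced in this article, but a common information-theoretic formalization (an assumption for illustration here, not necessarily their exact derivation) scores an n-choice item by the information it yields per alternative, log2(n)/n, which peaks at three choices among whole numbers, consistent with the finding above. A minimal Python sketch:

    import math

    def information_per_choice(n: int) -> float:
        """Bits of information yielded per offered alternative.

        Assumes a uniform prior over the n alternatives, so one response
        resolves log2(n) bits in total; dividing by n gives the yield per
        alternative the item writer must construct. This formalization is
        an illustrative assumption, not Bruno and Dirkzwager's published
        derivation.
        """
        return math.log2(n) / n

    for n in range(2, 7):
        print(n, round(information_per_choice(n), 3))
    # 2 -> 0.5, 3 -> 0.528, 4 -> 0.5, 5 -> 0.464, 6 -> 0.431:
    # the whole-number maximum falls at three alternatives.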

The literature presented seems to suggest that essay questions are preferred for the amount of information about student knowledge they provide, particularly in terms of the ability to assess dimensionality of knowledge beyond simple right versus wrong determinations. MC questions are efficient to administer and evaluate and reduce potential evaluator bias, but are inferior in their ability to measure multiple dimensions of knowledge. By adding additional choices, the ability to discriminate between levels of knowledge is improved, but, for many applications, a smaller number of choices is preferred.

Multidimensional Testing With CL

The inclusion of the dimension of relative certainty to the existing dimension of rightness provides useful information for the educator (Hassman & Hunt, 1994). The measurement of students’ confidence in their answers, combined with whether the answer is correct, both reduces the guessing effect and provides some diagnostic feedback to the learning process. The development of information-referenced testing (IRT), or CL testing (Bruno, 1986; Bruno, Holland, & Ward, 1988), allows this kind of measurement.

The IRT format proposes to capture the dimension of student certainty in the answer selected. The advantage to the educator is that, by taking student confidence in the answer into account, intermediate assessment between fully informed students (i.e., students who know and are confident in the correct answer) and misinformed students (i.e., students who are confident in their choice, but answer incorrectly) can be achieved (Bruno, 1986) by offering choices in three levels, with each level representing a different degree of confidence. An example of this question format follows:

1) 1 + 2 = ?
   A. 2.717    D. A or B    G. I don't know
   B. 3        E. B or C
   C. 3.141    F. A or C

At the first level, three alternatives are presented, with one correct and two incorrect choices (e.g., choices A, B, C). By choosing an alternative at this level, students exhibit a high level of confidence in their knowledge. By selecting the right answer, students demonstrate that they are fully informed and confident in their knowledge. The correct response (B in the example given) would be graded at full credit. By choosing a wrong answer at this level (A or C), the student is demonstrating that he or she is confident in the wrong knowledge and is, therefore, misinformed. At this point, an incorrect response would be given zero credit.

The second level of alternatives (e.g., D, E, F) presents Boolean “or” choices among alternative combinations of the first-level options. By selecting a choice at this level, students demonstrate that they are either partially informed by selecting a correct choice (in the example, both D and E include the right answer) or misinformed by selecting the wrong choice (F in the example). Correct answers at this level would be awarded half credit. At this level, students trade half of the available credit to avoid the risk of being forced to choose between two of the three alternatives, which would earn them a 50% score for random guessing. Wrong choices at this level are again scored zero credit.

Finally, at the third level, students are afforded the opportunity to admit to being uninformed (e.g., choosing G in the above example) and possessing a lack of knowledge. Here, the student is rewarded with one third of the credit, which represents the fair value of attempting random guessing from among any of the three first-level choices. The feedback quality for the educator as a result of analyzing student responses along this spectrum allows for a wider range of (and more appropriate) pedagogical responses. The IRT response model is summarized in Table 1 (from Larson, 2003).
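As a concrete illustration of this scoring rule (summarized in Table 1), the short Python sketch below scores one CL item using the option labels from the example above; the function and variable names are hypothetical and are not part of the original instrument.

    # Hypothetical encoding of the seven-option CL item shown above.
    FIRST_LEVEL = {"A", "B", "C"}                     # confident, single-answer choices
    SECOND_LEVEL = {"D": {"A", "B"},                  # Boolean "or" pairs
                    "E": {"B", "C"},
                    "F": {"A", "C"}}
    DONT_KNOW = "G"                                   # explicit admission of no knowledge

    def score_cl_item(response: str, correct: str = "B") -> float:
        """Credit earned on a single confidence-level (IRT) item."""
        if response in FIRST_LEVEL:                   # fully informed or misinformed
            return 1.0 if response == correct else 0.0
        if response in SECOND_LEVEL:                  # partially informed or misinformed
            return 0.5 if correct in SECOND_LEVEL[response] else 0.0
        if response == DONT_KNOW:                     # uninformed, but honest
            return 1.0 / 3.0                          # fair value of a blind 1-in-3 guess
        raise ValueError(f"unknown option: {response}")

    # The credit levels mirror expected values under guessing: a blind guess among
    # the three first-level options is worth 1/3 on average, and a coin flip inside
    # a second-level pair is worth 1/2 -- exactly the certain credit that level pays.
    print(score_cl_item("B"), score_cl_item("E"), score_cl_item("G"), score_cl_item("A"))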

Although the additional information provided by the implementation of IRT could be of value to the educator, concern may exist regarding the implementation cost. First, there are direct costs associated with securing seven-item bubble sheets, making changes to associated software for analyzing results, and making changes to the testing process. However, the greater concern could be whether or not students accustomed to the traditional MC format would accept the more complicated (and perhaps difficult to understand) format and be able to perform comfortably on instruments with items of this type. The “costs” associated with the CL or IRT format may outweigh the additional information provided to the educator.

METHOD

In this study, I attempted to answer two research questions:

1. How do graduate students perceive the relative ease of use and measurement accuracy of, and general preference for, traditional MC, IRT, and essay or short-answer formats for assessing student knowledge?

2. Which MC format (traditional vs. CL) provided better accuracy in terms of association with self-reported mastery and the answers to CR (essay or short answer) format questions?

I was interested in both the acceptance level and accuracy of the proposed CL format to more thoroughly assess its suitability for use in the classroom.

TABLE 1. Summary of Information-Referenced Testing Model

1. Student action: Chooses correct option from first level
   Root cause: Student confidently comprehended the objective.
   Diagnosis: Student is "fully informed." Credit earned: 1.0
   Pedagogical response: None.

2. Student action: Chooses correct option from second level
   Root cause: Student is not confident or comprehends only part of the objective.
   Diagnosis: Student is "partially informed." Credit earned: 0.5
   Pedagogical response: Adjust the scope of instruction and study to "fill in the gaps."

3. Student action: Chooses "I don't know"
   Root cause: Student cannot answer the test item.
   Diagnosis: Student is "uninformed." Credit earned: 0.3
   Pedagogical response: Cover the material again fully and increase confidence.

4. Student action: Chooses incorrect option from first or second level
   Root cause: Student is confident, but wrong.
   Diagnosis: Student is "misinformed." Credit earned: 0.0
   Pedagogical response: Reevaluate learning; use alternative methods of instruction to correct the problem.


To test student perception of the three question formats, I surveyed two groups of students on the ease of use of, accuracy of measurement of, and general preference for traditional MC, CL, and CR questions. The two groups included students from a Master of Business Administration (MBA) program at a private Midwest college and students from a Master of Science (MS) program at a government-run graduate school. The two groups took very similar (e.g., same textbook, same professor, same exams) sections of a graduate Introduction to Supply Chain Management course. The sections were the same size (18 students) and were very similar in demographic composition. The MBA students were slightly older, had a wider range of industry experience, and attended the class at night once a week. The MS students were, on average, 3–5 years younger, had a very similar range of experiences, and took the class during the day, twice a week, as part of a full-time cohort.

Three exams were administered during each course (in addition to case work and student projects), and each exam contained a mix of MC, CL, and CR questions from the same knowledge domains. Students received familiarization training on the CL format prior to the first exam, consisting of a complete description 2 weeks before the exam and another presentation, including a practice quiz, the week prior to the exam. For Exam I, the questions were tightly coupled in that I maintained the question wording as identical as possible among the three formats. Exams II and III had reduced coupling, so that by Exam III the questions were recognizably different, but they were from the same knowledge domain as much as possible. Each exam consisted of 10 knowledge domains, measured by 10 MC, 10 CL, and 5 CR questions. Therefore, 5 of the knowledge domains were tested across all three formats.

Immediately prior to and following the exam, students were asked to self-evaluate their knowledge in each of the tested domains. In addition, immediately following each exam, the students were asked to evaluate the overall ease of use of, measurement accuracy of, and general preference for each format. Ease, Accuracy, and Preference were evaluated on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree), requiring students to respond to statements such as, “Multiple choice questions were easy to understand and use.” I assessed student knowledge represented through the three formats (i.e., MC, CL, and CR) and captured the data in the same SPSS (version 12.0.2 for Windows, Chicago, IL) dataset.

RESULTS

To address the research questions, I performed several statistical analyses. First, I used difference of means tests to determine to what degree student preferences between the three formats were different or separable. I ran these tests for all groups combined, then ran them again across the demographic categories for MS versus MBA, and finally for Exams I, II, and III. I also ran a regression model to assess whether the various demographic factors had an effect on student preference. The second analysis involved the use of correlation models to measure the degree of association between the knowledge as represented by the two MC formats (traditional and CL) and assumed true values as represented by the essay (CR) answers and self-reported mastery.

Preference

The students demonstrated a strong, consistent preference for the CR format on all three dimensions of ease of use, measurement accuracy, and likability (see Table 2). Paired-difference t tests were run in SPSS (version 12.0 for Windows). Preferences were organized from most preferable to least preferable and grouped by whether there were statistically significant differences (α < .10) between the means.
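The paired comparisons were run in SPSS; an equivalent computation in Python (a sketch only, using made-up rating vectors rather than the study data) would look like this:

    import numpy as np
    from scipy import stats

    # Hypothetical 7-point ease-of-use ratings, paired by student, for two formats.
    cr_ease = np.array([6, 5, 7, 6, 5, 6, 7, 5, 6, 6])
    mc_ease = np.array([5, 5, 6, 5, 4, 5, 6, 5, 5, 5])

    t_stat, p_value = stats.ttest_rel(cr_ease, mc_ease)   # paired-difference t test
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
    # Compare p against the article's alpha = .10 criterion for separability.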

Table 2 (results for all groups and exams) shows that for ease of use, CR was statistically significantly preferred over both MC and CL, which could not be separated statistically from each other. General likability had similar results. For measurement accuracy, CR was still superior to both MC and CL, but CL was statistically separable from MC as well.

Next, I ran the paired-difference t tests within the subgroups of the MS students and MBA students separately. The results were very similar between the two groups. For ease of use, students in both groups preferred the CR format and were indifferent in their preference for MC and CL (not statistically significant). For measurement accuracy, the MS group favored CR over CL and CL over MC, while the MBA group preferred CR over both CL and MC. Results for the overall preference (likability) were identical, which suggests that the type of program may have some effect on students’ perceptions of testing formats.

The third set of difference of means tests contrasted the results between the three exams. Note that two separate effects are being picked up by this contrast: the coupling of the question formats (identical vs. similar questions) and the learning effect over time as students develop familiarity with each successive test. No attempt was made to directly account for these potentially interacting effects; however, ease of use alone could be considered a surrogate measure for increasing familiarity between the three exams. If ease of use scores for CL increase over repeated tests, this should indicate changes in the learning effect.

TABLE 2. Paired t-Test Results for Master of Business Administration and Master of Science Students’ Preferences for Question Format: Overall Ease, Accuracy, and Likability

    Measure                 Constructed response    Multiple choice    Confidence level
    Ease of use                    5.61+                 5.03               4.98
    Measurement accuracy           5.53+                 4.52               4.74
    Likability                     5.02+                 4.03               4.16

    +p < .10.

The results appeared consistent across the three exams. For ease of use, students consistently preferred CR over MC and CL, while no statistically significant differences existed between MC and CL. An interesting finding is that the means decreased over time for the CR and CL formats. I found similar results for measurement accuracy. For overall likability, the first and second exams reflected the same CR over CL and MC pattern as ease of use. However, for the third exam, the CL preference increased to statistically tie with CR, so two groups (CR and CL; CL and MC) were formed.

Overall, the results of the difference of means tests indicated a fairly consistent student preference for CR format questions on all three criteria. This pattern did not change according to type of program (i.e., MS or MBA) or across the three exams. The CL format seemed fairly even with (statistically inseparable from) the MC format, except for the MS group and for the third exam, where it seemed to have some desirability over MC in terms of measurement accuracy and overall likability. A regression model predicting preference with either MBA versus MS or degree of coupling provided neither statistical nor practical significance, supporting the results of the difference of means tests.

Accuracy

I performed biserial correlations (both parametric and nonparametric) to assess the relationships between the question formats (MC and CL) and some presumed true value of knowledge. The answers to CR items and students’ posttest self-reported mastery (PoSM) represented the presumed true value of knowledge in the individual domains tested. All relationships were found to be statistically significant at the α < .05 level (see Table 3).
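Table 3 reports Pearson (parametric) and Spearman (nonparametric) coefficients; the corresponding computation in Python (a sketch with hypothetical per-domain scores, not the study data) is:

    import numpy as np
    from scipy import stats

    # Hypothetical paired scores on the same knowledge domains: CL item credit
    # versus the graded CR (essay) result used as the presumed true value.
    cl_scores = np.array([1.0, 0.5, 0.33, 1.0, 0.0, 0.5, 1.0, 0.33])
    cr_scores = np.array([0.9, 0.6, 0.40, 1.0, 0.2, 0.5, 0.8, 0.30])

    pearson_r, p_par = stats.pearsonr(cl_scores, cr_scores)     # parametric
    spearman_rs, p_np = stats.spearmanr(cl_scores, cr_scores)   # nonparametric
    print(f"r = {pearson_r:.3f} (p = {p_par:.3f}), rs = {spearman_rs:.3f} (p = {p_np:.3f})")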

Using PoSM as a surrogate for true knowledge, we see that the traditional MC format has a slight advantage over CL using both parametric (.198 vs. .109) and nonparametric (.215 vs. .186) correlations. However, when the CR answers are used as a surrogate for measuring true knowledge, the opposite is true. CL shows an advantage over MC by a slightly wider margin on both parametric (.297 vs. .165) and nonparametric (.378 vs. .146) tests of association. This difference in result could be explained by several theories. The PoSM measure of true knowledge is subject to much bias and potentially overlaps with student preference, while the CR measure is not subject to student (self-report) issues. Because CR responses have prima facie validity and general acceptance in the literature as being the best measure of true knowledge, I preferred to use this measure. The findings, therefore, indicate improved accuracy of the CL format over the MC.

Efficacy

Another interesting question involves the assessment of student perceptions of student–instructor competence in learning or teaching in the individual knowledge domains. Table 4 presents the results of biserial correlations between student-reported pretest subject mastery (PrSM), posttest self-reported mastery (PoSM), pretest instructor performance (PrIP), and posttest instructor performance (PoIP).

The results indicated strong agreement between pre- and posttest subject mastery as reported by students (PrSM–PoSM at .618). Similar strong agreement existed between pre- and posttest instructor performance (PrIP–PoIP at .648). However, slightly less agreement existed between instructor performance and subject mastery. Both before and after taking the exam, IP and SM associated at relatively high levels (pretest at .513; posttest at .593). Finally, it is interesting to note that the association between instructor performance and true knowledge (represented by CR) was statistically significant for either pre- or posttest measures (not shown in Table 4).

TABLE 3. Pearson Parametric (r) and Spearman Nonparametric (rs) Correlations for Accuracy of Multiple Choice and Confidence Level Testing

    Measure                       1        2        3        4
    1. Multiple choice            —      .249a    .198a    .165a
    2. Confidence level         .356a      —      .109b    .297a
    3. Self-reported mastery    .215a    .186a      —      .190a
    4. Constructed response     .146c    .378a    .159c      —

    Note. Pearson r values appear above the diagonal; Spearman rs values appear below the diagonal. a p = .000; b p = .039; c p = .002. *p < .05.

TABLE 4. Pearson Parametric (r) and Spearman Nonparametric (rs) Correlations for Pretest Versus Posttest Knowledge Assessment

    Measure                               1        2        3        4
    1. Pretest subject mastery            —      .618a    .513a    .400a
    2. Posttest subject mastery         .589a      —      .376b    .593a
    3. Pretest instructor performance   .509a    .366a      —      .648a
    4. Posttest instructor performance  .391c    .621a    .613c      —

    Note. Pearson r values appear above the diagonal; Spearman rs values appear below the diagonal. a p = .000; b p = .039; c p = .002.


DISCUSSION

The overall research questions guiding this investigation were “How do graduate students perceive the relative ease of use of, accuracy of the measure of, and general preference for traditional MC, IRT, and CR formats for assessing student knowledge?” and “Which MC format (traditional vs. CL) provided better accuracy in terms of association with self-reported mastery and the answers to CR (essay) format questions?” I was interested in both the acceptance level and the relative accuracy of the proposed CL format in order to more thoroughly assess its suitability for use in the classroom.

Student feedback indicated a strong preference for CR format questions across all three criteria and a lower preference for traditional MC and CL formats. These preferences do not seem to be dependent on type of program (i.e., MS or MBA) or the degree of coupling of the questions (high, medium, or low). Students demonstrated consistent ambivalence between traditional MC and the proposed IRT formats.

However, student preference is only one factor for educators to consider when employing a testing format. The accuracy of the assessment instrument and the value of the information provided are also important considerations. Evidence indicates that the CL format offers advantages over traditional MC in terms of measurement accuracy, as reflected in both student opinion and comparison with CR items. In addition, the information provided by the CL items offers a pedagogical advantage over traditional MC formats in the quality and richness of feedback provided. The results of this initial study suggest that, among these students, acceptance of the CL format was at least as high as the level of acceptance for the traditional MC format and is not an adoption consideration. The decision to employ the CL format then depends on the trade-off between the value of the improved information and accuracy and the administrative burden of rewriting questions and modifying procedures.

NOTE

Correspondence concerning this article should be addressed to Stephen M. Swartz, Assistant Professor of Logistics Management, Department of Marketing and Logistics, University of North Texas, Denton, TX. E-mail: swartzs@unt.edu

REFERENCES

Becker, W. E., & Johnston, C. (1999). The relationship between multiple choice and essay response questions in assessing economics understanding. The Economic Record, 75, 348–357.

Bennett, R. E., Rock, D. A., & Wang, M. (1991). Equivalence of free-response and multiple choice items. Journal of Educational Measurement, 28(1), 77–92.

Bruno, J. E. (1986). Assessing the knowledge base of students: An information theoretic approach to testing. Measurement and Evaluation in Counseling and Development, 18, 116–130.

Bruno, J. E., Holland, J. R., & Ward, J. W. (1988). Enhancing academic support services for special action students: An application of information referenced testing. Measurement and Evaluation in Counseling and Development, 21(1), 5–13.

Bruno, J. E., & Dirkzwager, A. (1995). Determining the optimal number of alternatives to a multiple-choice test item: An information theoretic perspective. Educational and Psychological Measurement, 55, 959–966.

Conderman, G. (2001). Program evaluation: Using multiple assessment methods to promote authentic student learning and circular change. Teacher Education and Special Education, 24, 391–394.

Haladyna, T. M. (1999). Developing and validating multiple-choice test items. Mahwah, NJ: Lawrence Erlbaum Associates.

Hansen, J. D. (1997). Quality multiple-choice test questions: Item-writing guidelines and an analysis of auditing testbanks. Journal of Education for Business, 73, 94–97.

Hassman, P., & Hunt, D. P. (1994). Human self-assessment in multiple-choice testing. Journal of Educational Measurement, 31, 149–160.

Larson, E. D. (2003). An analysis of information referenced testing as an air force assessment tool. Unpublished master’s thesis, Air Force Institute of Technology, Dayton, OH.

Madaus, G. F., & O’Dwyer, L. M. (1999). A short history of performance assessment: Lessons learned. Phi Delta Kappan, 80, 688–695.

Miller, H. G., Williams, R. G., & Haladyna, T. M. (1978). Beyond facts: Objective ways to measure thinking. Englewood Cliffs, NJ: Educational Technology Publications.

Pomplun, M., & Omar, M. D. H. (1997). Multiple-mark items: An alternative objective item format? Educational and Psychological Measurement, 57, 949–962.

Powell, J. L. (1989). How well do tests measure real reading? Bloomington, IN: ERIC Clearinghouse on Reading and Communication Skills. (ERIC Document Reproduction Service No. ED 306552)

Rogers, W. T., & Harley, D. (1999). An empirical comparison of three and four choice items and tests: Susceptibility to testwiseness and internal consistency reliability. Educational and Psychological Measurement, 59, 234–247.

Rogers, W. T., & Ndalichako, J. (1997). Comparison of finite state score theory, classical test theory, and item response theory in scoring multiple-choice items. Educational and Psychological Measurement, 57, 580–589.

Rogers, W. T., & Ndalichako, J. (2000). Number-right, item-response, and finite-states scoring: Robustness with respect to lack of equally classifiable option and item option independence. Educational and Psychological Measurement, 60, 5–9.

Sidick, J. T., Barrett, G. V., & Doverspike, D. (1994). Three-alternative multiple choice tests: An attractive option. Personnel Psychology, 47, 829–835.

Tversky, A. (1964). On the optimal number of alternatives of a choice point. Journal of Mathematical Psychology, 1, 386–391.

Wood, W. C. (1998). Linked multiple-choice questions: The tradeoff between measurement accuracy and grading time. Journal of Education for Business, 74, 83–86.
