
PERFORMANCE CALIBRATION THROUGH PARTLY PEER ASSESSMENT

A module can be assessed in two main ways (Biggs, 2003): (i) the traditional model, in which we teach and then test, or (ii) a model in which we rank students along a quantitative scale, usually through ‘marking’, and then allocate grades accordingly. The former is mainly about absolute performance: students get what they get from the assessment, and the outcome may be biased by the assessment content and by the marking criteria and practice. For example, the exam questions may be too difficult, the marking criteria may not be appropriate, or the marking itself may be too harsh, since markers may hold different views about when marks should be awarded. In such cases, the absolute performance may not reflect the true performance and effort of the students, mainly because a reference is missing: how do we compare the actual answers from students with a model answer from the teaching staff, and what happens if a model answer is either missing or does not fully reflect the knowledge actually delivered by the teaching staff or learned by the students? The latter is more about relative performance and the rank of students within a class: some statistics of student performance are fixed in advance, irrespective of the actual performance of the students. It can overcome some shortcomings of the former, such as exam questions that are too challenging or marking that is too harsh. However, it prescribes the distribution of student performance, which may not reflect the actual capability and commitment of the students at all.

In this paper, we report the teaching and assessment of a second-year image processing module.

As peer assessment can act as a learning tool, supporting students in making effective and informed judgments (Bloxham and West, 2007), it was used as a component of the module assessment. In contrast to the aforementioned traditional assessment approaches, we argue that partly peer assessment may provide a useful alternative component for the calibration of the actual performance of students. Such practice shows a sincere commitment to encouraging student autonomy in learning and student responsibility for the critical evaluation of their own work (Langan and Wheater, 2003). It is also in line with the guidelines from ENQA (ENQA, 2009) that, when possible, the assessment should not rely only on the judgment of a single examiner. Such a method is especially useful for modules whose open-ended and application-oriented nature makes exams unsuitable, such as programming and image processing modules.

2. The teaching of a module

The module is open to second-year university students in the department of computer science. It is therefore reasonable to assume that the students have a good background in mathematics, as well as skills and experience in programming. Even so, two lectures and two workshops are still scheduled, covering the mathematical techniques involved, the marking criteria, and programming for image operations.

Through these, (i) the main mathematical techniques such as probability, interpolation, convolution, and similarity are introduced; (ii) the marking criteria are clarified and some abstract terminology, such as the quantitative and qualitative evaluation of techniques, is explained; and (iii) image loading, data manipulation, and image generation in the Java programming language are demonstrated. Such demonstration programs clearly serve as a starting point for students to learn the subsequent topics and attempt their assignment.
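As an illustration of what the programming workshop covers, the following is a minimal Java sketch of the three steps just mentioned: loading an image, manipulating the pixel data (here a simple grey-level inversion, chosen purely for illustration), and writing the result back as a new image. The file names and class name are placeholders rather than the actual workshop material.

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

public class ImageDemo {
    public static void main(String[] args) throws IOException {
        BufferedImage input = ImageIO.read(new File("input.png"));   // image loading
        int w = input.getWidth(), h = input.getHeight();
        BufferedImage output = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);

        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = input.getRGB(x, y);                         // packed RGB pixel
                int r = 255 - ((rgb >> 16) & 0xFF);                   // invert each channel
                int g = 255 - ((rgb >> 8) & 0xFF);
                int b = 255 - (rgb & 0xFF);
                output.setRGB(x, y, (r << 16) | (g << 8) | b);        // data manipulation
            }
        }
        ImageIO.write(output, "png", new File("output.png"));         // image generation
    }
}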

The whole module has a problem-solving nature and is delivered mainly through lectures and workshops from the end of September to the beginning of November in each academic year. It covers the following topics: introduction, color space, image formation, image compression, image enhancement, texture, and image classification. The Introduction chapter presents imaging modalities and image formats. The Color Space chapter discusses how to visualize and interact with the imaged data. The Image Formation chapter covers the principles by which the 3D world is projected onto the 2D image plane and how the projected world can be characterized and represented. The Image Compression chapter discusses techniques for the effective storage of image data. The Image Enhancement chapter introduces and demonstrates various techniques for enhancing image contrast. The Texture chapter discusses techniques for describing regions (sets of contiguous pixels) in a given image. Finally, the Image Classification chapter introduces techniques for describing and classifying images and for quantifying the performance of different techniques.
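To give a concrete flavor of one of the listed topics, the sketch below illustrates the kind of technique covered in the Image Enhancement chapter: a linear contrast stretch that maps the observed grey-level range of an 8-bit greyscale image onto the full range [0, 255]. The class and method names are ours, chosen for illustration; the module material may use different examples.

import java.awt.image.BufferedImage;

public class ContrastStretch {
    public static BufferedImage stretch(BufferedImage grey) {
        int w = grey.getWidth(), h = grey.getHeight();
        int min = 255, max = 0;
        // First pass: find the darkest and brightest grey levels present.
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int v = grey.getRaster().getSample(x, y, 0);
                if (v < min) min = v;
                if (v > max) max = v;
            }
        }
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        int range = Math.max(max - min, 1);                    // avoid division by zero
        // Second pass: rescale every pixel onto [0, 255].
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int v = grey.getRaster().getSample(x, y, 0);
                out.getRaster().setSample(x, y, 0, (v - min) * 255 / range);
            }
        }
        return out;
    }
}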

While the module has run for more than 10 years, its syllabus has remained relatively stable; its contents have changed slightly in response to feedback from students, mainly on mathematics and programming.

3. The assessment of a module

The module is assessed through two pieces of work: a demonstration of up to 5 minutes followed by a 5-minute question-and-answer session, and an essay in the form of a scientific paper. The marking criteria are listed in Table 1. The rationale for holding the demonstration before the final paper submission is for students to show what they have done and to collect feedback from their peers and the relevant teaching staff, so that they can learn from this feedback and incorporate it into the paper they submit later, improving their domain-specific skills (van Zundert, Sluijsmans and van Merrienboer, 2010). To clarify what is actually expected from the demonstration and the paper, a workshop was run that explained the different criteria, warned students about what they should pay attention to, showed them some example papers from past years, and asked them to comment. To help students approach the assessment, another workshop was arranged, showing them how to read images into a Java program, how to extract and manipulate data, and how to wrap the processed data back into an image for display and visualization. The demonstration mark is calculated as the average of the mean mark from the teaching staff and the mean mark from the students. It is compulsory for students to attend all the sessions; each missed session leads to a reduction of 5% of the total module mark.
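The following Java sketch shows how this mark combination might be computed, assuming (as stated above) that the demonstration mark is the simple average of the staff average and the peer average, the 60%/40% weighting from Table 1, and a 5% deduction of the total module mark per missed session. The helper names are hypothetical.

import java.util.List;

public class DemoMark {
    static double average(List<Double> marks) {
        return marks.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    // Demonstration mark: average of the staff average and the peer average.
    static double demonstrationMark(List<Double> staffMarks, List<Double> peerMarks) {
        return (average(staffMarks) + average(peerMarks)) / 2.0;
    }

    // Module mark: 60% paper, 40% demonstration, minus 5% per missed session.
    static double moduleMark(double paperMark, double demoMark, int missedSessions) {
        double total = 0.6 * paperMark + 0.4 * demoMark;
        return Math.max(0.0, total - 5.0 * missedSessions);
    }
}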

Table 1. The marking form for the image processing module

Quality of paper (60% of the module mark)                 Maximum   Awarded
  Project statement                                         10%
  Description of relevant algorithms and techniques         20%
  Experimental outline                                       10%
  Quantitative results                                       15%
  Qualitative results                                        15%
  Conclusion                                                 10%
  Bibliography                                                5%
  Wow factor                                                 15%
Demonstration (40% of the module mark; assessed at the demo to the peer group)
  Problem statement                                          15%
  Algorithm summary                                          15%
  Sensible output                                            40%
  Quantitative results                                       20%
  Clarity and coherence of the demonstration                 10%
TOTAL                                                       100%

4. Actual assessment of the demonstration

The demonstrations are scheduled in the normal teaching rooms, with the necessary facilities available, at the end of November and the beginning of December, about two or three weeks after the module has been delivered. The students are scheduled according to the topics they selected: the earlier a topic is covered in the lectures, the earlier the demonstration. In the process of scheduling and demonstration, special circumstances are taken into account; in such cases, a demonstration may be rescheduled, leaving some early topics to be demonstrated later. In each session, at most five students are scheduled, and all students are given marking sheets on which they write the marks awarded and any comments they may have on particular demonstrations from their peers. Students are reminded that they should mark and comment seriously, so that the marks reflect the actual performance of particular students and those students can learn from the comments when they write their scientific papers. After all the demonstrations have been made, the teaching staff collect the marking sheets from all the students, collate them, and send them to the students as feedback within one week of the completion of all demonstration sessions. To help students learn, a summary of all the comments and of the performance of all the students is also prepared and sent out.

5. Results

In this section, we analyze in detail the performance of students with and without calibration.

The results are presented in Figures 1 and 2 and Tables 2 and 3. Figure 1 shows that the marks given by students are usually higher than those given by the teaching staff (one member of staff in this case), with a difference as large as 25 marks. This observation is confirmed by Table 2, where the difference between the marks given by students and by teaching staff is on average as great as 8.48 in the academic year (AY) 2014-2015, 14.07 in 2013-2014, and 8.24 in 2012-2013. In general, the difference is almost as large as a grade. These results show that students usually apply the marking criteria loosely to the demonstrations of their peers. Figure 1 (left) even shows an extreme case in which one student gave an average of 92.5% to their peers. In this case, the marks given by some individual students may not be reliable. These observations confirm those made in (Langan and Wheater, 2003), that students were more generous, awarding around 5% higher marks; in (Isaacs, 2001), that many students find it uncomfortable to grade friends or fellow students too harshly; and in (Brown and Knight, 1994), that friendship marking results in over-marking, while ‘decibel marking’ results in the noisiest or most dominant students getting the highest marks. Table 2 also shows that the marks received from peers and those from the teaching staff are highly correlated. However, the correlation between the marks received by students and the marks given by students is low, if it exists at all. This is because the former measures the average performance of different students, while the latter measures the average judgment of performance made by different students. This means that the marks received from peers, once combined together, are relatively reliable and can thus be used as a reference for the calibration of student performance; taken individually, these marks are neither reliable nor informative. Such findings confirm the conclusions made in (Falchikov and Goldfinch, 2000) that overall peer marks agree well with teacher marks, in (Liu and Carless, 2006) that students are reasonably reliable assessors, and in (Langan and Wheater, 2003) that peer assessment of student presentations for summative purposes is feasible.
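The correlation coefficients reported in Table 2 are standard Pearson correlations between per-student mark series (for example, AMS against AMR). A self-contained Java sketch of that computation is given below; the array contents would be the collated marks, which are not reproduced here.

public class MarkCorrelation {
    // Pearson correlation between two mark series of equal length.
    static double pearson(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0, meanB = 0;
        for (int i = 0; i < n; i++) { meanA += a[i]; meanB += b[i]; }
        meanA /= n; meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++) {
            cov  += (a[i] - meanA) * (b[i] - meanB);
            varA += (a[i] - meanA) * (a[i] - meanA);
            varB += (b[i] - meanB) * (b[i] - meanB);
        }
        return cov / Math.sqrt(varA * varB);
    }
}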

Figure 1. Scatter plots of the marks from staff against the average marks received from students (plus signs) and given by students (circles) in the academic years 2014-2015 (left), 2013-2014 (middle), and 2012-2013 (right).

Figure 2. The difference in marks before and after calibration of performance for students in the academic years 2014-2015 (left), 2013-2014 (middle), and 2012-2013 (right), respectively.

Table 2. The average marks received (AMR), received from staff (AMS), and given (AMG) by the N students in different academic years, together with their correlation coefficients C.

AY         N    AMR    AMS    AMG    C(AMR,AMS)   C(AMS,AMG)   C(AMR,AMG)
2014-2015  24   71.93  63.45  72.48   0.52         -0.23        -0.36
2013-2014  16   72.51  58.44  71.03   0.83          0.00        -0.01
2012-2013  23   70.38  62.14  69.77   0.72          0.13         0.10

Table 3. The average and standard deviation of the class marks with (AW and SDW) and without (AO and SDO) calibration of performance from the peer marking of the demonstration in different academic years.

AY         AW     SDW    AO     SDO
2014-2015  56.52  13.79  54.95  13.57
2013-2014  55.88  14.95  53.20  14.69
2012-2013  56.27  16.84  54.73  16.71

Figure 2 shows the difference between the marks of different students before and after calibration; in some cases this is as large as 5 marks, which is significant for assessing and reflecting the genuine performance and effort of students. More importantly, students were treated differently in a way that reflects the common opinion of the other students. While 50% of the demonstration mark came from peer marking, this weight could clearly be increased to ensure that the average mark of the students is around 60% without using any subjective linear scaling, thus providing a more objective method for calibrating student performance. Table 3 shows the difference in the average and standard deviation of the marks of the whole module for the class in different AYs. Interestingly, the average performance of the class has been maintained in the range of 55-60% through calibration, even though the number and background of students vary from one year to another. The performance of students was calibrated through the peer assessment of the demonstration component required for the assessment of the whole module. The final marks of the class were accepted as a whole by the university examination boards and external examiners without any further adjustment. The students enjoyed and were stimulated by the process of peer marking, learned from each other, and found the process rewarding.
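One possible way to realize the calibration described above is sketched below in Java: the demonstration mark is a weighted combination of the staff mark and the averaged peer mark, and the peer weight is tuned so that the class average lands near a chosen target (for example, 60%) without any subjective linear scaling afterwards. The weight search and the variable names are our assumptions, not the exact procedure used in the module.

public class Calibration {
    // Calibrated demonstration mark for one student under a given peer weight.
    static double calibrated(double staffMark, double peerAverage, double peerWeight) {
        return (1.0 - peerWeight) * staffMark + peerWeight * peerAverage;
    }

    // Class average of calibrated demonstration marks for a given peer weight.
    static double classAverage(double[] staff, double[] peer, double peerWeight) {
        double sum = 0;
        for (int i = 0; i < staff.length; i++) {
            sum += calibrated(staff[i], peer[i], peerWeight);
        }
        return sum / staff.length;
    }

    // Pick the peer weight (in steps of 0.05) whose class average is closest to the target.
    static double bestWeight(double[] staff, double[] peer, double target) {
        double best = 0.5, bestGap = Double.MAX_VALUE;
        for (int k = 0; k <= 20; k++) {
            double w = k / 20.0;
            double gap = Math.abs(classAverage(staff, peer, w) - target);
            if (gap < bestGap) { bestGap = gap; best = w; }
        }
        return best;
    }
}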

6. Conclusion

In this paper, we investigated the issue of the calibration of student performance. Our research shows that partly peer assessment is useful, providing at least an alternative component for assessing the performance of students. However, it is necessary to bear in mind that the marks given by some individuals might not be reliable. In sharp contrast, when the marks received from different students are combined together, they are highly correlated with those given by the teaching staff. In this case, peer assessment does provide a useful alternative component in the assessment of a module and the calibration of student performance. Once combined, the marks from students are usually reliable enough to adjust the marks of individual students objectively, and thus may avoid bias from either the students themselves or the teaching staff alone. This practice is in line with the guidelines from ENQA (ENQA, 2009) that, when possible, the assessment should not rely only on the judgment of a single examiner. Such a method will be particularly useful for modules such as programming and image processing, which are difficult to assess in the form of exams due to their open-ended or problem-solving nature. While the marks given by different students were treated equally in this research, they may be weighted differently in the future (Langan and Wheater, 2003; Liu and Carless, 2006), so that they act as a more objective reference for the calibration of student performance.

References

Biggs, J. (2003). Aligning teaching and assessing to course objectives. Learning and Teaching Support Centre. https://www.heacademy.ac.uk/sites/default/files/biggs-aligning-teaching-and-ssessment.pdf

Bloxham, S. (2008). Assessment in teacher education: stakeholder conflict and resolution. Practitioner Research in Higher Education, 2(1), 13-21.

Bloxham, S. and West, A. (2007). Learning to write in higher education: students’ perceptions of an intervention in developing understanding of assessment criteria. Teaching in Higher Education, 12(1), 77-89.

Brookhart, S. (1999). The art and science of classroom assessment: the missing part of pedagogy. The George Washington University Press.

Brown, S. and Knight, P. (1994). Assessing learners in higher education. London: Kogan Page.

European Association for Quality Assurance in Higher Education (2009). Standards and guidelines for quality assurance in the European Higher Education Area. 3rd ed. Helsinki: European Association for Quality Assurance in Higher Education (ENQA).

Falchikov, N. and Goldfinch, J. (2000). Student peer assessment in higher education: a meta-analysis comparing peer and teacher marks. Review of Educational Research, 70(3), 287-322.

Isaacs, G. (2001). Assessment for learning. Brisbane: University of Queensland.

Langan, A.M. and Wheater, C.P. (2003). Can students assess students effectively? Some insights into peer assessment. Learning & Teaching in Action, 2(1).

Liu, N.-F. and Carless, D. (2006). Peer feedback: the learning element of peer assessment. Teaching in Higher Education, 11(3), 279-290.

Nicol, D. (2009). Transforming assessment and feedback: enhancing integration and empowerment in the first year. Mansfield: QAA.

Van Zundert, M., Sluijsmans, D. and van Merrienboer, J. (2010). Effective peer assessment processes: research findings and future directions. Learning and Instruction, 20, 270-279.

Wosik, D. (2014). Measuring the quality of the assessment process: dealing with grading inconsistency. Practitioner Research in Higher Education, 8(1), 32-40.
