
REVIEW OF RELATED LITERATURE

2.9 Assessment quality: What is considered quality in assessment?

Three aspects are important when we talk of quality in assessment: validity, reliability, and fairness. Validity and reliability are crucial for decision making about the quality of evidence collected in the classroom. According to Thompson (2013), validity and reliability are two essential aspects in evaluating an assessment process, be it an examination of knowledge, a psychological inventory, a customer survey, or an aptitude test. Lian, Yew, and Meng (2014) remarked that validity and reliability are essential principles in educational measurement; teachers therefore need to know, understand and put into practice these conceptual essentials in order to make better assessment decisions about students’ learning and teaching.

Validity is considered an evaluative judgement about the degree to which assessment results are appropriate for making certain educational inferences and decisions (Messick, 1993, as cited in Lian, Yew & Meng, 2014). Similarly, Bond (2003) posited that validity is the core of any form of assessment that is trustworthy and accurate. Maree (2010) added that the validity of classroom assessment refers to the extent to which an assessment measures what it purports to measure. This implies that validity is the extent to which the information collected reflects the attributes one wants to know about. The validity of assessment results can be high, medium or low, or may range from weak to strong (Gregory, 2000); from this one can conclude that validity cannot be summarised by a single numerical value, but can be observed in the assessment task that is administered. The validity of assessment for learning depends on the extent to which the interpretation and use of the assessment actually lead to further learning (Hargreaves, 2007). This suggests that in formative assessment (FA), validity can be gauged by the extent of improvement in learning.

Hamidi (2010) classified validity into three types: content validity, consequential validity and ipsative validity. Content validity is the extent to which the items in a test represent the domain to be measured (Salvia, Ysseldyke, & Witmer, 2012). According to Hamidi (2010), content validity is the correspondence between curriculum objectives and the objectives being assessed. Hence, if a test is to be used for making instructional decisions, then it is important that there is alignment between the test and the specific instructional or curricular areas that the test is meant to cover.
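To make the idea of test–curriculum alignment concrete, the sketch below (my own illustration, not drawn from Hamidi, 2010, or Salvia, Ysseldyke, and Witmer, 2012; the objectives and item tags are hypothetical) checks whether every curriculum objective in a domain is covered by at least one test item, flagging gaps that would weaken content validity:

    # Hypothetical sketch of a content-validity alignment check:
    # each test item is tagged with the curriculum objective it assesses,
    # and objectives covered by no item are flagged as alignment gaps.

    curriculum_objectives = {"fractions", "decimals", "percentages", "ratio"}

    test_items = {
        "Q1": "fractions",
        "Q2": "fractions",
        "Q3": "decimals",
        "Q4": "percentages",
    }

    covered = set(test_items.values())
    gaps = curriculum_objectives - covered

    print("Objectives covered:", sorted(covered))
    print("Objectives with no items (alignment gaps):", sorted(gaps))
    # Output: 'ratio' is flagged, signalling weaker content validity for
    # decisions about that part of the curriculum.

In practice such an audit is usually carried out with a table of specifications rather than code, but the underlying logic – mapping items to objectives and checking coverage – is the same.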

The second type of validity noted by Hamidi (2010) is consequential validity. According to Tiekstra, Minnaert, and Hessels (2016), consequential validity gives credence to the way in which assessment influences learning and teaching during the testing procedure. These authors regard consequential validity as aiding teachers to focus on classroom activities which support students’ learning and are responsive to individual needs. Similarly, Shepard (1997), as cited in Hubley and Zumbo (2011), expanded on the work of Messick (1993) on consequential validity, and argued that this type of validity should include both the positive and negative social consequences of a test. Shepard noted that positive attributes of consequential validity include improved students’ learning and motivation, and ensuring that all students have access to equal classroom content. The author also argued that a standardised test has several negative consequences; notable among them are its use to reallocate state funds, and teaching students to pass the test instead of developing conceptual understanding of the material.

In contrast, ipsative validity, according to Hamidi (2010), takes into account students’ performance as assessed formatively by teachers during class interaction, rather than using their past performance as a criterion for judging their learning abilities. This form of validity places the student at the centre of the assessment activity, and therefore provides diagnostic information on the progress of the individual (Lines, 2000). FA operates at the level of decisions: it makes a statement not about the interpretations of scores but about the consequences of the decisions taken. The validity of FA is therefore related more closely to the earlier definition of test validity: that it does what it purports to do, which is to improve learning.

The extent to which test scores are free from measurement error is referred to as reliability (Muijs, 2011). Thompson (2013) affirmed that reliability indicates the degree to which test scores are stable – or reproducible – and free from measurement error; this refers to the consistency of assessment scores (Moskal & Leydens, 2000). According to McClure, Sonak, and Suen (1999), reliability is an expression of the proportion of the variation among scores that is due to the object of measurement. An assessment is reliable when there is little variation in students’ scores or in judges’ ratings across different occasions and by different judges (Brindley, 2003).
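Expressed formally, in the classical test theory formulation that underlies the McClure, Sonak, and Suen (1999) definition (the notation here is standard and is not quoted from these authors), reliability is the proportion of observed score variance attributable to true differences in the object of measurement:

    \rho_{XX'} = \frac{\sigma^2_{\mathrm{true}}}{\sigma^2_{\mathrm{observed}}} = \frac{\sigma^2_{\mathrm{true}}}{\sigma^2_{\mathrm{true}} + \sigma^2_{\mathrm{error}}}

A coefficient near 1 means that most of the variation among scores reflects the attribute being measured, while a coefficient near 0 means that measurement error dominates.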

According to Moskal and Leydens (2000), two forms of reliability are considered in classroom assessment: 1) inter-rater reliability, and 2) intra-rater reliability:

    Rater reliability generally refers to the consistency of scores that are assigned by two independent raters and that are assigned by the same rater at different points in time. The former is referred to as interrater reliability while the latter is referred to as the intrarater reliability.

The more consistent assessment scores are across different raters and occasions, the more reliable the assessment is thought to be (Moskal & Leydens, 2000). According to Brown, Bull, and Pendlebury (1997), the “major threat to reliability is the lack of consistency of the individual marker”. Reliability is a necessary but not a sufficient condition for valid measurement: all valid tests are reliable, and unreliable tests cannot be valid, whereas reliable tests may or may not be valid. In conclusion, decisions based on assessment results can be trusted and defended if the assessment is reliable.
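As a concrete illustration of the two forms of rater reliability described above (my own sketch: the scores are invented, and Pearson’s correlation is only one of several indices used for rater consistency, alongside Cohen’s kappa and intraclass correlation), inter-rater consistency can be estimated by correlating two raters’ scores on the same scripts, and intra-rater consistency by correlating one rater’s scores on two occasions:

    from statistics import correlation  # Pearson's r; requires Python 3.10+

    # Invented essay scores for eight students.
    rater_a       = [14, 17, 12, 19, 15, 10, 16, 13]  # rater A, first marking
    rater_b       = [13, 18, 11, 19, 14, 11, 15, 12]  # rater B, same scripts
    rater_a_again = [15, 17, 12, 18, 15, 10, 17, 13]  # rater A, re-marking later

    # Inter-rater reliability: consistency between two independent raters.
    inter_rater = correlation(rater_a, rater_b)

    # Intra-rater reliability: consistency of one rater across occasions.
    intra_rater = correlation(rater_a, rater_a_again)

    print(f"Inter-rater consistency (Pearson r): {inter_rater:.2f}")
    print(f"Intra-rater consistency (Pearson r): {intra_rater:.2f}")

Coefficients close to 1 indicate that the marking scheme is being applied consistently; in Brown, Bull, and Pendlebury’s (1997) terms, a low coefficient would expose the inconsistency of the individual marker.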

The most important challenge in assessment is the issue of fairness (Kunnan, 2005). A fair assessment takes into consideration issues of access, equity and diversity. Lynch (2001) defined fairness of assessment as treating all students equally and giving everyone an equal opportunity to demonstrate their ability. Messick (1994) mooted that issues of fairness are at the heart of performance assessment validity. A fair and just assessment task provides all students with an equal opportunity to demonstrate the extent of their learning. According to Kunnan (2000), one of the best procedures for attaining fairness in a test is to have test writers drawn from different groups and trained to explore all aspects of a test for its fairness. Fairness in assessment is fundamentally a sociocultural rather than a technical issue, and fair assessment cannot be considered in isolation from the curriculum and the educational opportunities of the students (Stobart, 2005). Tierney (2016), in support of Kunnan (2005), posits that fairness in assessment is complex and cannot be ensured through any single practice. Hamidi (2010) identified four problems associated with assessment fairness: 1) the performance called for in authentic assessment is often highly language-dependent, whether oral or written; 2) the responses called for in performance assessment involve complex thinking skills; 3) authentic assessments are often used to measure students’ in-depth knowledge in an area; and 4) the use of authentic assessment might worsen the problem of culturally unfamiliar content: if the subject matter is unfamiliar to students, they may not be able to answer the questions contained in the assessment.

Despite the challenges regarding fairness in assessment, efforts can be made to achieve such fairness if certain conditions and strategies are put in place, depending on the purposes of the assessment and the individual assessed. According to Tierney (2016, p. 8), “to achieve fairer assessment, conditions and strategies for fairness should be considered proactively in the design and development of assessment tools and tasks, continually through assessment interaction and retrospectively in reviewing the assessment process”. This author also identified three conditions for fairer assessment. The first is the opportunity to learn. According to Tierney (2016), the opportunity to learn is a self-defining term that can vary considerably in breadth; it simply means exposure to test content, or alignment between curriculum and assessment. The second condition necessary for achieving fairness in assessment is a constructive environment, one which respectfully motivates students to take part and to disclose their knowledge and learning through assessment (Tierney, 2016). Finally, fairness in assessment requires evaluative thinking, which involves asking questions, identifying assumptions, seeking evidence, considering explanations and critically evaluating assessment practices.

The issue of quality in assessment tends to be discussed mostly in relation to written assessment tasks, which can mislead one into thinking that it applies only to assessment of learning, whereas any form of assessment needs to ensure quality in order to improve learning. FA is crucial to the improvement of learning; therefore, ensuring quality in FA is paramount. In addition to what various authors posit as constituting quality in assessment, I posit that transparency in assessment practices is of the utmost importance, and this aspect of quality can be most evident in FA, because FA is not used for progression decisions. For example, through self-assessment a student can know in advance what they need to assess themselves on, in order to report and reflect on their progress, which can be done by (among other means) using journal reflections.