The ‘backwash’ of reform
Terry Salinger
Introduction
Critics of testing practices abound in the USA and become especially vocal when they discuss tests and assessments used in the early childhood grades.
Many early childhood educators and assessment experts contend that too much testing is done and that too many erroneous decisions about young learners are made on the basis of test data gathered with inappropriate, faulty, culturally insensitive, or poorly administered instruments (see Young Children, July 1993; Meisels, 1985, 1987; NAEYC, 1988; Pearson and Stallman, 1994). Other critics point out that even in early childhood classes, tests can actually determine instruction through teachers’ tendency to ‘teach to the tests’ their students will take, whether those tests determine how ‘ready’ children are for instruction or whether they ‘qualify’ for ‘special’ enrichment or remedial services (Brandt, 1989; Koretz, 1988; Shepard and Smith, 1990).
To remedy the problems, some critics argue that no child should be tested with standardized instruments until at least fourth grade. Others suggest that teacher observation and analysis of student work should be used to keep data from traditional tests in their proper perspective. In addition to moderating the incorrect inferences that can often be drawn from standardized test data, this alternative would afford appropriate levels of credibility to classroom teachers as reliable assessors of young children’s learning (Hills, 1993; Pearson and Valencia, 1987). These two points—the accuracy of inferences made about students and teachers’ credibility as assessors—are critically linked at all levels of schooling but perhaps nowhere more intimately than during children’s first
few years in school. Understanding children’s learning during these initial years requires high levels of inference on the part of teachers; they must be able to observe and understand students’ behaviours on several levels. Essentially, they must bring to bear on their observations and subsequent decision-making their knowledge of child development, the structure of the disciplines students are striving to learn and the pedagogic options that will best facilitate individual students’ intellectual and emotional growth. Well-trained, knowledgeable teachers keep all these data sources in their heads and apply them as needed in the thousands of interactions with children and decisions they make each day. It is by no means inappropriate to expect that when teachers are supported in collating and externalizing their understandings about students’ learning, the results will provide useful and reliable assessment data (Chittenden, 1991).
From both a policy and a practical perspective, assessment and testing can readily be seen as aspects of a dynamic system within schools. Realization of this systemic effect has played a central role in the movement away from dependence upon external testing instruments and toward ‘alternative’,
‘authentic’, or ‘performance-based’ assessment, as charted in many of the chapters within this book (see also Stiggins, 1994; Valencia, Hiebert and Afflerbach, 1994). These classroom-based forms of assessment place teachers and students firmly in the centre of the assessment calculus; they capitalize on the work that students actually do every day and provide direct indicators of performance.
Especially in the language arts, development of these newer assessments often represents attempts to align a student-centred curriculum and the means by which student progress is measured. Sheingold and Frederiksen (1995) suggest, ‘As with tasks or activities we carry out in the real world, performance assessment…emphasize[s] extended activities that allow for multiple approaches, as well as a range of acceptable products and results’ (2). Unlike traditional multiple-choice testing, which is conducted on a single occasion, newer assessment procedures imply an ongoing process that results in more useful and more varied information about students. The ultimate purpose of performance assessments is to present a cumulative, rich portrait of learners’
strengths, weaknesses and capabilities, thereby enabling teachers to help each student learn more effectively.
Performance assessments can take many forms, including complex tasks students carry out over several days and collections of work samples that become tangible artefacts to document students’ learning. Whatever their form, these assessment methodologies are contextualized within the fabric of the classroom.
For example, students may read an unfamiliar book aloud, while their teacher records deviations from text and notes apparent strategic reading behaviour, producing data that will be analysed later and will contribute to instructional decisions about students. Students may also keep portfolios of work collected over time that are analysed according to specific rubrics or guidelines.
Interest in performance assessment is widespread in the USA, and many teachers have adopted alternative assessments for classroom use. Often, projects to develop primary portfolios or performance assessment tasks parallel attempts to enhance instruction and to involve teachers actively in instructional and evaluation decision-making processes. While definitions and interpretations of alternative assessment methods differ from location to location, the central purposes seem to be to support instruction and to align assessment with the curriculum.
Few school districts, however, have actually undertaken the massive job of moving toward systemic use of alternatives to standardized, multiple-choice tests; they remain instead at the exploratory stage, trying to determine whether and how to proceed. There is much to learn from districts that have worked toward wider assessment reform (Gomez, Graue and Bloch, 1991;
Lamme and Hysmith, 1991; Valencia, Hiebert and Afflerbach, 1994). One such district, whose efforts are discussed in this chapter, has made tremendous strides in reforming its assessment practices. Almost ten years ago, the district began what has become a massive reform effort.
The development and implementation of an early literacy portfolio in the district’s early childhood programme has been the linchpin of change, ultimately motivating new assessments in upper grades and in mathematics (Mitchell, 1992). The portfolio has emerged slowly, the result of several years’ work. In this respect, it is not unlike portfolio assessment approaches introduced into numerous districts; but this portfolio programme differs from many such efforts in that its development has been grounded in current research and theory and its use has been investigated empirically. Because the district as a whole had adopted an attitude of reform toward its assessment practices, it has been an excellent locale in which to investigate the long-term effects of change.
This chapter begins by describing the district and its reform efforts and then turns to one part of the empirical investigation of uses of the early literacy portfolio. Specifically, it describes an attempt to identify any changes in instructional practice and in teachers’ attitudes toward teaching, learning and early literacy content that have resulted from using the portfolio. In the USA, this effect is often referred to as the ‘consequential validity’ of assessment: the changes that result as a consequence of introducing new assessment models. In the UK, the term used for this phenomenon is ‘backwash’, a rather nice way of thinking about the effects of reform.
The district
The district is relatively typical of the Mid-Atlantic region of the USA. It is medium-sized, with a population that is ethnically, racially and economically diverse. Total enrolment is approximately 4600 students. Many children qualify