What the Current Literature Tells Us about Materials Evaluation

Establishing Criteria and Developing Evaluation Instruments

Since the turn of the century there has been a move away from the presentation of checklists for the reader to use (though to our regret we find that many postgraduate students still prefer to use published checklists rather than generate their own context-relevant criteria). We have often been asked to publish evaluation checklists but we have always declined because we believe that no checklist can ever be transferable from one evaluation context to another, that any checklist inevitably reflects the pedagogic beliefs of its designer(s), and that a published checklist is inevitably invested with an authority it might not deserve but which might attract teachers and students to use it uncritically and inappropriately in their evaluations.

Tomlinson (2003) prefers to outline a process for generating principled criteria rather than present a ready-made but unrealistic set of criteria for all contexts. He is insistent that evaluators need to develop their own criteria which take into account the context of their evaluation and the beliefs of all the evaluators. He describes, justifies, and exemplifies ways of developing principled criteria and risks annoying postgraduate and other evaluators by not providing checklists for others to use. He also echoes Candlin and Breen (1980) by advocating that evaluation criteria should be developed before materials are produced and that these criteria should be used to make decisions about the approach, procedures and activities and to evaluate them whilst they are being developed, after they have been developed and after they have been trialed. We have always followed this procedure on textbook projects we have been involved in (e.g. in China, Ethiopia, Singapore and sub-Saharan Africa) but often the final project is evaluated by “experts” flown in by the sponsor who then use their own impressions or checklist to evaluate the materials.

Tomlinson (2003) makes what for him is a significant distinction between universal criteria and local criteria. He defines universal criteria as criteria that can be used to evaluate any materials for any learner anywhere. To develop these criteria he suggests that evaluators should generate a list of beliefs that they hold about how second or foreign languages are most effectively acquired (based on their reading, research and experience) and then transform these beliefs into criteria for evaluating materials—for example, “I believe that learners need to be affectively engaged” becomes “Are the materials likely to achieve affective engagement?” In contrast to universal criteria, local criteria are those which are specific to the context in which the materials are being (or are going to be) used, and they are best developed by first specifying a profile of the target context. He also recommends a procedure for generating evaluation criteria to be used for the development, the ongoing monitoring and the eventual evaluation of materials (pp. 27−33)—a procedure that was used in Tomlinson et al. (2001) and later in Masuhara et al. (2008) and Tomlinson and Masuhara (2013) for evaluating coursebooks, as well as on a number of materials development projects led by Leeds Metropolitan University in, for example, China, Ethiopia and Singapore. We will describe this procedure in detail later in this chapter when giving our personal recommendations for evaluation.
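
To make the mechanics of this procedure concrete, the sketch below shows one possible way of representing it in Python. It is our own illustrative construction rather than anything Tomlinson proposes: the belief statements, the context profile and the simple rewording rule are all invented for the purpose of the example.

```python
# Illustrative sketch (ours, not Tomlinson's): deriving universal criteria
# from evaluator beliefs and local criteria from a target-context profile.
# All belief statements and profile entries below are invented.

universal_beliefs = [
    "the learners are affectively engaged",
    "the learners receive rich exposure to language in use",
    "the learners have opportunities to use language for communication",
]

def belief_to_criterion(belief: str) -> str:
    """Recast a belief statement as an evaluation question."""
    return f"To what extent are the materials likely to ensure that {belief}?"

# Local criteria are derived from a profile of the target context of use.
local_context_profile = {
    "class size": "large, mixed-ability secondary classes",
    "teachers": "non-native-speaker teachers with heavy timetables",
    "resources": "few classrooms with projectors or internet access",
}

local_criteria = [
    f"Are the materials suitable given the {feature} ({detail})?"
    for feature, detail in local_context_profile.items()
]

if __name__ == "__main__":
    for belief in universal_beliefs:
        print(belief_to_criterion(belief))
    for criterion in local_criteria:
        print(criterion)
```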

Tomlinson (2003, p. 16) says (as we stressed earlier in this chapter) that evaluation is inevitably subjective, that it “focuses on the users of the materials” and that it attempts to measure the potential or actual effects of the materials on their users. In contrast, analysis focuses on the materials and aims to discover what they contain, what they ask learners and teachers to do and what they say their objectives are. He makes the point that materials analysis attempts to provide an objective account of materials but the analyst’s choice of questions to ask is usually subjective and there is often a hidden agenda that it is hoped the resultant data will support. Littlejohn (2011, p. 181) makes a similar distinction when he says that analysis is concerned with materials “as they are” and “with the content and ways of working that they propose,” and that analysis is not concerned with “how effective materials may be in achieving their aims.” He also says that it is useful to do an analysis of a set of materials first so as to discover the extent of their match with the target context of use and then, if there is sufficient match, to do an evaluation in order to predict the likely effects of the materials on their intended users. Byrd (2001) makes a very different distinction between evaluation and analysis of textbooks when she talks about evaluation for selection of materials and analysis for their implementation. As we have said above, the literature often confuses materials analysis with materials evaluation and uses the terms as though they are interchangeable. For example, Mariani (1983, pp. 28−29) includes in a section on “Evaluate your coursebook” such analysis questions as “Are there any teacher’s notes?” and Cunningsworth (1984, pp. 74−79) includes both analysis and evaluation questions in his “Checklist of Evaluation Criteria.”

In recent years a number of other writers have put forward principled frameworks for materials evaluators to make use of rather than providing checklists for them to follow.

McGrath (2002, p. 31) distinguishes between “general criteria (i.e. the essential features of any good teaching-learning material)” and “specific (or context related) criteria,” a distinction similar to the one made by Tomlinson (2003) between universal and local criteria. For coursebook selection McGrath outlines a procedure that includes the following sequential stages: materials analysis, first-glance evaluation, user feedback, close analysis, and evaluation with situation-specific checklists and then selection. McDonough and Shaw (2003, p. 61) suggest an approach in which the evaluators first conduct an external evaluation “that offers a brief overview from the outside” and then carry out “a closer and more detailed internal evaluation.” They give practical advice on how to conduct both types of evaluation and discuss factors to consider when developing criteria. They also stress that the four main considerations when deciding on the suitability of materials are usability, generalizability, adaptability, and flexibility. McDonough, Shaw, and Masuhara (2013) update and develop this practical advice further. Riazi (2003) provides a useful critical survey of textbook evaluation schemes from 1975 onwards in which he points out how ephemeral many of the criteria are because they were based on pedagogic approaches that were favored at the time of publication. In his conclusion he wisely supports Cunningsworth (1995) in insisting on the importance of collecting data about the context of learning before starting any evaluation and he outlines a procedure that includes a survey of the teaching / learning situation, a neutral analysis (we wonder if this is actually possible), a belief-driven evaluation, and then the selection. Other writers who have offered principled advice on how to develop evaluation criteria include Wallace (1998), who suggests 12 “criterion areas” for materials evaluation, and Rubdy (2003), who proposes and exemplifies an interactive model of evaluation combining psychological validity, pedagogical validity and process / content validity. Tomlinson and Masuhara (2004) propose a principled evaluation procedure for inexperienced, unqualified teachers, a procedure that we will outline in detail in our final section of this chapter and which has been reproduced in Korean and Portuguese translations of the book and in a version published in China (Tomlinson & Masuhara, 2007). McCullagh (2010) also reports on a principled procedure that she used to evaluate materials developed for use with non-native speaker medical practitioners.

Evaluation and selection of learning materials: A guide (2008) is an unusual publication. This is a booklet specifically published for language teachers in Prince Edward Island, Canada, instructing them how to evaluate and select from the learning materials available to them. It is unusual in that it is very well-informed, very principled, very thorough and very coherent, but very prescriptive in its insistence that “The overall goal must be to support the learning outcomes of the curriculum. The consideration of curriculum fit must be applied rigorously to all mediums of presentation” (p. 1).

Mukundan and Ahour (2010) very usefully review 48 evaluation checklists from 1970 to 2008 and are critical of most of them for being “too demanding of time and expertise to be useful to teachers, too vague to be answerable, too context bound to be generalizable, too confusing to be useable and too lacking in validity to be useful” (Tomlinson, 2012, p. 148). They assert that a framework for generating flexible criteria would be more useful than detailed and inflexible checklists and also that more attention should be given to retrospective evaluation than to predictive evaluation in order to help teachers to evaluate the effects of the materials they have used so as to be able to make informed modifications the next time they use them. This is a point which is also made by Tomlinson (2003, 2013b) and by Ellis (2011), and which we constantly make to teachers, to project sponsors and to publishers. Mukundan and Ahour (2010) also advocate what they call a “composite framework” for evaluation consisting of multiple components and including computer analysis of the script of the materials (focusing in the examples they give on vocabulary load or on recycling). Mukundan has campaigned against the exclusive use of predetermined checklists for many years and in Mukundan (2006) he describes the use of a composite framework, which combines the use of checklists, of reflective journals and of concordance software, to evaluate locally published ELT textbooks in Malaysia.
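
To indicate the kind of computer analysis of the script that such a composite framework might include, here is a small Python sketch. It is our own illustration, using only the standard library, and the unit texts are invented; it measures vocabulary load as new word types per unit and flags items that are never recycled in later units, two of the focuses Mukundan and Ahour mention.

```python
# Illustrative sketch (ours) of the computer analysis of a coursebook
# script described by Mukundan and Ahour (2010): vocabulary load (new
# word types per unit) and recycling (reappearance of earlier items).
# The unit texts are invented; a real analysis would read the full script.
import re
from collections import Counter

units = {
    "Unit 1": "My family lives in a small town. My father works in a shop.",
    "Unit 2": "The town has a market. My father buys fruit in the market.",
    "Unit 3": "We often visit the market together on Saturdays.",
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

seen: set[str] = set()
recycled_ever: Counter[str] = Counter()

for name, text in units.items():
    tokens = tokenize(text)
    types = set(tokens)
    new_words = types - seen   # vocabulary load of this unit
    recycled = types & seen    # items carried over from earlier units
    recycled_ever.update(recycled)
    print(f"{name}: {len(tokens)} tokens, {len(new_words)} new word types, "
          f"{len(recycled)} recycled from earlier units")
    seen |= types

# Items never met again after their first appearance (note that words
# introduced in the final unit have had no later chance to recur).
print("Never recycled:", sorted(seen - set(recycled_ever)))
```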

In recent years, publications have focused much less on materials evaluation and much more on principled ways of developing and using materials. Mishan and Chambers (2010), for example, contains no chapters on materials evaluation and neither does Harwood (2010). Both books focus very much on principles and procedures for developing materials for different purposes and for different types of learners.

Tomlinson (2011), however, does contain a section on materials evaluation. In it, Littlejohn (2011) updates, develops, and exemplifies his influential (1998) framework for analyzing materials and adds sections on postanalysis evaluation and use of materials. One of the points he stresses is how empowering it can be for teachers to use his framework.

Whilst we acknowledge that Littlejohn’s framework has been very useful for researchers and postgraduate students and is frequently cited in the literature, it does seem to us to be too demanding of time and expertise to be a practical tool for busy teachers.

In Tomlinson (2011), Ellis (2011) reviews principles and procedures for researcher-led macro-evaluations of task-based approaches and then focuses on practical procedures for teacher-led micro-evaluations of task-based approaches that provide teachers with useful information, contribute to teacher development and can inform macro-evaluations. In another contribution to Tomlinson (2011), Masuhara (2011) updates and develops her often-cited (1998) chapter on what teachers really want from coursebooks. She investigates teacher needs and teacher wants. Then she suggests ways in which teachers can improve their evaluation and development of published materials as well as their own ability to use materials in the ways they think are most effective for their students, whilst empowering themselves and contributing to their own teacher development. She also reports improvements she has noticed since 1998 in the inclusion of teachers in the process of evaluating and developing materials. In the final chapter of the evaluation section of Tomlinson (2011), Amrani (2011) reports on current practice in publisher evaluation of materials. She reveals that publishers rarely subject all their materials to time-consuming piloting nowadays (see Roxburgh, 1997, and Singapore Wala, 2013a, for their views on the importance of piloting) and instead they use selective piloting of sections; they have the materials predictively evaluated by focus groups and they use reviewers, questionnaires, panels of experts, and editorial visits. She also reveals
that development times for a course have been cut down from 7 years to 3 years since Donovan (1998) wrote about the processes of publisher evaluation, and that publishers evaluate materials primarily in order to identify customer expectations, to check the match of a specific section with its objectives and to “see how the scope and sequencing work in terms of a fuller syllabus” (Amrani, 2011, p. 273). Amrani predicts that post-use evaluation will increasingly inform future materials development and that “evaluation will become less of a clear cut stage prior to publication and be more of an ongoing process where materials are refined and even changed throughout the life of a product” (p. 295). We hope her predictions come true and we applaud the apparent thoroughness of the evaluation procedures she reports. However, we are a little concerned that much of the current publisher evaluation depends on the views of teachers and “experts” willing to contribute, that so little of it seems to be of the actual effects of materials in and after use, and that none of it seems to include feedback from learners.

McGrath (2013) makes similar points to ours above by stressing the importance of in-use and post-use evaluation and of the inclusion of learners in the evaluation process. In a very useful section on How Teachers Evaluate Coursebooks (pp. 106–126), McGrath reports studies of how coursebooks are evaluated and selected. Many of the studies report how teachers think they should be responsible for the evaluation and selection of coursebooks but that in most cases (even if teachers are consulted) the decisions are taken by administrators or heads of department—sometimes influenced by publishers’ offers. However, McGrath reports a very encouraging study of a Swedish secondary school (Fredriksson & Olsen, 2006), which narrates a progression from teachers being influenced by another school and by an author in their selection of textbooks to teachers piloting a book in two classes before evaluating it and making a decision. He also reports a study in Taiwan (Wang, 2005) in which teachers who had been given a checklist to use in selecting their own textbook reflected on the process. In McGrath (2013) there is a subsection (pp. 117–125) that provides a critical review of studies of teachers’ own criteria. The actual criteria seem to depend on contextual circumstances and range from very teacher-centered criteria, such as survival, from inexperienced, overworked teachers, to more learner-centered criteria, such as student motivation, from more experienced and confident teachers. McGrath (2013) looked for evidence of in-use and post-use teacher evaluation of materials but, disappointingly, found only two studies: Law’s (1995) study of Hong Kong teachers and Fredriksson and Olsen’s (2006) study in Sweden. However, there are studies of teacher in-use and post-use evaluation of materials in Tomlinson and Masuhara (2010)—Al-Busaidi and Tindall (2010), Pryor (2010), Stillwell, McMillan, Gillies, and Waller (2010), and Stillwell, Kidd et al. (2010). For information about these studies see our section on “Reporting Evaluations” below.

McDonough, Shaw, and Masuhara (2013) contains a section on evaluation that accepts that there is no “agreed set of procedures or criteria for evaluation” (p. 52) but puts forward a “model for hard-pressed teachers or course planners that will be brief, practical to use and yet comprehensive in its coverage of criteria” (p. 52). This model is in two stages: (1) “an external evaluation that offers a brief overview from the outside (cover, introduction, table of contents)” (p. 53), which is used to eliminate materials which do not match the content and approach needed for the target learners; (2) “a closer and more detailed evaluation” (p. 53) of materials that have been found in stage (1) to match the requirements of the target context of learning. After stage (2), decisions are made about the usability, generalizability, adaptability, and flexibility of the materials.

In Tomlinson (2013a) there are two chapters on evaluation. Tomlinson (2013b) looks in detail at the principles that can inform materials development and considers, in particular, principles deriving from the evaluators’ theories of learning and teaching, from learning theory, from SLA research, and from his own experience and research. He then says what he thinks can be predicted in pre-use evaluation, can be observed in whilst-use evaluation, and can be measured in post-use evaluation, as well as suggesting effective ways of doing so. He also reviews standard approaches to materials development and suggests the following criteria for evaluating evaluation checklists and lists of criteria:

• Is the list based on a coherent set of principles of language learning?
• Are all the criteria actually evaluation criteria or are they criteria for analysis?
• Are the criteria sufficient to help the evaluator to reach useful conclusions?
• Are the criteria organized systematically (for example, into categories and subcategories that facilitate discrete as well as global verdicts and decisions)?
• Are the criteria sufficiently neutral to allow evaluators with different ideologies to make use of them?
• Is the list sufficiently flexible to allow it to be made use of by different evaluators in different circumstances? (Tomlinson, 2013b, p. 36)

Finally Tomlinson proposes a procedure to use in “major” evaluations and a reduced version of it for use in less formal evaluations where saving time is important. These procedures involve the evaluators developing their own universal criteria by brainstorming their beliefs about what facilitates language acquisition and development and their own local criteria by profiling the target context of learning. We will provide more details about these procedures in the final section of this chapter.
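
As a concrete (and entirely invented) illustration of how criteria generated in this way might then be applied, the sketch below has two evaluators rate two candidate coursebooks against each criterion on a five-point scale and then averages the ratings. The criteria, books and scores are our own assumptions for the purpose of the example, not data from Tomlinson.

```python
# Illustrative sketch (ours): applying evaluator-generated criteria to
# candidate coursebooks. Each evaluator rates each book against each
# criterion on a 1-5 scale; the means allow per-criterion and overall
# comparison. The criteria, books, and scores below are all invented.
from statistics import mean

criteria = [
    "Likely to achieve affective engagement",             # universal
    "Provides rich exposure to language in use",          # universal
    "Usable in large, mixed-ability secondary classes",   # local
]

# ratings[evaluator][book] lists one 1-5 score per criterion above.
ratings = {
    "Evaluator A": {"Book X": [4, 3, 2], "Book Y": [3, 4, 4]},
    "Evaluator B": {"Book X": [3, 3, 2], "Book Y": [4, 4, 3]},
}

for book in ["Book X", "Book Y"]:
    per_criterion = [
        mean(ratings[evaluator][book][i] for evaluator in ratings)
        for i in range(len(criteria))
    ]
    print(f"{book}: overall {mean(per_criterion):.2f}")
    for criterion, score in zip(criteria, per_criterion):
        print(f"  {score:.2f}  {criterion}")
```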

The second chapter on evaluation in Tomlinson (2013a) focuses on the role of feedback in the process of developing and publishing a coursebook. In it, Singapore Wala (2013) reviews the literature on such feedback (and especially on the role of teachers in providing feedback) and considers the issue of whose feedback to include, so as to satisfy all the stakeholders in the process. She then focuses on the case of Singapore and details the various feedback processes that were built into the development of a coursebook to match the requirements of a new Ministry of Education syllabus. These included feedback from teachers about existing materials, feedback from teachers who trialed sample units in target schools, feedback from the Curriculum Development Planning Division and feedback from various “feedback loops” (p. 82) that had been incorporated in the materials-development process. In her conclusion, Singapore Wala focuses on the tension between feedback from curriculum developers who work with ideals and abstracts and are located in the future and feedback from teachers who work with students in classrooms and are located in the present.

Tomlinson (2013c) focuses on the application of applied linguistics research and theory to the development of materials. None of the chapters focus on evaluation directly but Tomlinson (2013d) develops a number of principles for facilitating language acquisition and then makes use of 10 of them as criteria for evaluating six current global coursebooks. He found, for example, that all six books achieved low ratings for affective engagement, utilization of the resources of the brain, opportunities for language use, catering for the individual and focus on meaning, and that none of them made any use of nonlinguistic communication. Five of these six coursebooks (plus one other) are evaluated in Tomlinson (2013e) against findings from classroom research and the conclusion is that there is a very weak match between the six books and “what we