point and counterpoint

International English language testing: a critical response

Graham Hall

Uysal’s article provides a research agenda for IELTS and lists numerous issues concerning the test’s reliability and validity. She asks useful questions, but her analysis ignores the uncertainties inherent in all language test development and the wider social and political context of international high-stakes language testing. In this response, I suggest there is ample evidence that, in the normal course of its test development and review processes, IELTS is aware of and addressing problematic issues in its testing as they arise. However, I also argue that to address some of the issues arising from Uysal’s discussion, we need to take a broader perspective and examine the social, economic, and political dimensions of international high-stakes English language testing.

Introduction

Language testing is an uncertain and approximate business at the best of times, even if, to the outsider, this may be camouflaged by its impressive, even daunting, technical (and technological) trappings, not to mention the authority of the institutions whose goals tests serve. Every test is vulnerable to good questions. (McNamara 2000: 85–6)

Tests are inevitably political since what they do ... is to sort and select to meet society’s needs. Testers cannot expect that their work will not have a political dimension. The proper reaction to such concern is surely to act with professional skill and rectitude within the contexts in which they work. (Davies 2003: 361)

Assessing L2 writing presents test writers and testing organizations such as Cambridge ESOL, who co-manage the IELTS test, with challenges to which there are no easy answers (Hamp-Lyons 1990). And yet, given the popularity and importance of IELTS in people’s lives, ‘it is becoming increasingly important for Cambridge ESOL to be able to provide evidence of quality control in the form of assessment reliability and validity to the outside world’ (Shaw 2007: 14).

Hacer Hande Uysal’s review of the IELTS writing test draws attention to a series of concerns and suggests that IELTS should undertake much more research in its efforts to improve the reliability and validity of the test. As the title of her article points out, what she presents is a ‘critical review’, ‘critical’ here seeming to mean ‘given to judging; given to adverse or unfavourable criticism and fault-finding’ (Oxford English Dictionary online) rather than the more balanced ‘involving or exercising careful judgement or observation’ (Oxford English Dictionary online), or indeed ‘critical’ as tied to ideas of societal (and educational) transformation and the illumination of power relations.

Thus, my reaction to Uysal’s article was mixed. As an English language teacher and as a test user of IELTS scores at a receiving institution to assist with admissions on to a British university programme, I am a stakeholder in the test. The concerns Uysal raises surrounding test validity and reliability clearly matter, and her article also clearly acknowledges potential difficulties for test takers, who are often little heard.

However, the paper was also frustrating. Whilst it is appropriate to call for IELTS to be aware of and attend to key issues, the evidence strongly suggests that it is and does (indeed, Uysal drew largely on IELTS’ own research throughout her article). Furthermore, as Uysal points out, ‘there is no perfect test that is valid for all purposes and uses; test users also have the responsibility to make their own research efforts to make sure that the test is appropriate for their own institutional or contextual needs’. Thus, the concerns raised are those faced by all similar international language tests (for example, TOEFL, Cambridge certificate exams, TOEIC). The overly critical tone of the paper, focusing only on IELTS rather than the wider context of language tests and testing, does Uysal’s argument a disservice. The value and interest of many of her points are diminished by a feeling that she is almost deliberately failing to recognize the difficulties inherent in test writing in our complex social and political world.

My contention, then, is that IELTS is something of an easy target, perhaps even a scapegoat, for individual and institutional difficulties within English-medium Higher Education (HE) and related ELT/EAP provision, whilst more significant and difficult issues concerning the social and political character of high-stakes, international language testing, and the nature of power relations in the contemporary, globalized academic world are left unexamined. This broader context needs to be reintroduced if the challenges facing all stakeholders in English-medium HE are to be fully understood, not only by English language test designers but also by students, English language teachers, test users (i.e. receiving institutions), and policy makers.

It is absolutely legitimate for Uysal to raise key questions in order that all stakeholders in IELTS think deeply about test design and what test scores actually mean; however, it is not my intention, nor is there enough space in this article, to deal individually with every issue she raises (although I will deal with several in the course of the discussion). Rather, I aim to highlight the way the case is presented, which seems unduly critical of IELTS whilst failing to examine the broader context of language testing and test development generally, and power relations within internationalized English-medium HE.

Examining reliability: the example of IELTS marking processes

Uysal’s criticisms of the IELTS writing test can be broadly summarized as follows:

- reliability issues, particularly concerning marking and rating processes; and
- validity issues, including the definition and understanding of Englishes in the world and contrastive rhetorical conventions.

Within her discussion of reliability, a typical point made by Uysal concerns the absence of multiple markers on the IELTS test, quoting Hamp-Lyons (1990: 79) in support of her case—‘all reputable writing assessment programmes use more than one reader to judge essays’. What Uysal omits from her argument, however, is Hamp-Lyons’ subsequent discussion. Quoting Alderson, Hamp-Lyons notes that IELTS has only one reader due to the difficulty of finding qualified readers in British Council locations around the world and the demand for immediate reporting of results. However, more interestingly, she goes on to question the rationale generally given for multiple scoring—that ‘multiple judgements lead to a final score which is closer to a “true” score than any single judgement’ (Hamp-Lyons 1990: 79). Hamp-Lyons suggests that when two readers reach very different judgements, the two scores are averaged, with the final score bearing little resemblance to the actual scores assigned. Or, where a third reader is brought in, one of the original readers’ scores will be discounted, which may seem unproblematic until one realizes that the discounted score comes from a trained reader whose scores for other scripts are being treated as valid. McNamara (2000) further complicates this picture by noting that ‘rating remains intractably subjective’ (p. 37) and observing that ‘a score is not a score is not a score’ (p. 55), i.e. even when two raters give the same score, this might not mean the same thing to each.

This is not to suggest that multiple scoring is necessarily inadequate or that single scoring is unproblematic and should not be questioned. Rather, it is to point out that, when it comes to language testing, the issues are more complex than Uysal portrays in her argument.
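To make the arithmetic behind Hamp-Lyons’ point concrete, the following is a minimal sketch of a generic two-rater resolution rule with third-rater adjudication. The band scale, the one-band discrepancy threshold, and both functions are invented for illustration; they describe no particular testing body’s actual procedure.

```python
# A hypothetical two-rater resolution rule: average close scores, refer large
# discrepancies to a third rater. Thresholds are illustrative only.

def resolve(rater_a: float, rater_b: float, threshold: float = 1.0) -> float:
    """Return a final score from two ratings on a 0-9 band scale."""
    if abs(rater_a - rater_b) <= threshold:
        # Averaging can yield a 'final' score that neither rater assigned --
        # Hamp-Lyons' point about the resolved score resembling no judgement.
        return (rater_a + rater_b) / 2
    raise ValueError("Discrepancy exceeds threshold: refer to a third rater")

def adjudicate(rater_a: float, rater_b: float, rater_c: float) -> float:
    """Third-rater adjudication: discard the original score furthest from the
    third rater's judgement, even though it came from an equally trained reader."""
    discarded = max((rater_a, rater_b), key=lambda s: abs(s - rater_c))
    kept = rater_a if discarded == rater_b else rater_b
    return (kept + rater_c) / 2

print(resolve(5.0, 6.0))          # 5.5 -- a score neither rater actually gave
print(adjudicate(5.0, 7.5, 7.0))  # 7.25 -- the trained reader's 5.0 is discounted
```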

Despite this complexity, and to avoid charges of complacency, it is worth examining how IELTS tries to ensure its single-rater system is as effective as possible. Blackhurst (2004: 18) summarizes the process as follows:

Reliability of rating is assured through the face-to-face training and certification of examiners, and all examiners must undergo a re-training and re-certification process every two years. Continuous monitoring of the reliability of IELTS Writing and Speaking assessment is achieved through a sample monitoring process. Selected centres worldwide are required to provide a representative sample of examiners’ marked tapes and scripts such that all examiners working at a centre over a given period are represented. Tapes and scripts are then second-marked by a team of IELTS Senior Examiners. Senior Examiners monitor for quality of both test conduct and rating, and feedback is returned to each centre. Analysis of the paired Senior Examiner–Examiner ratings from the sample monitoring data in 2003 produced an average correlation of .91 for the Writing module.


IELTS appears to be as rigorous as possible given the global context within which the test is designed, delivered, and marked.
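For readers unfamiliar with how a figure such as the .91 above is derived, the sketch below computes a Pearson correlation over paired Examiner and Senior Examiner ratings. The band scores are invented for illustration and are not drawn from actual IELTS monitoring data.

```python
# Pearson correlation between paired ratings -- the statistic behind figures
# such as Blackhurst's .91. The band scores below are invented for illustration.
from statistics import mean, stdev

examiner = [6.0, 5.5, 7.0, 6.5, 5.0, 8.0, 6.0, 7.5]
senior   = [6.0, 5.0, 7.0, 6.5, 5.5, 8.0, 6.5, 7.5]

mx, my = mean(examiner), mean(senior)
# Sample covariance, divided by the product of sample standard deviations.
cov = sum((x - mx) * (y - my) for x, y in zip(examiner, senior)) / (len(examiner) - 1)
r = cov / (stdev(examiner) * stdev(senior))
print(f"Pearson r = {r:.2f}")  # values near 1 indicate close rater agreement
```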

Further examples of Uysal’s overly critical approach to test design can be found elsewhere in her article. For example, she writes:

Although there have been several publications evaluating [international high-stakes language] tests in general, these publications often do not offer detailed information about specifically the writing components of these tests. Scholars, on the other hand, acknowledge that writing is a very complex and difficult skill both to be learned and to be assessed, and it is central to academic success especially at university level. (My emphasis)

Narrowing this claim to the context of the IELTS writing test, which was the specific focus of Uysal’s article, two issues can be identified. Firstly, that there is little detailed information about the writing test. However, Falvey and Shaw (2006), Banerjee, Franceschina, and Smith (2007), and Shaw (2007), to list but a few, all deal explicitly with the written element of the IELTS test—how the test was developed, what language is expected for differing indicators, and how raters score tests. Secondly, Uysal seems to be suggesting that international language tests (including IELTS) do not recognize the complexity of writing (but scholars do) and is implicitly separating the development of the tests from scholars. However, surveying the academic literature about testing in general and IELTS in particular, it seems that the two are inseparable and that Uysal is creating a somewhat artificial division between IELTS and research in order to suggest that the IELTS test is somehow under-researched. The annual Research Reports and the online quarterly Research Notes produced by IELTS, authored by researchers into test writing who are also test designers, suggest this is not the case.

Throughout her article, then, Uysal usefully draws our attention to the problems and difficulties of test design. However, where I differ with her is in the implicit suggestion that these are issues of particular relevance to IELTS rather than to all testing bodies, and that IELTS fails to recognize these difficulties and act upon them. This seems overly critical and fails to recognize the complexities and uncertainties inherent in language test design.

Validity issues: broadening the debate

Uysal’s discussion of the validity of the IELTS writing test opens a door on the wider context of international high-stakes English language testing and test design and its place in globalized, market-driven HE. While it is possible to identify with several of her points, the analysis falls short of recognizing the social, political, and economic forces which underpin her argument.

The nature of language test design

Uysal’s discussion implies that the issues she identifies have been overlooked and will remain ignored and unresolved by test designers. In fact, test design is a dynamic and ongoing process in which testing organizations, including IELTS, continually engage to revise and refine a test’s ‘fitness for purpose’. Indeed, in IELTS’ own research concerning test design and revision, Falvey and Shaw (2006: 8) note:

The project can never be described as a one-shot effort. The revision of a high-stakes examination should never be approached by means of a monolithic exercise without the opportunity to go back, to seek further insights and to be willing to adapt during the process by revising, amending or rejecting previous decisions in the light of further data analysis.

An international test of English ... a test of international English?

A key element of Uysal’s discussion is the development of IELTS as an international English test. I strongly sympathize with her argument which acknowledges and values ‘English as an International Language (EIL) varieties’ (Jenkins 2006) and identifies the difficulties speakers of such Englishes face in terms of language testing and prescriptive language standards. There is also value in the arguments surrounding ‘world rhetorics’.

Overall, however, Uysal’s claims do not acknowledge the wider context which IELTS both operates within and contributes to. Thus, we need to ask how far the IELTS test is ‘an international test of English’, how far it is ‘a test of international English’, and the extent to which IELTS claims each role.

The first of these questions is uncontested. Taken in 120 countries and with approaching 500,000 test takers each year, IELTS provides a test with obvious international reach and influence. But does IELTS test international English, and how far does it claim to do so? Here, the picture is less clear. The current IELTS logo suggests the organization engages with ‘English for International Opportunity’, which is not quite the same thing as ‘international English’. Critical discourse analysts might also detect interesting ideological constructs behind the IELTS website bylines ‘The world speaks IELTS’ and ‘The test that sets the standard’.

Whilst Taylor (2002) recognizes the need for IELTS to account for language variation, in her discussion of ‘guiding principles’ to determine the content and linguistic features to be tested by IELTS, she proposes the notion of the ‘dominant host language’ and notes:

In the case of IELTS, which is an international test of English used to assess the level of language needed for study or training in English speaking environments ... test material is written by a trained group of writers in the UK, Australia and New Zealand ... [reflecting] the fact that different varieties of English are spoken in the contexts around the world in which IELTS candidates are likely to find themselves. (pp. 19–20)

Uysal’s claim that IELTS aims to assess international English (but does not do so) is therefore questionable, as IELTS has positioned itself slightly differently with regard to English language and English-medium education. IELTS does not hide its role as an English language gatekeeper ‘for people who intend to study or work where English is the language of communication’, and its scores are used accordingly by a range of stakeholders, such as prospective students, university lecturers, English-medium HE institutions, etc.

Furthermore, the British Council, a major stakeholder in the test as a co-manager of IELTS, additionally works ‘to strengthen the UK’s position within the international education community’ (British Council website 2009). The relevance of Davies’ perspective, noted at the start of this article, becomes increasingly clear—‘tests are inevitably political since what they do ... is to sort and select to meet society’s needs’ (Davies 2003: 361).

The IELTS test is thus embedded within global relations in the contemporary academic world and, consequently, serves both to deliver and reinforce discourses which support native-speaker language norms. Uysal is therefore correct to suggest that IELTS might promote conformity and homogeneity in both linguistic forms and rhetorical conventions in academia, although we need to acknowledge that this is, in part, a consequence of its purpose set out by other institutions, academic discourses, and societal and political contexts. However, IELTS is not alone in acting as a gatekeeper for English language; most, if not all, international English language tests fulfil something of a similar role.

The IELTS context: a summary

The above discussions indicate that the issues IELTS faces and the contexts within which the IELTS test has been developed and administered are typical of those faced by all international tests and testing organizations. I have argued that, given its social context and stated purpose, the test has been and is being developed with due skill and rectitude. Chalhoub-Deville and Turner (2000: 537) note that:

Developers of large-scale tests such as those reviewed in the present article have the responsibility to: construct instruments that meet professional standards; continue to investigate the properties of their instruments and the ensuing scores; and make test manuals, user guides and research documents available to the public.

Their paper generally indicates that IELTS follows these practices and notes that ‘IELTS’ commitment to research and its responsiveness to research findings is well documented in the literature ... as such, IELTS has shown commitment to test practices informed by research findings’ (Chalhoub-Deville and Turner 2000: 533).

The broader context: beyond IELTS

Thus far, I have noted that IELTS grades test takers as effectively as possible and also that the test operates within a context of global, market-driven HE. The implications of this latter statement require further investigation.


Receiving organisations should also consider a candidate’s IELTS results in the context of ... motivation, educational and cultural background, first language and language learning history. (IELTS website 2007)

The ultimate responsibility for appropriate test use and interpretation lies predominantly with the test-user ... (Chalhoub-Deville and Turner 2000: 113)

Whether all receiving organizations of the IELTS test duly consider these points is open to question. In the drive to recruit higher fee-paying overseas students to British and North American universities, ‘there is evidence to suggest that in some institutions, proficiency test scores are used in somewhat cruder fashion as ... “pass-fail indicators”’ (Brindley and Ross 2001: 150).

It seems possible, then, that some receiving organizations need to take more responsibility for understanding and interpreting IELTS test scores, and much more for ensuring that arriving international students can meet the requirements of the programmes they enrol for, something even the most valid and reliable language test cannot ensure. At present, IELTS test scores offer an easy short cut when decisions are made concerning admissions to English-medium HE institutions, and, consequently, it is convenient to find fault with IELTS when students encounter difficulties on their chosen programmes of study.

However, there is a broader point to be made about IELTS test user responsibility. The default position of most Western universities is that, for a British degree, standard inner circle English varieties are the only acceptable forms of language. Indeed,

There will be little incentive to modify ... to suit the needs of the foreign student minority; and there will be (often justified) appeals to ‘maintaining academic standards’ whenever such suggestions are raised. Inevitably it will be the foreign students, not the system, that must make the necessary adjustments. (Adapted from Ballard 1996: 149)

However, how sustainable is this position? Given that foreign students are essential to Western HE’s continued development and financial strength, to the extent that British universities now talk about ‘the internationalised curriculum and campuses’, receiving organizations need to engage much more deeply in a critical debate over language standards and consider the case for EIL varieties. At present, then, it is English-medium HE institutions, as much as IELTS, who sustain the position of inner circle English varieties in international high-stakes testing.

Conclusions

My response to Uysal’s review of the IELTS writing test has essentially developed two parallel strands of argument. Firstly, I have noted that, whilst questioning test providers’ practices concerning test reliability and validity is a valid and valuable exercise, the published literature suggests that IELTS is aware of and attempting to address problematic issues as effectively as possible in the normal course of test development and review.

Secondly, I have argued that there is a broader context which Uysal’s analysis fails to consider—that is, the social, economic, and political dimensions of international high-stakes English language testing. Test-receiving institutions need not only to develop clearer understandings of the uncertainties inherent in language testing but also to develop appropriate responses to the emergence and development of varieties of English.

These are critical (and ethical) concerns for all stakeholders in IELTS when acknowledging:

the complex responsibilities of the language tester and test users as the agents of political, commercial and bureaucratic forces. (Adapted from McNamara 2000: 24)

Final revised version received May 2009

References

Ballard, B. 1996. ‘Through language to learning: preparing overseas students for study in Western universities’ in H. Coleman (ed.). Society and the Language Classroom. Cambridge: Cambridge University Press.

Banerjee, J., F. Franceschina, and A. M. Smith. 2007. ‘Documenting features of written language production typical at different IELTS band score levels’. IELTS Research Reports 7. London: British Council.

Blackhurst, A. 2004. ‘IELTS test performance data 2003’. University of Cambridge ESOL Examinations Research Notes 18: 18–20. Available at http://www.cambridgeesol.org/rs_notes/index.htm (accessed 15 April 2009).

Brindley, G. and S. Ross. 2001. ‘EAP assessment: issues, models and outcomes’ in J. Flowerdew and M. Peacock (eds.). Research Perspectives on English for Academic Purposes. Cambridge: Cambridge University Press.

British Council. Available at http://www.britishcouncil.org/ (accessed 16 April 2009).

Chalhoub-Deville, M. and C. E. Turner. 2000. ‘What to look for in ESL admission tests: Cambridge certificate exams, IELTS, and TOEFL’. System 28/4: 523–39.

Davies, A. 2003. ‘Three heresies of language testing research’. Language Testing 20/4: 355–68.

Falvey, P. and S. D. Shaw. 2006. ‘IELTS writing: revising assessment criteria and scales (phase 5)’. University of Cambridge ESOL Examinations Research Notes 23: 7–12. Available at http://www.cambridgeesol.org/rs_notes/index.htm (accessed 15 April 2009).

Hamp-Lyons, L. 1990. ‘Second language writing: assessment issues’ in B. Kroll (ed.). Second Language Writing: Research Insights for the Classroom. Cambridge: Cambridge University Press.

IELTS Homepage. Available at http://www.ielts.org/ (accessed 16 April 2009).

Jenkins, J. 2006. ‘The spread of EIL: a testing time for testers’. ELT Journal 60/1: 42–50.

McNamara, T. 2000. Language Testing (Oxford Introductions to Language Study Series). Oxford: Oxford University Press.

Oxford English Dictionary Online. Available at http://www.oed.com (accessed 16 April 2009).

Shaw, S. 2007. ‘Modelling facets of the assessment of writing within an ESM environment’. University of Cambridge ESOL Examinations Research Notes 27: 14–19. Available at http://www.cambridgeesol.org/rs_notes/index.htm (accessed 15 April 2009).

Taylor, L. 2002. ‘Assessing learners’ English: but whose/which English(es)?’. University of Cambridge ESOL Examinations Research Notes 10: 18–20. Available at http://www.cambridgeesol.org/rs_notes/index.htm (accessed 16 April 2009).

The author

Graham Hall has taught English in Europe, the Middle East, and the UK and is now a Senior Lecturer in the Division of English and Creative Writing at Northumbria University, where he coordinates and teaches on Northumbria’s MA Applied Linguistics for TESOL programme. He also teaches English Language Studies to undergraduates.
