
Suggested Citation: Uysal, H. H. (2010). A critical review of the IELTS writing test. ELT Journal, 64(3), 314-320.

A CRITICAL REVIEW OF THE IELTS WRITING TEST

Abstract: Administered at local centres in 120 countries throughout the world, IELTS is one of the most widely used large-scale ESL tests, and one that offers a direct writing test component. Because of its popularity and its use in making critical decisions about test takers, this article draws attention to several issues in the assessment procedures of IELTS. It provides a descriptive and critical review of the IELTS writing test, focusing in particular on reliability issues such as the single marking of papers, the readability of prompts, and the comparability of writing topics, and on validity issues such as the definition of an "international writing construct" without consideration of the variation in rhetorical conventions and genres around the world. Consequential validity (impact) issues are also discussed, and suggestions are given for the use of IELTS around the world and for future research to improve the test.

Keywords: IELTS, large-scale ESL testing, standardized writing tests, validity, reliability, testing English as an international language.

Introduction


General Information about the IELTS Writing Test

The IELTS writing test is a direct test of writing in which tasks are communicative and contextualized with a specified audience, purpose, and genre, reflecting recent developments in writing research. There is no choice of topics; however, IELTS states that it continuously pre-tests topics to ensure comparability and equality. IELTS has both academic and general training modules, each consisting of two tasks. In the academic module, candidates write a report of around 150 words based on a table or diagram (Task 1) and a short essay or general report of around 250 words in response to an argument or a problem (Task 2). In the general training module, candidates write a letter responding to a given problem (Task 1) and an essay in response to a given argument or problem (Task 2). In both modules, candidates have 60 minutes to complete the two tasks. The academic writing component serves the purpose of making university admission decisions about international students, whereas the general training component serves the purposes of completing secondary education, undertaking work experience or training, or meeting immigration requirements in an English-speaking country.

Trained and certified IELTS examiners assess each writing task independently, giving more weight to Task 2 than to Task 1 in marking. Detailed performance descriptors have been developed that describe written performance at the nine IELTS bands, and results are reported as whole and half bands; however, how these descriptors are turned into band scores is kept confidential. At the end, the writing score and the scores from the other modules of the test are averaged and rounded to produce an overall band score. There are no pass/fail cut scores in IELTS. IELTS provides a guidance table for users on acceptable levels of language performance for different programmes for making academic or training decisions; however, it advises test users to set their own acceptable band scores in the light of their experiences and local needs.
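As an arithmetic illustration of the averaging step described above, the sketch below implements the rounding convention that IELTS publicly describes (the mean of the four module scores is rounded to the nearest half band, with averages ending in .25 rounding up to the next half band and .75 up to the next whole band). The function is illustrative only; the confidential mapping from performance descriptors to band scores is not modelled.

```python
import math

def overall_band(listening: float, reading: float,
                 writing: float, speaking: float) -> float:
    """Combine four module band scores into an overall band.

    Sketch of the publicly stated IELTS convention: the mean is
    rounded to the nearest half band, with exact midpoints (.25
    and .75) rounding upward. The descriptor-to-band mapping
    itself is confidential and not modelled here.
    """
    mean = (listening + reading + writing + speaking) / 4
    # floor(x * 2 + 0.5) / 2 rounds to the nearest 0.5,
    # sending the midpoints .25 and .75 upward.
    return math.floor(mean * 2 + 0.5) / 2

# Example: 6.5, 6.5, 5.0, 7.0 -> mean 6.25 -> overall band 6.5
print(overall_band(6.5, 6.5, 5.0, 7.0))
```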

Reliability Issues

Hamp-Lyons (1990) identifies the writer, the task, the raters, and the scoring procedure as the sources of error that reduce reliability in a writing assessment. IELTS has put forth some research efforts to minimize such errors and to demonstrate that acceptable reliability rates are achieved.


IELTS also claims that the use of analytic scales contributes to higher reliability, as impressionistic rating and norm referencing are discouraged and greater discrimination across bands is achieved. However, Mickan (2003) addressed the problem of inconsistency in ratings in IELTS exams and found that it was very difficult to identify specific lexicogrammatical features that distinguish different levels of performance. He also discovered that, despite the use of analytic scales, raters tended to respond to texts as a whole rather than to individual components. Falvey and Shaw (2006), on the other hand, found that raters tend to work through the assessment scale step by step, beginning with task achievement and then moving to the next criterion. Given these conflicting findings about rater behaviour, more detailed information about the scale, and about how raters derive scores from the analytic categories, should be documented to substantiate the claims IELTS makes about its analytic scales.

IELTS pre-tests the tasks to ensure they conform to the test requirements in terms of content and level of difficulty. O'Loughlin and Wigglesworth (2003) investigated task difficulty in Task 1 of IELTS academic writing and found differences among tasks in terms of the language used: simpler tasks with less information elicited higher performance and more complex language from test takers of all proficiency groups. Mickan et al. (2000), on the other hand, examined the readability of test prompts in terms of discourse and pragmatic features, together with test takers' behaviour during the writing test, and found that the purpose and lexico-grammatical structures of the prompts influenced task comprehension and writing performance.
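Prompt readability can also be monitored with simple surface metrics. As a minimal sketch (this is not the discourse-analytic approach Mickan et al. took, and the prompt text below is hypothetical), the following computes a standard Flesch Reading Ease score for a Task 1-style prompt, using a naive vowel-group syllable counter.

```python
import re

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences)
    - 84.6*(syllables/words). Higher scores mean easier text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))

    def syllables(word: str) -> int:
        # Naive heuristic: count runs of consecutive vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    return (206.835 - 1.015 * (n_words / sentences)
            - 84.6 * (total_syllables / n_words))

# Hypothetical Task 1-style prompt, for illustration only.
prompt = ("The chart below shows the number of men and women in "
          "further education in Britain in three periods. Summarise "
          "the information by selecting the main features.")
print(round(flesch_reading_ease(prompt), 1))
```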

IELTS also states that topics or contexts of language use that might introduce a bias against any group of candidates of a particular background are avoided. However, many scholars point out that controlling the topic variable is not easy, as it is highly challenging to determine a common knowledge base that is accessible to all students from culturally diverse backgrounds, who may have varied reading experiences of the topic or content area (Kroll and Reid, 1994). Given the importance of the topic variable for writing performance and the difficulty of controlling it in such an international context, continuous research on topic comparability and appropriateness should be carried out by IELTS.

The research conducted by IELTS has been helpful for understanding some of the variables that might affect the reliability, and accordingly the validity, of the scores. As this research indicates, different factors interfere with the consistency of the writing test to varying degrees. Therefore, more research is necessary, especially in the areas of raters, scales, tasks, test-taker behaviour, and topic comparability, to diagnose and minimize sources of error in testing writing. Shaw (2007) suggests the use of electronic script management (ESM) data in further research to understand the various facets, and the interactions among facets, that may have a systematic influence on scores.
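The facet modelling Shaw (2007) refers to is normally done with many-facet Rasch software; purely as a toy illustration of the underlying idea, the sketch below (with entirely hypothetical scripts, raters, and scores) decomposes a score matrix into a grand mean plus per-rater severity effects, the kind of pattern a full facets analysis would estimate and flag.

```python
# Toy facets-style decomposition: score = grand mean + script effect
# + rater severity + residual. Real analyses use many-facet Rasch
# measurement; this additive-means version is only an illustration.
scores = {  # hypothetical band scores: scripts x raters
    "script1": {"raterA": 6.0, "raterB": 5.5},
    "script2": {"raterA": 7.0, "raterB": 6.5},
    "script3": {"raterA": 5.5, "raterB": 5.0},
}

all_scores = [s for row in scores.values() for s in row.values()]
grand_mean = sum(all_scores) / len(all_scores)

raters = {r for row in scores.values() for r in row}
severity = {}
for rater in sorted(raters):
    given = [row[rater] for row in scores.values()]
    severity[rater] = sum(given) / len(given) - grand_mean
    print(f"{rater}: severity {severity[rater]:+.2f} bands vs. the mean")

# Here raterB scores 0.5 bands below raterA on every script, a
# systematic pattern that would call for rater retraining or
# statistical score adjustment.
```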

Validity Issues

Regarding content validity, Moore and Morton (1999) compared the IELTS writing tasks with target language use (TLU) tasks collected from UK and Australian universities. They found that IELTS Task 1 was representative of the TLU content, while IELTS Task 2, which requires candidates to agree or disagree with a proposition, did not match exactly any of the academic genres in the TLU domain: the university writing corpus drew on external sources, whereas IELTS Task 2 relies on prior knowledge as the source of information. IELTS Task 2 was more similar to non-academic public forms of discourse, such as a letter to the editor; however, it could also be considered close to the genre "essay", which was the most common of the university tasks (60%). In terms of rhetorical functions, the most common function in the university corpus was "evaluation", parallel to IELTS Task 2. In conclusion, it was suggested that an integrated reading-writing task should be included in the test to increase authenticity. Nevertheless, the claims of IELTS are based on the investigation of TLU tasks from only a limited context, UK and Australian universities; thus, the representativeness and relevance of the construct, and the meaningfulness of interpretations in other domains, are seriously questionable.

In terms of the constructs and criteria for writing ability, the general language construct in IELTS is defined both in terms of language ability, drawing on various applied linguistics and language testing models, and in terms of how these constructs are operationalized within a task-based approach. Task 1 scripts in both the general training and academic modules are assessed on task fulfilment, coherence and cohesion, lexical resource, and grammatical range and accuracy. Task 2 scripts are assessed on task response (making arguments), coherence and cohesion, lexical resource, and grammatical range and accuracy. However, according to Shaw (2004), the use of the same criteria for both the general training and academic writing modules is problematic, and this practice has not been adequately supported by scientific evidence. In addition, in the revised criteria in use since 2005, the previous broad category "communicative quality" has been replaced by "coherence and cohesion", causing rigidity and too much emphasis on paragraphing (Falvey and Shaw, 2006). It therefore seems that traditional rules of form, rather than meaning and intelligibility, have recently gained weight in the construct definitions of IELTS.

IELTS also claims to be an international English test. At present, this claim is grounded in the following practices (Taylor, 2002):

1. reflecting social and regional language variation in test input, in terms of content and linguistic features, for example by including various accents;

2. incorporating into the test development process an international team (from the UK, Australia, and New Zealand) that is familiar with the features of different varieties;

3. including non-native-speaker (NNS) as well as native-speaker (NS) raters as examiners of the oral and written tests.


In addition, Taylor (2002) suggests that, besides micro-level linguistic variation, macro-level discourse variation may occur across cultures. Therefore, besides addressing the linguistic varieties of English around the world (World Englishes), the IELTS writing test should also consider the variation in rhetorical conventions and genres around the world (World Rhetorics) when defining the writing construct, especially in relation to the criteria on coherence, cohesion, and logical argument. The published literature presents evidence that genre is not universal but culture-specific, and that people in different parts of the world differ in their argument styles and logical reasoning, use of indirectness devices, organizational patterns, the degree of responsibility given to readers, and rhetorical norms and perceptions of good writing. Because the ability to write an argumentative essay in particular, the task type used in the IELTS writing test, has been found to reflect distinct national rhetorical styles across cultures, the IELTS corpus database should be used to find common features of argumentative writing shared by international test takers, in order to describe an international argumentative writing construct (Taylor, 2004). This is especially important as UCLES plans to develop a common scale for L2 writing ability in the near future.

It is also important for IELTS to consider these cultural differences in rater training and scoring. Purves and Hawisher (1990), based on their study of an expert rater group, suggest that culture-specific text models also exist in readers' heads; these models form the basis for judgements of the acceptability and appropriateness of written texts, and they affect the rating of student writing. For example, differences between NS and NNS raters have been found in their evaluations of topics, cultural rhetorical patterns, and sentence-level errors (Kobayashi and Rinnert, 1996). Therefore, it is also crucial to investigate both NS and NNS raters' rating behaviour in relation to the test-taker profile.

In terms of consequences, the impact of IELTS on the content and nature of classroom activity in IELTS classes and materials, and on the attitudes of test users and test takers, has been investigated. However, these efforts are not enough. IELTS should also consider the impact of its writing test, in terms of the chosen standards and criteria, on international communities in a broader context. Considering the claim of IELTS to be an international test, judging the written texts of students from various cultural backgrounds against a single writing standard based on Western writing norms may not be fair. Taylor (2002) states that those responsible for language assessment should consider how language variation affects the validity, reliability, and impact of tests, and should provide a clear rationale for why they include or exclude more than one linguistic variety and for where they get their norms from.

As Kachru (1997) points out, an international test should accommodate diverse rhetorical conventions rather than promoting a single Western norm of writing. Therefore, considering the high washback power of IELTS, the communicative aspects of writing, rather than strict rhetorical conventions, should be emphasized in the IELTS writing test.

Conclusion

To sum up, IELTS is committed to, and has been carrying out, continuous research to test its reliability and validity and to improve the test further. However, some issues, such as the fairness of applying a single prescriptive criterion to international test takers who come from various rhetorical and argumentative traditions, and the necessity of defining the writing construct in line with the claim of IELTS to be an international test of English, have not been adequately included in these research efforts. In addition, some research on the reliability of test scores points to serious issues that need further consideration. Therefore, the future research agenda for IELTS should include the following issues:

In terms of reliability, the comparability and appropriateness of prompts and tasks for all test takers should be continuously investigated; multiple raters should be included in the rating process, and inter- and intra-rater reliability measures should be calculated regularly; and more research should be conducted on the scales, on how scores are rounded into a final score, and on rater behaviour while using the scales. IELTS has rich data sources, such as ESM, in hand; however, this source has so far not been fully tapped to understand the interactions among the above-mentioned factors in relation to test-taker and rater profiles.
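As a minimal sketch of the kind of routine monitoring proposed here, the following computes exact agreement, adjacent agreement (within half a band), and the Pearson correlation for two raters' scores on the same scripts. The data are hypothetical, and an operational programme would typically add more robust indices such as weighted kappa or many-facet Rasch statistics.

```python
import math

def rater_agreement(r1: list[float], r2: list[float]) -> dict[str, float]:
    """Simple inter-rater consistency indices for paired band scores."""
    n = len(r1)
    exact = sum(a == b for a, b in zip(r1, r2)) / n
    adjacent = sum(abs(a - b) <= 0.5 for a, b in zip(r1, r2)) / n
    # Pearson correlation computed from first principles.
    m1, m2 = sum(r1) / n, sum(r2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(r1, r2))
    var1 = sum((a - m1) ** 2 for a in r1)
    var2 = sum((b - m2) ** 2 for b in r2)
    pearson = cov / math.sqrt(var1 * var2)
    return {"exact": exact, "adjacent": adjacent, "pearson": pearson}

# Hypothetical double-marked scripts: first and second rater's bands.
first = [6.0, 5.5, 7.0, 6.5, 5.0, 8.0]
second = [6.0, 6.0, 6.5, 6.5, 5.5, 7.5]
print(rater_agreement(first, second))
```

The same function applied to one rater's scores on the same scripts at two points in time would give an intra-rater consistency estimate.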

In terms of improving the validation efforts with regard to the IELTS writing module, future research should explore whether the characteristics of the IELTS test tasks and the TLU tasks match, not only in the UK and Australian domains but also in other domains. Cultural differences in writing should be considered both in construct definitions and in rater training. Research on determining the constructs of international English ability and international English writing ability should also be conducted using the already existing IELTS corpus, and the consequences of the assessment practices and criteria for power relationships in the world context should be taken into consideration. However, given that there is no perfect test that is valid for all purposes and uses, test users also have the responsibility to carry out their own research to make sure that the test is appropriate for their own institutional or contextual needs.

References

Blackhurst, A. 2004. 'IELTS test performance data 2003.' Research Notes, 18, 18-20.

Falvey, P. and S. D. Shaw. 2006. 'IELTS writing: Revising assessment criteria and scales (phase 5).' Research Notes, 23, 7-12.

Hamp-Lyons, L. 1990. 'Second language writing assessment issues.' In B. Kroll (ed.), Second Language Writing: Research Insights for the Classroom. NY: Cambridge University Press.


Kobayashi, H. & C. Rinnert. 1996. ‘Factors affecting composition evaluation in an EFL context: Cultural rhetorical pattern and readers’ background.’ Language Learning, 46/3: 397-437.

Kroll, B. & J. Reid. 1994. ‘Guidelines for designing writing prompts: Clarifications, caveats, and cautions.’ Journal of Second Language Writing, 3/3: 231-255.

Mickan, P., S. Slater, and C. Gibson. 2000. 'A study of response validity of the IELTS Writing Module.' IELTS Research Reports, Vol. 3, Paper 2. Canberra: IDP: IELTS Australia.

Mickan, P. 2003. 'What is your score? An investigation into language descriptors for rating written performance.' IELTS Research Reports, Vol. 5, Paper 3. Canberra: IDP: IELTS Australia.

Moore, T. & J. Morton. 1999. ‘Authenticity in the IELTS Academic Module. Writing Test: A comparative study of Task 2 items and university assignments.’

IELTS Research Reports, vol: 2, paper 4. Canberra: IDP: IELTS Australia.

O’Loughlin, K. & G.Wigglesworth. 2003. ‘Task design in IELTS academic writing Task 1: The effect of quantity and manner of presentation of information on candidate writing.’ IELTS Research Reports, vol: 4, paper 3. Canberra: IDP: IELTS Australia.

Purves, A. and G. Hawisher. 1990. 'Writers, judges, and text models.' In R. Beach and S. Hynds (eds.), Developing Discourse Practices in Adolescence and Adulthood. Advances in Discourse Processes, Vol. 39 (pp. 183-199). NJ: Ablex Publishing.

Shaw, S. D. 2004. 'IELTS writing: Revising assessment criteria and scales (phase 3).' Research Notes, 16, 3-7.

Shaw, S. D. 2007. 'Modelling facets of the assessment of writing within an ESM environment.' Research Notes, 27, 14-19.

Taylor, L. 2002. 'Assessing learners' English: But whose/which English(es)?' Research Notes, 10, 18-20.
