2 Using corpora in the language classroom
2.3 Conclusion
Data collection and materials development
learners used a corpus of class readings to learn specialised vocabulary associated with a unit on anthropology. With this specialised corpus the teacher was also able to reinforce the non-content-specific academic words that were in the readings. The specific content terms (e.g. anthropology, matrilineal) were defined in the readings, whilst the more ‘invisible’ academic words were assumed to be known by the readers, which is often not the case in ESL settings.
Another example of a specialised class corpus is a corpus of student papers. The teacher could then use this class corpus to guide students in comparing features found in their writing with features found in MICUSP. The comparisons between the student writing corpus and MICUSP can be made even more meaningful by filtering the MICUSP searches by student level (e.g. junior level or graduate level) and/or discipline (e.g. biology, philosophy, political science) to match the students in the class.
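A comparison of this kind usually relies on normalised rates rather than raw counts, because the two corpora will differ in size. The following is a minimal sketch, assuming the texts are available as plain strings; the function names and the two tiny sample "corpora" are invented for illustration, and the reference sample merely stands in for results that would in practice be read off a tool such as the MICUSP interface.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def rate_per_thousand(texts, target):
    """Return how often `target` occurs per 1,000 words across the texts."""
    tokens = [token for text in texts for token in tokenize(text)]
    if not tokens:
        return 0.0
    return 1000 * tokens.count(target) / len(tokens)

# Hypothetical samples: one from class papers, one from a reference corpus.
class_papers = ["However, the results were mixed. However, more data are needed."]
reference_sample = ["The results were mixed; however, more data are needed overall."]

class_rate = rate_per_thousand(class_papers, "however")
reference_rate = rate_per_thousand(reference_sample, "however")
print(f"class: {class_rate:.1f} per 1,000 words")
print(f"reference: {reference_rate:.1f} per 1,000 words")
```

With real data, students could tabulate several such rates (e.g. for linking adverbials or reporting verbs) and discuss where their writing departs from the reference corpus.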
In the case of an advanced writing course for ESP students, or an interdisciplinary writing course, students could create their own mini corpora. These corpora could then be used to explore the patterns found in the writing of their discipline. For example, in an interdisciplinary writing class a biology major who is required to write lab reports could assemble a corpus of lab reports, whilst a business major in the same class could assemble a corpus of business case studies to explore the type of writing tasks that are expected in that discipline. By creating specialised corpora, students will be able to independently explore the language that is used in their specific fields and also become familiar with the different types of writing tasks that are common in their area of study.
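A first step in exploring such a mini corpus is often a word frequency list. The sketch below shows one way this might be done, assuming the collected texts have been read into a list of strings; the function names and the two sample sentences are hypothetical, and a dedicated concordancing program would offer far more options.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def frequency_list(texts):
    """Count how often each word occurs across all texts in the mini corpus."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

# Hypothetical two-text corpus standing in for a student's collected lab reports.
lab_reports = [
    "The samples were heated and the results were recorded.",
    "The results were compared with the control samples.",
]

freq = frequency_list(lab_reports)
for word, count in freq.most_common(5):
    print(word, count)
```

Even on a small corpus, a list like this tends to bring out features of the genre, such as the frequent passive forms in the sample above.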
As corpora and corpus tools become more available, and as teachers become better trained and more comfortable with using corpus resources (O’Keeffe, McCarthy and Carter 2007; Reppen 2010), the ways in which corpora will be used for language learning will continue to expand. One aspect that will not change is the need to match learner goals and teaching resources and to use appropriate resources to accomplish teaching and learning goals. Corpora are one more tool toward that goal.
References
Atkinson, D. 1999. Scientific Discourse in Sociohistorical Context: The Philosophical Transactions of the Royal Society of London. Hillsdale, NJ: Lawrence Erlbaum.
Balasubramanian, C. 2009. Register Variation in Indian English. Amsterdam: John Benjamins.
Biber, D., S. Conrad and R. Reppen. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Biber, D., S. Conrad, R. Reppen, P. Byrd and M. Helt. 2002. ‘Speaking and writing in the university: a multi-dimensional comparison’. TESOL Quarterly, 36: 19–48.
Biber, D., S. Johansson, G. Leech, S. Conrad and E. Finegan. 1999. The Longman Grammar of Spoken and Written English. London: Longman.
Carter, R. and M. McCarthy. 2006. The Cambridge Grammar of English. Cambridge: Cambridge University Press.
Connor, U. and T. Upton. 2004. ‘The genre of grant proposals: a corpus linguistic analysis’. In U. Connor and T. Upton (eds.), Applied Corpus Linguistics: A Multidimensional Perspective. Amsterdam: Rodopi.
Conrad, S. and D. Biber. 2009. Real Grammar. Harlow: Pearson Longman.
Coxhead, A. 2000. ‘A new Academic Word List’. TESOL Quarterly, 34(2): 213–38.
Donley, K. M. and R. Reppen. 2001. ‘Using corpus tools to highlight academic vocabulary in SCLT’. TESOL Journal, 12: 7–12.
Ellis, N. 2005. ‘At the interface: dynamic interactions of explicit and implicit language knowledge’. Studies in Second Language Acquisition, 27: 305–52.
Fitzmaurice, S. 2003. ‘The grammar of stance in early eighteenth-century English epistolary language’. In P. Leistyna and C. Meyer (eds.), Corpus Analysis: Language Structure and Language Use. Amsterdam: Rodopi.
Friginal, E. 2006. ‘Developing technical writing skills in forestry using corpus-informed instruction and tools’. Paper presented at the American Association of Applied Corpus Linguistics Conference, Flagstaff, Arizona.
Friginal, E. 2009. The Language of Outsourced Call Centers: A Corpus-Based Study of Cross-Cultural Interaction. Amsterdam: John Benjamins.
Johns, T. 1994. ‘From printout to handout: grammar and vocabulary teaching in the context of data-driven learning’. In T. Odlin (ed.), Perspectives on Pedagogical Grammar. Cambridge: Cambridge University Press.
McCarthy, M., J. McCarten and H. Sandiford. 2004/2006. Touchstone 1–4. Cambridge: Cambridge University Press.
O’Keeffe, A., M. McCarthy and R. Carter. 2007. From Corpus to Classroom. Cambridge: Cambridge University Press.
Reppen, R. 2010. Using Corpora in the Language Classroom. Cambridge: Cambridge University Press.
Schmied, J. 2006. ‘East African Englishes’. In B. Kachru, Y. Kachru and C. Nelson (eds.), The Handbook of World Englishes. Basingstoke: Blackwell.
Schmitt, D. and N. Schmitt. 2005. Focus on Vocabulary. Harlow: Longman.
Tribble, C. and G. Jones. 1997. Concordances in the Classroom. Houston: Athelstan.
VanPatten, B. and J. Williams (eds.). 2007. Theories in Second Language Acquisition: An Introduction. Mahwah, NJ: Lawrence Erlbaum.
West, M. 1953. A General Service List of English Words. London: Longman.
Appendix: Examples of useful corpus sites and tools
AntConc
www.antlab.sci.waseda.ac.jp/software.html
This freeware program can create word frequency lists and KWIC (key word in context) concordances. This easy-to-use program also identifies n-grams of 2–6 words.
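For readers unfamiliar with these two outputs, the following minimal sketch illustrates what a KWIC display and an n-gram count involve. It is not how any particular program is implemented; the function names, the context width and the sample text are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def kwic(tokens, keyword, width=3):
    """Return one line per hit, with up to `width` tokens of context on each side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30}  [{token}]  {right}")
    return lines

def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical sample text.
sample = ("On the other hand, the data suggest a rise. "
          "On the other hand, the sample is small.")
tokens = tokenize(sample)

for line in kwic(tokens, "hand"):
    print(line)

for gram, count in ngrams(tokens, 4).most_common(2):
    print(" ".join(gram), count)
```

Aligning the keyword in a central column, as the KWIC lines do here, is what lets learners scan the words to its left and right for recurring patterns.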
AWL Highlighter
www.nottingham.ac.uk/~alzsh3/acvocab/awlhighlighter.htm
This site allows the user to input texts and highlights the words from the Academic Word List (AWL). It also has links to a gap-making program for fill-in-the-blank exercises, and to other useful sites.
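The basic idea behind word-list highlighting can be sketched in a few lines. This is only an illustration under stated assumptions: `SAMPLE_WORDS` is a small placeholder set rather than the full AWL, the asterisk marking is an arbitrary choice, and the sketch matches exact word forms only, whereas the AWL is organised into word families.

```python
import re

# Placeholder word set for illustration only; not the full Academic Word List.
SAMPLE_WORDS = {"data", "research", "method", "theory", "significant"}

def highlight(text, word_list):
    """Wrap every word found in `word_list` in asterisks, ignoring case."""
    def mark(match):
        word = match.group(0)
        return f"*{word}*" if word.lower() in word_list else word
    return re.sub(r"[A-Za-z]+", mark, text)

marked = highlight("The research method produced significant data.", SAMPLE_WORDS)
print(marked)
```

A teacher could adapt the same approach to blank out the marked words instead, producing a simple gap-fill exercise.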
British National Corpus (BNC) www.natcorp.ox.ac.uk
A 100-million word multi-register corpus of spoken and written British English, searchable by word or phrase. In addition to information about the BNC, the site has links to many resources. Note: accessing the BNC through Mark Davies’s site, http://view.byu.edu/BNC, allows a few more search options.
Business Letter Concordancer (BLC)
http://someya-net.com/concordancer/index.html
This site links users to a concordancer that accesses several corpora including a corpus of business letters, personal letters and letters of historic figures (e.g. Thomas Jefferson, Robert Louis Stevenson).
Collins Cobuild Corpus Concordance Sampler www.collins.co.uk/Corpus/CorpusSearch.aspx
This site allows the user to search a 56-million word corpus. Forty concordance lines are provided for each search.
Collocate
www.athelstan.com
This reasonably priced program identifies collocates, generates n-grams (aka word clusters) and provides some statistics (e.g. mutual information, t scores).
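As a rough guide to what these two statistics measure, the sketch below uses the commonly cited textbook formulas, in which the expected frequency of a word pair is estimated from the frequencies of the two words and the corpus size. Individual programs may calculate expected frequency differently (for instance by taking the search span into account), so the results need not match any given tool; the function names and the counts in the example are invented.

```python
import math

def expected_frequency(count_x, count_y, corpus_size):
    """Co-occurrences expected by chance if the two words were independent."""
    return count_x * count_y / corpus_size

def mutual_information(pair_count, count_x, count_y, corpus_size):
    """MI score: log2 of observed over expected co-occurrence frequency."""
    expected = expected_frequency(count_x, count_y, corpus_size)
    return math.log2(pair_count / expected)

def t_score(pair_count, count_x, count_y, corpus_size):
    """t score: observed minus expected, divided by the square root of observed."""
    expected = expected_frequency(count_x, count_y, corpus_size)
    return (pair_count - expected) / math.sqrt(pair_count)

# Hypothetical counts: each word occurs 100 times in a 10,000-word corpus,
# and the two words occur together 8 times.
mi = mutual_information(8, 100, 100, 10_000)
t = t_score(8, 100, 100, 10_000)
print(f"MI = {mi:.2f}, t = {t:.2f}")
```

In broad terms, mutual information tends to favour rare but exclusive pairings, whilst the t score tends to favour frequent ones, which is why the two are often consulted together.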
Compleat Lexical Tutor www.lextutor.ca
In addition to access to various corpora and tools, this site allows you to input texts for vocabulary analysis based on the Academic Word List and the General Service List (West 1953). The site also has many useful articles on corpora and language teaching, and tests for assessing vocabulary.
Corpus.BYU.edu http://corpus.byu.edu
This site links to the many corpora (e.g. COCA and TIME) that are searchable through an interface developed by Mark Davies. The format for searches is the same regardless of the corpus. The interface is user-friendly and also allows for part-of-speech and wildcard searches. This site has one of the best interfaces with the BNC for word and phrase searches that include graphs and tables of search results by register.
Corpus of Contemporary American English (COCA) www.americancorpus.org
An online, searchable 400+ million word corpus of American English arranged by register, including news, spoken and academic texts. The texts in this corpus are from 1990 to the present. This site also allows the user to search by part of speech (POS).
Corpus of Spoken Professional American English (CSPAE) www.athel.com/cspa.html
A two-million word corpus of professional spoken language (meetings, academic discussions, and White House press conferences). A 42,722 word sample is available for free.
ICAME – International Computer Archive of Modern and Medieval English http://icame.uib.no
A site with links to information and corpus resources.
ICE – International Corpus of English
www.ucl.ac.uk/english-usage/ice/index.htm
This site has information about the availability of several spoken and written one-million word corpora of various world Englishes. The corpora of the various world Englishes follow the same format and provide a rich resource for cross comparisons.
KfNgram
www.kwicfinder.com/kfNgram/kfNgramHelp.html
A site that has online concordance and collocation resources. This site allows users to input and search corpora.
Michigan Corpus Linguistics www.elicorpora.info
This site links users to many valuable corpora and corpus resources. In addition to the two corpora mentioned below, there is also a corpus of Generation 1.5 writing and a corpus of conference presentations. Teachers can find activities for using the suite of corpora from this site along with pre-made worksheets. Language researchers and students will also find useful materials on this well-designed site.
MICASE – Michigan Corpus of Academic Spoken English
This free, online, searchable corpus of academic spoken language is a valuable resource. The online concordancer is user-friendly and has a number of search options. In addition to the transcripts, the sound files are also available. The corpus is available for purchase for a modest fee (use from the website is free). There are links to lesson material that has been prepared based on MICASE. There is also a free shareware program for transcription that can be downloaded.
MICUSP – Michigan Corpus of Upper Level Student Papers
This free, online, searchable corpus of student papers from a variety of disciplines provides teachers and students with many useful resources. The searches can be designed to target specific disciplines, types of writing, and/or parts of papers (e.g. conclusions, citations). The bar graph that displays results provides an easy-to-interpret visual. The first launched beta version will be upgraded to include more search features.
MonoConc
www.athelstan.com/mono.html
This affordable and easy-to-use concordancing package provides concordances, frequency lists and collocate information.
Paul Nation’s webpage
www.victoria.ac.nz/lals/staff/paul-nation/nation.aspx
This page has many links to information about vocabulary. It also has a download of a free program, Range, to compare target texts with two word lists (the General Service List and the Academic Word List).
Scottish Corpus of Texts and Speech www.scottishcorpus.ac.uk/corpus/search/
A Scottish corpus of spoken and written texts and a search tool.
Time Corpus
http://corpus.byu.edu/time/
This online corpus of Time Magazine from 1923 up to 2006 is searchable through Mark Davies’s user-friendly interface. The Time corpus allows interesting explorations of how language changes over a relatively short period of time. It is also a useful resource for looking at written academic language that is accessible for language learners. This site also allows the user to search by part of speech (POS).
University of Lancaster
Centre for Computer Corpus Research on Language http://ucrel.lancs.ac.uk
This site is a rich resource of information about corpora and corpus linguistics.
VOICE – Vienna–Oxford International Corpus of English www.univie.ac.at/voice/
VOICE is a one-million word corpus of English as a lingua franca (ELF). The corpus is available online and includes over 1,250 speakers of mostly European languages interacting in English in a variety of settings. Free registration allows users to search the corpus in a variety of ways and to see complete transcripts.
WebCONC
www.niederlandistik.fu-berlin.de/cgi-bin/web-conc.cgi?art=google&sprache=en
This online software produces KWICs in many languages.
Web Concordancer
www.edict.com.hk/concordance/
A free concordancing program and links to several corpora including Brown and Lancaster Oslo Bergen (LOB). This site also allows users to input and search a corpus.
WordSmith
www.lexically.net/wordsmith/
A concordancing program that, in addition to creating concordance lines, provides other information (e.g. frequency, key words, mutual information scores and word length). A powerful tool for searching a corpus.