2 Using corpora in the language classroom
2.2 Ways to use corpora in the classroom
This chapter will discuss three ways for teachers to provide learners with hands-on corpus activities. First, teachers can bring in material from corpus searches and have students work with the teacher-prepared material. Secondly, teachers can use some of the online corpora that are available. This section will focus on four available corpora that are very user-friendly (COCA, Time, MICASE and MICUSP). Thirdly, teachers can bring in existing corpora or create specialised corpora for their class (e.g. a corpus from readings or from student papers) and have students interact with the corpora. These three ways are described in the sections that follow. Each has certain advantages and, of course, the ideas can be used in combination. For example, a teacher might bring in some prepared concordance lines (i.e. samples of the use of a particular language feature) to introduce new vocabulary, and then later have students search an online corpus to see more examples of the words in context, in order to provide students with greater exposure to the different senses of the target word. This type of exposure to language can help learners get a better idea of the patterns of use and the words that co-occur with the new vocabulary.
2.2.1 Using teacher-prepared corpus material
In many places classes may not have easy access to computers, or may not be able to access the computer lab during class time. Teachers can still use corpus activities without having computers available for students (see Chapter 3 by Jane Willis). Instead of the students interacting with the corpus, the teacher will explore the corpus and bring the results into the classroom in the form of teacher-prepared material. For example, teachers can bring in word frequency lists or concordance lines that feature target vocabulary. There are several advantages of teacher-prepared corpus material for learners. One major advantage is that teachers can control the material. Since the teachers search the corpus for the students, and then bring in those results, teachers can make sure that the vocabulary load is not too great, and that the students are exposed to the target form in a way that is meaningful and relevant for the students. This is a definite advantage in beginning courses where vocabulary load is an issue.
In a lower level class the teacher might decide to delete the fifth and last lines of the verb examples in the concordance in Figure 2.1, since these contain difficult discipline-specific vocabulary (e.g. foot pounds, erg, j, N). Removing these lines does not impact the authenticity of the material; rather, it helps provide lower level students with meaningful and non-distracting input. Teacher-prepared concordance lines allow teachers to check that the content is appropriate for their learners. Prepared concordance material is also an ideal way to introduce students to reading concordance lines, something that can be distracting or confusing at first since sentences are often not complete (see Figure 2.1).
... that Mister Rogers is the best show on TV. But just because someone ...
... playmate: “Mister Rogers” is the best show on TV; and if you don’t ...
... the week. present a current events show on a daily or weekly basis ...
... look like a real T.V. show . Our, um, our guest speaker ...
... they need to be able to show that this data set is in ...
... the plaintiff. The plaintiff must now show that the reason offered by the ...
... particular emotion. In this section I show that even if this enterprise were ...
... We added them up. And we show that, in the limit, there’s no ...
... foot pounds. It is easy to show that 1 j equals 10⁷ erg ...
... remainder of this section we will show that this is true, but only ...
... that the savings ratio is (formula). show that, for reasonable values of N ...
Figure 2.1 An example of concordance output for the target word show from a corpus of textbooks and class lectures (T2K-SWAL corpus, Biber et al., 2002).
Data collection and materials development
Teachers can also use concordance output or KWICs (key word in context) like those shown in Figure 2.1 to begin a discussion of how a word can belong to different parts of speech (e.g. both a noun and a verb) and to help students see patterns that are associated with the different forms. For example, in Figure 2.1 the teacher has grouped the target form show by part of speech (i.e. noun vs. verb) to help students see patterns. The KWICs in Figure 2.1 highlight that, as a verb, show is often followed by that, thus exposing students to a strong pattern that is found in academic writing. Alternatively, the teacher could ask students to discover clues that signal when show is being used as a noun (e.g. use of an article). When teachers first introduce students to KWICs that are brought into the classroom and guide them through ways to discover patterns of language use, students will be less overwhelmed when they later interact with corpora on their own. In addition, they will be practising valuable analytical skills and will become familiar with some of the processes for discovering patterns of language use that can help them to become more autonomous language learners.
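For teachers who would like to prepare KWIC lines themselves rather than copy them from a concordancer, the idea can be sketched in a few lines of Python. This is only an illustrative sketch: the function names kwic and next_words, the width parameter and the short sample text (invented here, loosely echoing Figure 2.1) are assumptions made for this example and do not come from any of the corpus tools discussed in this chapter.

```python
import re
from collections import Counter


def kwic(text, target, width=30):
    """Return key-word-in-context lines for each whole-word match of `target`."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = rf"\b{re.escape(target)}\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Take up to `width` characters on each side; words may be cut off,
        # just as they are in real concordance output.
        left = text[max(0, match.start() - width):match.start()].strip()
        right = text[match.end():match.end() + width].strip()
        lines.append(f"... {left:>{width}} {match.group()} {right:<{width}} ...")
    return lines


def next_words(text, target):
    """Count the words that immediately follow `target` (e.g. show + that)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(b for a, b in zip(tokens, tokens[1:]) if a == target.lower())


# Invented sample text for illustration only.
sample = (
    "Mister Rogers is the best show on TV. "
    "They need to be able to show that this data set is complete. "
    "In this section I show that the claim holds."
)

for line in kwic(sample, "show", width=25):
    print(line)

# The most frequent right-hand neighbour hints at the verb pattern "show that".
print(next_words(sample, "show").most_common(3))
```

Aligning the target word in a central column is what makes patterns to its left and right easy to scan. The teacher would still need to sort the lines by part of speech by hand, since this sketch does no grammatical tagging.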
2.2.2 Using web corpora
For teachers whose students are fortunate enough to have access to computers, online corpora are a useful resource. The availability of corpora with web interfaces is something that has changed drastically over the past five years. Although interacting with a corpus on the Web can limit some research options, it provides a wealth of options for language teachers and learners. This section will present examples from four online corpora that have user-friendly interfaces and can be used to address a range of different language-learning situations. The Appendix also lists other corpus resources that might be of interest to teachers and learners.
The first two corpora in this section are the Time corpus and COCA (Corpus of Contemporary American English). Both were developed by Mark Davies, share the same user interface and can be useful for teaching. Amongst other uses, COCA can be a useful tool for raising student awareness of differences of language use in speech and writing, whilst the Time corpus can provide accessible examples of writing. In addition to providing KWICs, the interfaces of these two corpora allow users to search the corpus specifying a part of speech.
Search results can be displayed in either a list or chart (e.g. bar graph).
The bar graph display in the COCA can be a useful tool to raise learners’ awareness of differences between forms that are frequently used in speech but not in writing. Figure 2.2 shows the results of a search on the word get. The bar graph provides students with a powerful visual of how frequent get is in speech, and how infrequently it is used in academic writing. Students could then be asked to think of other words to use instead of get when writing papers for class.
Figure 2.2 Screen shot from COCA for get
Since the Time corpus is a written collection from a news magazine, it is a rich resource for looking at writing that is not discipline-specific and that has vocabulary accessible to advanced learners. In addition to vocabulary, one of the ways that the Time corpus can be used to help advanced writers is to look at the use of various transitions. Teachers could either give individual students lists of transition words to explore in the corpus, or students could work in teams to explore certain transitions and then report their findings to the class.
The next two corpora were created by the English Language Institute (ELI) of the University of Michigan: MICASE (Michigan Corpus of Academic Spoken English) and MICUSP (Michigan Corpus of Upper Level Student Papers). MICASE is a 1.8 million word corpus of spoken academic language from a variety of university contexts (e.g. lectures, study groups, advising sessions) across a range of disciplines. In addition to the transcripts, sound files are also available. The sound files are linked to the transcripts, thus allowing teachers and students a variety of options to enhance academic listening skills and to create focused listening activities. Users can filter searches by many criteria including: discipline, speaker gender, academic level, interactivity of the interaction, and native language of the participants. The home page for the MICASE corpus also offers teachers and students many valuable ideas and resources for interacting with the corpus.
The recently launched MICUSP site is a corpus of about 2.6 million words from 829 university student papers that have received a grade of A or A−. As with MICASE, users can search the corpus by discipline and student level. In addition, users can also target particular types of writing (e.g. argumentative, creative, research report) and particular features of writing (graphs, abstracts, methods sections). A clear bar graph displays the results of searches. In addition to the generous context that is provided in the KWICs, users can view complete papers. This new corpus will be a tremendous resource for intermediate and advanced writing courses. In addition, MICUSP can be used in both native and non-native disciplinary writing courses, providing ESP (English for Specific Purposes) classes with an amazing teaching tool. For example, both native and non-native English-speaking biology students can see how successful writers refer to charts in the text of research papers. Students tackling academic papers for the first time can see many examples of how citations are used and also see a variety of ways to use citations. Students can see many examples of well-written abstracts as used in student papers, rather than only from published research articles, thus providing a more realistic target for their writing. Figure 2.3 shows the search results for the word claim as it occurs with the textual feature ‘Reference to sources’ across disciplines and levels, whilst Figure 2.4 shows the results for the word find with the same search values. Not only can the user immediately see that find is used across more disciplines, but it is also evident that claim is strongly preferred by students writing papers for philosophy courses.
The goal of such an activity is not to prescribe that using claim when writing philosophy papers will result in a better grade, but to examine the KWICs and see how the words claim and find are used in different disciplines and to become aware of some of the subtleties of these two words. Additionally, when looking through the texts to see how successful student papers reference sources, students will hopefully add to their variety of resources for referring to sources.
2.2.3 Creating corpora for classroom use
Whilst using existing online corpora is much easier than creating corpora in the classroom, the available online corpora may not meet specific needs of certain language classes. Additionally, using corpora and corpus search tools in the classroom can provide teachers and learners with information that is not available from the online corpora. For example, in beginning reading classes, knowing the number of unknown words in a text is extremely useful. In this case teachers can use the word frequency lists from corpus search tools such as MonoConc, AntConc or WordSmith to quickly assess the amount of new vocabulary students will encounter in a reading. Teachers can have students scan the frequency list of words in a text and note where they begin to encounter unknown words. The teacher can then make an informed decision as to the difficulty of the text. If too many words are unknown, the teacher immediately knows that the text is too challenging, and can select another reading. Or, if only some words are unknown, depending on how many and which words are unknown, the teacher could use a variety of activities to help students discover the meanings of the unfamiliar words. For example, students could work in teams and use KWICs from the text to discover the word meanings. Discovery approaches, or focused noticing activities, not only help learners to become autonomous but also help them to learn target forms (vocabulary or grammar) more effectively (Ellis 2005; VanPatten and Williams 2007).
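The frequency-list check described above can also be approximated with a short script. The sketch below is an assumption-laden illustration, not a description of how MonoConc, AntConc or WordSmith work internally: the function names, the simple tokenising rule, the toy reading and the known_words set (standing in for the words a class has already studied) are all invented for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequency_list(text):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()


def unknown_share(text, known_words):
    """Return the proportion of running words not in `known_words`,
    together with the sorted list of those unknown word types."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, []
    unknown = [t for t in tokens if t not in known_words]
    return len(unknown) / len(tokens), sorted(set(unknown))


# Toy reading and a hypothetical list of words the class already knows.
reading = (
    "Anthropology is the study of people. "
    "Some people live in matrilineal societies."
)
known_words = {"is", "the", "study", "of", "people", "some", "live", "in"}

for word, count in frequency_list(reading):
    print(f"{count:>3}  {word}")

share, new_words = unknown_share(reading, known_words)
print(f"Unknown running words: {share:.0%}")
print("Words to pre-teach or explore with KWICs:", ", ".join(new_words))
```

What proportion of unknown words makes a text too challenging remains the teacher's judgement; the script only supplies the counts on which that decision can be based.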
In a reading class or a content-based class, creating an electronic version or mini corpus of the readings can offer a variety of activities for interacting with the texts. Donley and Reppen (2001) describe how a content-based course for intermediate and advanced English language learners used a corpus of class readings to learn specialised vocabulary associated with a unit on anthropology. With this specialised corpus the teacher was also able to reinforce the non-content-specific academic words that were in the readings. The specific content terms (e.g. anthropology, matrilineal) were defined in the readings, whilst the more ‘invisible’ academic words were assumed to be known by the readers, which is often not the case in ESL settings.
Figure 2.3 Screen shot from MICUSP showing the results of a search for claim used in reference to sources.
Figure 2.4 Screen shot from MICUSP showing the results of a search for find used in reference to sources.
Another example of a specialised class corpus is a corpus of student papers. The teacher could then use this class corpus to guide students in comparing features found in their writing with features found in the MICUSP corpus. Filtering the MICUSP searches by student level (e.g. junior level or graduate level) and/or discipline (biology, philosophy, political science) to match the students in the class makes the comparisons between the student writing corpus and MICUSP even more meaningful.
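Because a class corpus and a reference corpus will rarely be the same size, raw counts are hard to compare, and a normalised rate (occurrences per 1,000 words) is a common way round this. The following is a minimal sketch under stated assumptions: the function per_thousand and both toy texts are invented for illustration, and the reference text is not MICUSP data.

```python
import re


def per_thousand(text, target):
    """Return occurrences of `target` per 1,000 running words of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000 * tokens.count(target.lower()) / len(tokens)


# Invented toy texts standing in for a class corpus and a reference corpus.
class_papers = "We claim the data claim nothing. We find it odd."
reference_papers = "Researchers find that results vary. Others find no effect here."

for word in ("claim", "find"):
    ours = per_thousand(class_papers, word)
    theirs = per_thousand(reference_papers, word)
    print(f"{word:<6} class: {ours:6.1f}   reference: {theirs:6.1f}  (per 1,000 words)")
```

With real corpora the texts would be read from files, and the differences in rate would serve as a starting point for looking at the KWICs rather than as a conclusion in themselves.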
In the case of an advanced writing course for ESP students, or an interdisciplinary writing course, students could create their own mini corpora. These corpora could then be used to explore the patterns found in the writing of their discipline. For example, in an interdisciplinary writing class a biology major who is required to write lab reports could assemble a corpus of lab reports, whilst a business major in the same class could assemble a corpus of business case studies to explore the type of writing tasks that are expected in that discipline. By creating specialised corpora, students will be able to independently explore the language that is used in their specific fields and also become familiar with the different types of writing tasks that are common in their area of study.