3 Concordances in the classroom without a computer: assembling and exploiting
3.9 Conclusion
Data collection and materials development
principled approach to corpus design is more likely to cover the language that students need than an approach which selects texts and language focus points in a more random fashion. See Willis and Willis (2007: 187–98) for more on the syllabus design process.
It is impossible for most language teachers and course designers to assemble their own research corpus for a particular group of learners, unless the learners’ target discourse is a very narrow, well-defined area which is readily researchable. But there is a growing range of more specialist language corpora with frequency lists already assembled (see Chapter 2 Appendix for sources) and over the next few years more will be made available for public use. It is, however, possible to aim at assembling the learners’ own pedagogic corpus, that is, one that reflects as far as possible their target language needs, even without the insights gained from a computational analysis of a research corpus.
The most frequent words, meanings and patterns are obviously going to be the most useful for learners and give the most efficient coverage of the target discourse. But in addition to the criterion of frequency, we need to take into account factors such as learnability and learners’ immediate interests. Thus the syllabus might well include words that are similar in the two languages, and words from topic areas and types of text (e.g. sport, pop songs, magazine pieces) that students find motivating. Such texts would then become part of the pedagogic corpus, and would undoubtedly also serve to illustrate more common uses of common words.
To further increase their vocabulary and to extend their experience of language, individual learners should always be encouraged to read (and listen) more widely on their own, and to look out for more examples of specific features from outside data, but since this will be part of the individual learner’s corpus, and unfamiliar to other learners, it would not form part of the pedagogic corpus available for concordance analysis.
narrative – all useful for students wishing to write or to speak with more fluency and naturalness.
A full-length lexical syllabus derived from a suitable research corpus might comprise an inventory of, say, 2,000–3,000 words and their meanings and patterns. This could be used as a checklist, and would allow the teacher or materials writer to achieve far more reliable coverage of the language that learners need. But this is the ideal, and without computational facilities, it would take a long time to find and assemble suitable examples of all these words from the materials selected for the pedagogic corpus.
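For readers who do have a computer to hand, the kind of search that 'computational facilities' would perform can be sketched in a few lines of Python. This is only a minimal illustration, not the software used for any research corpus: the function name `concordance`, the `width` parameter and the two-text corpus are all invented for the example. It finds every occurrence of a chosen word in the texts of a pedagogic corpus and lines the occurrences up with their immediate context, as in a hand-assembled concordance.

```python
import re

def concordance(texts, keyword, width=30):
    """Return key-word-in-context lines for every occurrence of keyword."""
    lines = []
    # Match the keyword as a whole word, in upper or lower case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for text in texts:
        # Collapse line breaks so that the context reads as running text.
        flat = " ".join(text.split())
        for match in pattern.finditer(flat):
            left = flat[max(0, match.start() - width):match.start()].rstrip()
            right = flat[match.end():match.end() + width].lstrip()
            # Right-align the left context so the keyword forms a central column.
            lines.append(f"{left:>{width}} {match.group()} {right}")
    return lines

# Hypothetical two-text pedagogic corpus, invented for illustration.
pedagogic_corpus = [
    "She was at home by six. He looked at the clock and sighed.",
    "By the end of the week they had finished the work.",
]

by_lines = concordance(pedagogic_corpus, "by")
for line in by_lines:
    print(line)
```

The same call with a different keyword (for example `concordance(pedagogic_corpus, "at")`) gives the lines for another common word. The context here is cut off at a fixed number of characters, much as a teacher might cut each line at a fixed width when writing concordance lines on the board.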
The benefits of focusing on a mere 50 or so very common words may at first sight have seemed somewhat limited in scope. However, because these words occur so frequently in all kinds of text and have so many different uses, they provide the cement for a huge number of fixed and semi-fixed expressions and grammatical patterns. Using these common words as ‘bait’, learners are likely to catch a wide variety of other useful words, phrases and patterns, and will inevitably gain insights into new aspects of the target language as exemplified by their pedagogic corpus.
The analysis activities encourage learners to process text more closely, to systematise their knowledge and to look out for similar examples in their own reading outside class. Once attention has been drawn to the meanings, uses and functions of common words in the target language, learners are more likely to notice and reflect on further occurrences of the language items that have been made salient through study of the concordances. This process should lead to the development of the learner’s interlanguage. Analysis activities and awareness-raising procedures can also encourage learner independence and efficient dictionary use (especially with regard to the common words that students often think they know already and do not bother to look up). They help learners to recognise the parts played by collocation and lexical phrases and to realise there is more to language than just vocabulary and grammar.
Working directly from the data, searching for patterns, investigating and describing what is actually there, is a secure and relatively unthreatening activity. It is ideal for mixed-level classes since, being a learner-centred activity, it allows students to work at their own level, in their own time and in their own ways. It also provides solid benefits for teachers. I have constantly found that language analysis activities inform and enrich my own view of the language. Not only learners but also teachers are likely to gain from an investigative approach to language.
References
Batstone, R. 1994. ‘Product and process: grammar in the second language classroom’. In M. Bygate, A. Tonkyn and E. Williams (eds.), Grammar and the Language Teacher. Hemel Hempstead: Prentice Hall International.
Ellis, N. 2003. ‘Constructions, chunking, and connectionism: the emergence of second language structure’. In C. Doughty and M. Long (eds.), The Handbook of Second Language Acquisition. Oxford: Blackwell.
Ellis, R. 1991. Second Language Acquisition and Second Language Pedagogy. Avon: Multilingual Matters.
Ellis, R. 2003. Task-based Language Teaching and Learning. Oxford: Oxford University Press.
Johns, T. 1991. ‘Should you be persuaded – two samples of data-driven learning materials’. In T. Johns and P. King (eds.), Classroom Concordancing, ELR Journal 4. CELS: University of Birmingham.
Johns, T. 2002. ‘Data-driven learning: the perpetual challenge’. In B. Kettemann and G. Marko (eds.), Teaching and Learning by Doing Corpus Analysis. Amsterdam and New York: Rodopi.
Mauranen, A. 2004. ‘Spoken – general: spoken corpus for an ordinary learner’. In J. Sinclair (ed.), How to Use Corpora in Language Teaching. Amsterdam: John Benjamins.
O’Keeffe, A., M. McCarthy and R. Carter. 2007. From Corpus to Classroom. Cambridge: Cambridge University Press.
Römer, U. 2006. ‘Pedagogical applications of corpora: some reflections on the current scope and a wish list for future developments’. Zeitschrift für Anglistik und Amerikanistik, 54(2): 121–34, available at www.uteroemer.com/ZAA 2006 Ute Roemer.pdf
Schmidt, R. 1990. ‘The role of consciousness in second language learning’. Applied Linguistics, 11(2): 129–58.
Sinclair, J. (ed.). 2004. How to Use Corpora in Language Teaching. Amsterdam: John Benjamins.
Skehan, P. 1994. ‘Interlanguage development and task-based learning’. In M. Bygate, A. Tonkyn and E. Williams (eds.), Grammar and the Language Teacher. Hemel Hempstead: Prentice Hall International.
Willis, D. 1990. The Lexical Syllabus. Collins Cobuild. Out of print but available free on www.cels.bham.ac.uk/resources/LexSyll.shtml
Willis, D. 2003. Rules, Patterns and Words: Grammar and Lexis in English Language Teaching. Cambridge: Cambridge University Press.
Willis, D. and J. Willis. 1996. ‘Consciousness-raising activities in the language classroom’. In J. Willis and D. Willis (eds.), Challenge and Change in Language Teaching. Oxford: Heinemann ELT. Now available on the authors’ website: www.willis-elt.co.uk/books.html
Willis, D. and J. Willis. 2007. Doing Task-based Teaching. Oxford: Oxford University Press.
Appendix A: Wordlists from a general research corpus
1 the 11,110,235
2 of 5,116,374
3 to 4,871,692
4 and 4,574,340
5 a 4,264,651
6 in 3,609,229
7 that 1,942,449
8 is 1,826,742
9 for 1,716,788
10 it 1,641,524
11 was 1,395,706
12 on 1,354,064
13 with 1,262,756
14 he 1,260,066
15 I 1,233,584
16 as 1,096,506
17 be 1,030,953
18 at 1,022,321
19 by 980,610
20 but 884,610
21 are 880,318
22 have 879,595
23 from 872,792
24 his 849,494
25 you 819,187
26 they 779,636
27 this 771,211
28 not 704,615
29 has 693,238
30 had 648,205
31 an 629,155
32 we 552,869
33 will 542,649
34 said 534,522
35 their 527,987
36 or 527,919
37 one 522,291
38 which 513,286
39 there 501,951
40 been 496,696
41 were 485,024
42 who 480,651
43 all 478,695
44 she 469,709
45 her 448,175
46 would 430,566
47 up 428,457
48 more 422,111
49 when 404,674
50 if 401,086
51 out 398,444
52 about 393,279
53 so 378,358
54 can 369,280
55 what 359,467
56 no 342,846
57 its 333,261
58 new 324,639
59 two 308,310
60 mr 302,507
61 than 297,385
62 time 293,404
63 some 293,394
64 into 290,931
65 people 289,131
66 now 287,096
67 after 280,710
68 them 279,678
69 year 272,250
70 over 266,404
71 first 265,772
72 only 260,177
73 him 259,962
74 like 258,874
75 do 256,863
76 could 255,010
77 other 254,620
78 my 253,585
79 last 238,932
80 also 236,350
81 just 232,389
82 your 227,200
83 years 217,074
84 then 214,274
85 most 208,894
86 me 206,475
87 may 198,700
88 because 196,595
89 says 193,730
90 very 189,285
91 well 188,445
92 our 186,013
93 government 184,618
94 back 184,105
95 us 182,796
96 any 180,222
97 even 178,657
98 many 173,938
99 three 173,093
100 way 172,787
101 world 170,293
102 get 168,694
103 these 168,486
104 how 167,461
105 down 166,119
106 being 165,168
107 before 165,119
108 much 164,217
109 where 161,691
110 made 161,595
111 should 159,023
112 off 155,770
113 make 153,978
114 good 153,878
115 still 151,889
116 ’re 151,359
117 such 150,812
118 day 150,684
119 know 147,052
120 through 145,920
121 say 143,888
122 president 143,502
123 don’t 142,288
124 those 142,260
125 see 141,845
126 think 140,701
127 old 140,096
128 go 137,929
129 between 137,009
130 against 136,989
131 did 135,593
132 work 131,780
133 take 131,212
134 man 130,580
135 pounds 130,095
136 too 129,804
137 long 127,660
138 own 125,299
139 life 124,047
140 going 124,018
141 today 123,869
142 right 121,995
143 home 121,052
144 week 119,115
145 here 118,177
146 another 116,325
147 while 115,963
148 under 113,114
149 London 112,310
150 million 112,138
Table 3.1 The 150 most frequent word forms occurring in The COBUILD Bank of English written corpus of 196 million words
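A list of this kind is produced by counting every word form in the corpus and ranking the forms by frequency. The following Python sketch shows the principle on a single invented sentence. It is an illustration under stated assumptions, not the procedure used for the Bank of English: the function name and the sample text are hypothetical, everything is lower-cased (the table above keeps 'I' and 'London' capitalised), and contracted forms such as 'don't' are kept whole, whereas the table also lists forms such as ’re separately.

```python
import re
from collections import Counter

def word_form_frequencies(text):
    """Count each word form separately, ignoring case."""
    # Apostrophes are kept inside tokens so that "don't" stays one form.
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    return Counter(tokens)

# Invented sample text; a real list would be built from a whole corpus.
sample = "He has a dog. They have two dogs. She had a dog, but they don't have one now."
freq = word_form_frequencies(sample)

# Print a ranked list in the same rank / form / count layout as the table.
for rank, (form, count) in enumerate(freq.most_common(5), start=1):
    print(rank, form, count)
```

Because no attempt is made to group related forms, 'has', 'have' and 'had' are counted as three items, and so are 'dog' and 'dogs'.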
Table 3.2 The 150 most frequent word forms occurring in The COBUILD Bank of English spoken corpus of 15 million words
1 the 500,843
2 I 463,445
3 and 367,221
4 you 359,144
5 it 313,032
6 to 308,438
7 that 284,422
8 a 273,009
9 of 242,811
10 in 187,523
11 er 178,464
12 yeah 155,259
13 they 135,084
14 was 133,022
15 erm 132,836
16 we 124,928
17 mm 122,674
18 is 113,420
19 know 111,741
20 but 100,648
21 so 91,836
22 what 89,364
23 there 88,938
24 on 88,456
25 yes 87,211
26 have 84,294
27 he 79,137
28 for 77,842
29 do 77,207
30 well 75,287
31 think 74,543
32 right 74,191
33 be 66,492
34 this 65,424
35 like 63,948
36 ’ve 63,160
37 at 62,654
38 with 61,289
39 no 60,885
40 as 58,871
41 mean 58,825
42 all 58,360
43 ’re 57,131
44 or 56,857
45 if 56,774
46 about 56,321
47 not 56,109
48 just 55,329
49 one 55,189
50 can 53,090
51 are 51,775
52 got 51,727
53 don’t 51,273
54 oh 51,013
55 then 44,372
56 were 41,453
57 had 41,185
58 very 41,128
59 she 38,841
60 get 38,361
61 my 38,194
62 people 37,774
63 when 37,335
64 because 37,172
65 would 35,945
66 up 35,894
67 them 34,766
68 go 34,127
69 now 33,801
70 from 33,633
71 really 33,444
72 your 33,310
73 me 33,278
74 going 32,598
75 out 32,015
76 sort 31,555
77 been 30,405
78 which 30,334
79 see 30,325
80 did 30,175
81 say 29,720
82 two 28,817
83 an 27,485
84 who 27,220
85 how 26,837
86 some 26,172
87 name 26,029
88 time 25,990
89 ’ll 25,154
90 more 24,586
91 said 23,143
92 ’cos 22,345
93 things 21,982
94 actually 21,131
95 good 20,783
96 other 20,378
97 want 20,375
98 by 20,260
99 could 19,435
100 any 18,958
101 okay 18,757
102 much 18,567
103 didn’t 18,521
104 thing 18,480
105 lot 18,453
106 where 18,440
107 something 18,134
108 way 17,895
109 here 17,819
110 quite 17,470
111 come 17,089
112 their 16,892
113 down 16,678
114 back 16,505
115 has 16,017
116 place 15,888
117 bit 15,520
118 used 15,267
119 only 15,159
120 into 15,094
121 these 15,064
122 three 15,059
123 work 15,005
124 will 14,939
125 her 14,286
126 him 14,160
127 his 14,029
128 doing 13,921
129 first 13,273
130 than 12,998
131 went 12,842
132 put 12,692
133 why 12,653
134 our 12,610
135 years 12,437
136 off 12,393
137 those 12,248
138 us 12,245
139 course 12,211
140 mhm 12,112
141 isn’t 12,060
142 over 11,874
143 look 11,297
144 done 11,247
145 year 11,224
146 take 11,190
147 being 11,153
148 should 11,007
149 school 11,001
150 thought 10,786
Appendix B
This table shows what proportion of general English text is covered by the most frequent word forms. By word forms I mean that have, has and had, for example, each count as a separate item, as do the singular and plural forms of a noun.
Table 3.3
Most frequent word forms   Written text   Spoken text
25                         29%            29%
50                         36%            36%
100                        42%            46%
500                        56%            66%
(Source: Cobuild Bank of English: figures based on a written corpus of 196 million words and a corpus of unscripted speech of 15 million words.)
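The coverage figures can be checked against the frequency counts in Appendix A: the share of text covered by the top N word forms is simply their combined count divided by the size of the corpus. The Python sketch below does this for the ten most frequent written forms. The function name `coverage` is invented for the example, and the corpus size is assumed to be exactly 196 million words, although the source gives this only as a rounded figure, so the result is approximate.

```python
def coverage(counts, corpus_size, n):
    """Share of running words accounted for by the n most frequent forms."""
    top = sorted(counts, reverse=True)[:n]
    return sum(top) / corpus_size

# Counts for the ten most frequent written word forms, copied from Table 3.1.
written_top10 = [
    11_110_235, 5_116_374, 4_871_692, 4_574_340, 4_264_651,
    3_609_229, 1_942_449, 1_826_742, 1_716_788, 1_641_524,
]

# Assumption: the corpus is treated as exactly 196 million words,
# which the source states only as a rounded total.
WRITTEN_CORPUS_SIZE = 196_000_000

share = coverage(written_top10, WRITTEN_CORPUS_SIZE, 10)
print(f"Top 10 written word forms: {share:.1%} of the corpus")
```

On these figures the ten commonest written forms alone account for roughly a fifth of all running words. Extending the list to the first 25 counts from Table 3.1 gives a result close to the 29% shown in Table 3.3.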