Taeho Kim
Corpus Linguistics 1 / 14
Chunks & colligation
How are words grammaticalized into patterns that have particular grammatical features...?
• ‘chunks’ and ‘colligation’
Words do not exist in isolation but cluster together in particular ways to make larger meaningful units of language...
• such clustering is not random but highly formulaic in nature...
• each cluster has its own ‘in-built’ grammar...
• such clusters are stored in a speaker’s mind and retrieved as ‘wholes’, just as single words are.
Chunks & colligation
Idiom principle (vs. open-choice principle)
• formulaic nature of language production → the majority of spoken and written texts are constructed, and can be interpreted, using the idiom principle...
• speakers and writers construct much of their language by combining chunks of language...
• much language production comes ‘pre-packaged’ as words that cluster together in particular sequences, with their own particular grammatical properties.
‘Chunks, lexical bundles, formulaic sequences’
• “A formulaic sequence is a sequence, continuous or discontinuous .... by the language grammar.”
• N-grams/clusters
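The n-gram/cluster idea can be sketched as a short frequency count: every contiguous run of n words is treated as a candidate chunk and counted. This is a minimal sketch, assuming a naive tokenizer (lower-cased words, straight apostrophes, punctuation dropped) and an invented toy sentence; the names `tokenize` and `ngram_counts` are hypothetical, not those of any particular corpus tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens, keeping straight apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens (an n-gram, or 'cluster')."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Invented toy text, used only to illustrate the counting.
sample = "I don't know what you mean. I don't know. You know what I mean."
tokens = tokenize(sample)

# Show one of the most frequent clusters for each cluster size 2-4.
for n in (2, 3, 4):
    gram, freq = ngram_counts(tokens, n).most_common(1)[0]
    print(n, " ".join(gram), freq)
```

Corpus tools usually add a minimum-frequency cut-off and run over millions of words, but the counting step is the same idea.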
Chunks & colligation
Colligation:
• The grammatical company a word keeps and the positions it prefers.
• Information about the ‘in-built’ grammar of a cluster
Colligation can help to tell us more about a particular pattern...
• the word classes it co-occurs with...
• the tense and aspect it is most commonly used with...
• how it may function in clauses and utterances (e.g., as a subject or object? in medial or final position?...)
A clearer picture of a chunk → patterns with distinctive grammatical properties → lexico-grammar!
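One aspect of colligation, the word classes a chunk co-occurs with, can be sketched by counting the part-of-speech tag of the word that immediately follows each occurrence of the chunk. This sketch assumes a tiny hand-tagged sample with hypothetical tags loosely borrowed from the Penn Treebank tagset (real studies would run a POS tagger over a full corpus); `following_tags` is an illustrative name only.

```python
from collections import Counter

# Hypothetical hand-tagged sample: (word, tag) pairs, invented for illustration.
tagged = [
    ("i", "PRP"), ("don't", "VBP"), ("know", "VB"), ("what", "WP"), ("to", "TO"), ("do", "VB"),
    ("i", "PRP"), ("don't", "VBP"), ("know", "VB"), ("if", "IN"), ("it", "PRP"), ("works", "VBZ"),
    ("i", "PRP"), ("don't", "VBP"), ("know", "VB"), ("what", "WP"), ("he", "PRP"), ("wants", "VBZ"),
]


def following_tags(tagged, node):
    """Count the tag of the token immediately after each occurrence of the node chunk."""
    n = len(node)
    words = [word for word, _ in tagged]
    tags = Counter()
    # Stop early enough that a following token always exists.
    for i in range(len(words) - n):
        if tuple(words[i:i + n]) == node:
            tags[tagged[i + n][1]] += 1
    return tags


print(following_tags(tagged, ("i", "don't", "know")).most_common())
```

In this toy sample the chunk is followed mostly by a wh-word (WP), the kind of in-built grammatical preference colligation describes.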
Chunks in corpora
Some open access corpora provide us with frequency data about chunks of different sizes...
• HKCSE (http://langbank.engl.polyu.edu.hk/HKCSE)
• It gives us a window into the way English is used in Hong Kong by native and non-native speakers...
The chunk ‘don’t know’ is highly frequent in native speaker usage ...
• it can respond to a proposition but can also act as a hedged answer.
→ “a speaker may be deliberately vague to sound less assertive, reduce an imposition on others or protect the face of others.”
Face:
• positive face
• negative face
• face-threatening acts
Colligation patterns in corpora
The chunk ‘don’t know’ in spoken language has its own ‘in-built’ grammar.
‘I don’t know’ vs. ‘I didn’t know’ → tense information
What comes after ‘I don’t know what...’?
...
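The question of what comes after ‘I don’t know what...’ is usually explored with a concordance (keyword in context, KWIC), which lists every occurrence of the chunk with a few tokens of left and right context. This is a minimal sketch, assuming an invented sample and a naive tokenizer that keeps sentence-final punctuation as tokens; `kwic` is a hypothetical helper, not a specific tool's API.

```python
import re


def tokenize(text):
    """Lower-case the text; keep words (with straight apostrophes) and . ? ! as tokens."""
    return re.findall(r"[a-z']+|[.?!]", text.lower())


def kwic(tokens, node, width=3):
    """Return (left context, node, right context) for every occurrence of node."""
    n = len(node)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((left, " ".join(node), right))
    return hits


# Invented toy sample; note that 'I didn't know' is not matched by the node.
sample = "Well, I don't know what he meant. I didn't know that. I don't know what to say."
for left, node, right in kwic(tokenize(sample), ("i", "don't", "know", "what")):
    print(f"{left:>20} | {node} | {right}")
```

Reading down the right-hand column is how the typical continuations of the chunk become visible.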
Chunks and colligation can give us vital information about how a text is patterned.
• Quantitative measures can help us to understand the most frequent patterning of texts and how they make meaning.
An example: a political speech
• the common chunks and colligations can give us an interesting view of how politicians wish to shape their message.
What can chunks and colligation tell us about language use?
How are the most common chunks patterned?
What can this tell us about the intentions behind it?
Certain chunks are far more likely to occur in certain texts and fulfill certain functions than others.
• ‘in winter’ (frequent in travel texts) vs. ‘during the winter times’ (frequent in gardening texts)
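To compare how likely a chunk is in different kinds of text, raw counts are normally converted to a normalized frequency (here, occurrences per 1,000 tokens), because the texts being compared differ in length. This is a minimal sketch, assuming two invented one-line "travel" and "gardening" samples; the numbers it produces illustrate the arithmetic only and are not corpus findings. `per_thousand` is a hypothetical name.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens, keeping straight apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def per_thousand(tokens, chunk):
    """Occurrences of chunk per 1,000 tokens (normalized frequency)."""
    if not tokens:
        return 0.0
    n = len(chunk)
    hits = sum(1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == chunk)
    return hits * 1000 / len(tokens)


# Invented toy texts standing in for two text types.
travel = tokenize("In winter the lake freezes. Visit in winter for skiing.")
gardening = tokenize("Protect roots during the winter times. Prune during the winter times.")

for chunk in [("in", "winter"), ("during", "the", "winter", "times")]:
    print(" ".join(chunk), per_thousand(travel, chunk), round(per_thousand(gardening, chunk), 1))
```

Multiplying before dividing keeps the per-1,000 figure exact when the token count divides evenly.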
Corpus tools do not always extract chunks that are syntactically whole.
A certain degree of subjectivity is often required because it is not entirely clear whether a chunk such as ‘I mean I’ is processed any differently in the mind from ‘I mean’.
It can be difficult to say what the boundaries of any one chunk are.
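The boundary problem shows up directly in raw n-gram output: a frequency count happily returns ‘i mean i’ alongside ‘i mean’, and, once punctuation is dropped, even sequences that span a sentence break. This sketch assumes an invented sample, and its `FRAGMENT_ENDINGS` filter is a purely hypothetical heuristic, included only to show that any cut-off encodes the analyst's own subjective decision.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (punctuation is dropped)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical heuristic: flag trigrams ending in a word that often starts a new unit.
FRAGMENT_ENDINGS = {"i", "the", "a", "and"}

sample = "I mean I think so. I mean it's fine. I mean I don't know."
tokens = tokenize(sample)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

print(bigrams[("i", "mean")], trigrams[("i", "mean", "i")])
flagged = [gram for gram in trigrams if gram[-1] in FRAGMENT_ENDINGS]
print(flagged)
```

Whether ‘i mean i’ is a fragment or a chunk in its own right is exactly the judgement the counts cannot make.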
Thank You