Casa ESL · C2 Mastery · Unit 16 of 20 · Step 2
Corpus linguistics awareness — collocations, concordance, frequency
Name
Date
Vocabulary
collocation
(noun) The habitual co-occurrence of words: combinations that sound natural to native speakers.
"'Make a decision' is a strong collocation; 'do a decision' is not."
concordance
(noun) A list showing every occurrence of a word in a text or corpus, displayed in its immediate context.
"A concordance for 'power' in political speeches reveals its shifting collocates over time."
corpus
(noun) A large, structured collection of texts used for linguistic analysis.
"The British National Corpus contains 100 million words of British English."
lemma
(noun) The base or dictionary form of a word (e.g., "run" is the lemma for runs, running, ran).
"Corpus searches can be conducted at the lemma level to capture all inflected forms."
frequency
(noun) How often a word or phrase occurs in a given corpus or dataset.
"Word frequency analysis reveals that 'the' is the most common word in English."
n-gram
(noun) A contiguous sequence of n items from a given text (bigram = 2 words, trigram = 3).
"The bigram 'climate change' has shown a dramatic increase in frequency since the 1990s."
keyness
(noun) A statistical measure of how much more frequent a word is in a target corpus than in a reference corpus.
"The keyness score revealed that 'unprecedented' was disproportionately frequent in pandemic-era press releases."
semantic prosody
(noun) The tendency of a word to occur in consistently positive or negative contexts, colouring its meaning.
"'Cause' has a negative semantic prosody: it collocates predominantly with undesirable outcomes (cause damage, cause problems, cause concern)."
Grammar Focus
Collocational competence and corpus awareness
At C2 level, naturalness depends heavily on collocational accuracy: using the word combinations that native speakers instinctively prefer. Common collocational patterns include adjective + noun (heavy rain, NOT strong rain), verb + noun (make a decision, NOT do a decision), and adverb + adjective (deeply concerned, generally preferred to very concerned in formal register). Corpus linguistics provides empirical evidence for these patterns: frequency data, concordance lines, and collocational profiles reveal which combinations are natural and which are not. Awareness of semantic prosody (the tendency of words to appear in positive or negative contexts) is also essential.
Natural: "take measures" / Unnatural: "do measures"
Natural: "utterly exhausted" / Acceptable but less idiomatic: "completely exhausted"
Semantic prosody: "commit" collocates overwhelmingly with negative actions (commit a crime, commit an error, commit suicide)
Corpus evidence: "make progress" appears 5x more frequently than "achieve progress" in academic English
Exercises
Exercise 1
Choose the most natural collocation to complete each sentence.
1. The government must ______ measures to address the crisis. (take / do / make)
2. The report ______ serious concerns about data security. (raises / does / makes)
3. She has a ______ knowledge of constitutional law. (thorough / heavy / wide)
4. The evidence strongly ______ that the policy was ineffective. (suggests / tells / speaks)
5. He paid ______ attention to the warning signs. (scant / small / thin)
Exercise 2
Match each word to its strongest collocate from the options given.
Reading
What Corpora Reveal
The advent of large digital corpora (collections of millions or even billions of words of naturally occurring text) has transformed our understanding of how language actually works, as opposed to how grammarians have traditionally claimed it works. Consider the word "cause." A dictionary defines it neutrally: "to make something happen." Yet corpus analysis reveals that "cause" has a strongly negative semantic prosody: in the British National Corpus, its most frequent collocates are "damage," "problems," "concern," "death," and "harm." We do not typically say "cause happiness" or "cause success," not because the grammar forbids it, but because usage has imbued the word with negative associations. This collocational pattern is invisible to introspection; most native speakers would not, if asked, identify "cause" as a negative word. It is only through corpus analysis, by examining thousands of concordance lines, that the pattern becomes apparent.
The implications for language learners are profound. Traditional vocabulary instruction focuses on denotation: what a word means. Corpus-informed instruction adds collocation (which words it keeps company with), frequency (how common it is), and semantic prosody (what evaluative colouring it carries). A C2 learner who knows the meaning of "commit" but not its overwhelmingly negative collocational profile (commit a crime, commit an error, commit fraud) will produce language that is grammatically correct but pragmatically unnatural.
1. What does the passage mean by the "semantic prosody" of the word "cause," and how is this discovered?
2. How does the passage argue that corpus-informed instruction improves upon traditional vocabulary teaching?
Speaking
Discuss these questions with a partner or your teacher.
Writing
Write a paragraph (120-150 words) analysing the collocational profile of a single English word. Discuss its most common collocates, any semantic prosody it exhibits, and what this reveals about its pragmatic use.
Example: The verb "commit" presents a striking case of negative semantic prosody. Its most frequent collocates in major corpora are overwhelmingly negative: commit a crime, commit murder, commit fraud, commit suicide, commit an error. The word carries an implicit evaluation of the action as serious, irreversible, or morally charged — even when the grammar would permit a neutral reading. "Commit to a project" is an exception, though even here the sense of binding, irreversible obligation persists. A learner who produces "commit a kindness" would be grammatically correct but pragmatically jarring, violating a collocational norm invisible to the dictionary. This illustrates the limits of denotation-based vocabulary instruction: knowing what "commit" means is insufficient without knowing the company it keeps.
Answer Key — For Teacher Use
Exercise 1
1. take · 2. raises · 3. thorough · 4. suggests · 5. scant
Exercise 2
1. deeply → concerned / embedded / rooted · 2. utterly → exhausted / devastated / ridiculous · 3. bitterly → disappointed / cold / opposed · 4. highly → unlikely / regarded / skilled · 5. widely → regarded / available / acknowledged
Reading Comprehension
1. Semantic prosody is the tendency of "cause" to collocate with negative outcomes (damage, problems, death). It is discovered through corpus analysis — examining thousands of concordance lines reveals a pattern invisible to introspection. · 2. Traditional instruction focuses only on denotation (meaning), while corpus-informed instruction adds collocation (which words co-occur), frequency (how common a word is), and semantic prosody (evaluative colouring) — all essential for natural, pragmatically appropriate production.