Casa ESL · C2 Mastery · Unit 16 of 20 · Step 2

Digital Humanities

Corpus linguistics awareness — collocations, concordance, frequency

Understand core concepts of corpus linguistics
Use collocational awareness to improve naturalness of expression
Analyse how word frequency and concordance reveal language patterns
Deploy digital humanities vocabulary at mastery level

Name

Date

collocation

noun

The habitual co-occurrence of words — combinations that sound natural to native speakers.

"'Make a decision' is a strong collocation; 'do a decision' is not."

concordance

noun

A list showing every occurrence of a word in a text or corpus, displayed in its immediate context.

"A concordance for 'power' in political speeches reveals its shifting collocates over time."

corpus

noun

A large, structured collection of texts used for linguistic analysis.

"The British National Corpus contains 100 million words of British English."

lemma

noun

The base or dictionary form of a word (e.g., "run" is the lemma for runs, running, ran).

"Corpus searches can be conducted at the lemma level to capture all inflected forms."

frequency

noun

How often a word or phrase occurs in a given corpus or dataset.

"Word frequency analysis reveals that 'the' is the most common word in English."

n-gram

noun

A contiguous sequence of n items from a given text (bigram = 2 words, trigram = 3).

"The bigram 'climate change' has shown a dramatic increase in frequency since the 1990s."

keyness

noun

A statistical measure of how much more frequent a word is in a target corpus compared to a reference corpus.

"The keyness score revealed that 'unprecedented' was disproportionately frequent in pandemic-era press releases."

semantic prosody

noun

The tendency of a word to occur in consistently positive or negative contexts, colouring its meaning.

"'Cause' has a negative semantic prosody: it collocates predominantly with undesirable outcomes (cause damage, cause problems, cause concern)."
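For learners curious about how frequency, n-grams and keyness are actually computed, here is a minimal Python sketch. It rests on several assumptions: the two texts are invented toy examples, the tokenizer is deliberately crude, and log-likelihood is only one of several statistics used for keyness (the definitions above do not specify one). The function names are illustrative, not taken from any particular corpus tool.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness score (assumed statistic; others exist).

    A larger score means a bigger frequency difference between the corpora.
    The score alone does not say which corpus uses the word more.
    """
    total = size_target + size_ref
    expected_target = size_target * (freq_target + freq_ref) / total
    expected_ref = size_ref * (freq_target + freq_ref) / total
    score = 0.0
    # A zero observed count contributes nothing (0 * ln 0 is treated as 0).
    if freq_target:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


# Invented toy texts, for illustration only.
target = "unprecedented times call for unprecedented measures in unprecedented circumstances"
reference = "the committee met at the usual time and took the usual measures"

target_tokens = tokenize(target)
reference_tokens = tokenize(reference)

word_freq = Counter(target_tokens)               # frequency of each word
bigram_freq = Counter(ngrams(target_tokens, 2))  # frequency of each bigram
reference_freq = Counter(reference_tokens)

print(word_freq.most_common(3))
print(bigram_freq[("unprecedented", "measures")])
print(log_likelihood(word_freq["unprecedented"], len(target_tokens),
                     reference_freq["unprecedented"], len(reference_tokens)))
```

With real corpora the same steps apply, but the texts would run to millions of words and the counts would normally be taken at the lemma level.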

Collocational competence and corpus awareness

At C2 level, naturalness depends heavily on collocational accuracy — using word combinations that native speakers instinctively prefer. Common collocational patterns: adjective + noun (heavy rain, NOT strong rain), verb + noun (make a decision, NOT do a decision), adverb + adjective (deeply concerned, NOT very concerned in formal register). Corpus linguistics provides empirical evidence for these patterns: frequency data, concordance lines, and collocational profiles reveal which combinations are natural and which are not. Awareness of semantic prosody — the tendency of words to appear in positive or negative contexts — is also essential.

Natural: "take measures" / Unnatural: "do measures"

Natural: "utterly exhausted" / Also natural, but less emphatic: "completely exhausted"

Semantic prosody: "commit" collocates overwhelmingly with negative actions (commit a crime, commit an error, commit suicide)

Corpus evidence: "make progress" appears 5x more frequently than "achieve progress" in academic English
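The kind of corpus evidence cited above can be approximated in a few lines of Python. The sketch below counts which verbs appear shortly before a given noun. Everything in it is an assumption made for illustration: the sample text is invented, the `LEMMAS` table is a hypothetical hand-written stand-in for a real lemmatizer, and the three-word window is an arbitrary choice.

```python
import re
from collections import Counter

# Hypothetical mini-lemmatizer: maps inflected verb forms to their lemma.
# A real study would use a lemmatized corpus or an NLP library instead.
LEMMAS = {
    "makes": "make", "made": "make", "making": "make",
    "does": "do", "did": "do", "doing": "do", "done": "do",
    "takes": "take", "took": "take", "taking": "take", "taken": "take",
}


def lemmatized_tokens(text):
    """Lowercase, tokenize, and reduce known verb forms to their lemma."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(token, token) for token in tokens]


def verb_counts(tokens, noun, window=3):
    """Count verb lemmas from LEMMAS found in the `window` tokens before `noun`."""
    verbs = set(LEMMAS.values())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != noun:
            continue
        for preceding in tokens[max(0, i - window):i]:
            if preceding in verbs:
                counts[preceding] += 1
    return counts


# Invented sample text, for illustration only.
sample = (
    "She made a decision quickly. They are making a decision today. "
    "He took a decision without consulting anyone. "
    "We make a decision every week."
)

counts = verb_counts(lemmatized_tokens(sample), "decision")
for verb, count in counts.most_common():
    print(f"{verb} + decision: {count}")
```

On a toy text the counts prove nothing; the point is the method. Run over a large corpus, the same comparison is what allows a statement such as "X appears more frequently than Y" to be made with evidence.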

Exercise 1

Choose the most natural collocation to complete each sentence.

1. The government must ______ measures to address the crisis. (take / do / make)

2. The report ______ serious concerns about data security. (raises / does / makes)

3. She has a ______ knowledge of constitutional law. (thorough / heavy / wide)

4. The evidence strongly ______ that the policy was ineffective. (suggests / tells / speaks)

5. He paid ______ attention to the warning signs. (scant / small / thin)

Exercise 2

Match each word to its strongest collocate from the options given.

1. deeply ______ (concerned / embedded / rooted)
2. utterly ______ (exhausted / devastated / ridiculous)
3. bitterly ______ (disappointed / cold / opposed)
4. highly ______ (unlikely / regarded / skilled)
5. widely ______ (regarded / available / acknowledged)

What Corpora Reveal

The advent of large digital corpora — collections of millions or even billions of words of naturally occurring text — has transformed our understanding of how language actually works, as opposed to how grammarians have traditionally claimed it works. Consider the word "cause." A dictionary defines it neutrally: "to make something happen." Yet corpus analysis reveals that "cause" has a strongly negative semantic prosody: in the British National Corpus, its most frequent collocates are "damage," "problems," "concern," "death," and "harm." We do not typically say "cause happiness" or "cause success" — not because the grammar forbids it, but because usage has imbued the word with negative associations. This collocational pattern is invisible to introspection; most native speakers would not, if asked, identify "cause" as a negative word. It is only through corpus analysis — examining thousands of concordance lines — that the pattern becomes apparent. The implications for language learners are profound. Traditional vocabulary instruction focuses on denotation: what a word means. Corpus-informed instruction adds collocation (what words it keeps company with), frequency (how common it is), and semantic prosody (what evaluative colouring it carries). A C2 learner who knows the meaning of "commit" but not its overwhelmingly negative collocational profile (commit a crime, commit an error, commit fraud) will produce language that is grammatically correct but pragmatically unnatural.
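To see what "examining concordance lines" looks like in practice, consider the following minimal Python sketch of a keyword-in-context (KWIC) display and a simple collocate count. It is an illustration under stated assumptions, not the method of any particular corpus tool: the sample sentences are invented, the search matches any word beginning with the node (a crude stand-in for a lemma search), and the stopword list is a small hypothetical one.

```python
import re
from collections import Counter

# Small hypothetical stopword list so that grammatical words do not swamp the counts.
STOPWORDS = frozenset({"a", "an", "the", "of", "to", "and", "in", "for"})


def kwic(text, node, width=30):
    """Return concordance lines: each hit with `width` characters of context either side."""
    lines = []
    pattern = rf"\b{re.escape(node)}\w*"  # node plus any ending, e.g. cause, causes, caused
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


def right_collocates(text, node, span=2):
    """Count non-stopword tokens in the `span` positions after each hit of the node."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token.startswith(node):
            for neighbour in tokens[i + 1:i + 1 + span]:
                if neighbour not in STOPWORDS:
                    counts[neighbour] += 1
    return counts


# Invented sample sentences, for illustration only.
sample = (
    "The storm caused serious damage to the harbour. "
    "Delays cause problems for commuters. "
    "The leak could cause harm to wildlife. "
    "Smoking causes damage to the lungs."
)

for line in kwic(sample, "cause"):
    print(line)

print(right_collocates(sample, "cause").most_common(3))
```

Aligning every hit in a single column is what makes recurring neighbours easy to spot by eye. With thousands of lines from a real corpus, a collocate count of this kind is how a pattern such as the negative prosody of "cause" comes to light.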

1. What does the passage mean by the "semantic prosody" of the word "cause," and how is this discovered?

2. How does the passage argue that corpus-informed instruction improves upon traditional vocabulary teaching?

Discuss these questions with a partner or your teacher.

1. Think of a word in English that you frequently use. What are its strongest collocates? Are there combinations that feel natural to you that a corpus might reveal to be unusual, or combinations you avoid that might actually be more common than you think?
2. Should language teaching be based primarily on corpus evidence (how people actually use language) or on prescriptive grammar rules (how people are told to use language)? What are the advantages and risks of each approach?

Write a paragraph (120-150 words) analysing the collocational profile of a single English word. Discuss its most common collocates, any semantic prosody it exhibits, and what this reveals about its pragmatic use.

Example: The verb "commit" presents a striking case of negative semantic prosody. Its most frequent collocates in major corpora are overwhelmingly negative: commit a crime, commit murder, commit fraud, commit suicide, commit an error. The word carries an implicit evaluation of the action as serious, irreversible, or morally charged — even when the grammar would permit a neutral reading. "Commit to a project" is an exception, though even here the sense of binding, irreversible obligation persists. A learner who produces "commit a kindness" would be grammatically correct but pragmatically jarring, violating a collocational norm invisible to the dictionary. This illustrates the limits of denotation-based vocabulary instruction: knowing what "commit" means is insufficient without knowing the company it keeps.

Answer Key — For Teacher Use

Exercise 1

1. take · 2. raises · 3. thorough · 4. suggests · 5. scant

Exercise 2

All three options are natural collocates of each adverb; accept any choice the learner can justify. 1. deeply → concerned / embedded / rooted · 2. utterly → exhausted / devastated / ridiculous · 3. bitterly → disappointed / cold / opposed · 4. highly → unlikely / regarded / skilled · 5. widely → regarded / available / acknowledged

Reading Comprehension

1. Semantic prosody is the tendency of "cause" to collocate with negative outcomes (damage, problems, death). It is discovered through corpus analysis — examining thousands of concordance lines reveals a pattern invisible to introspection. · 2. Traditional instruction focuses only on denotation (meaning), while corpus-informed instruction adds collocation (which words co-occur), frequency (how common a word is), and semantic prosody (evaluative colouring) — all essential for natural, pragmatically appropriate production.