The 400 million word Corpus of Historical American English (1810–2009)

Mark Davies

doi:10.1075/cilt.325.11dav

Abstract

The 400 million word Corpus of Historical American English (1810–2009) provides researchers with an extremely robust set of data for Late Modern English. The corpus is composed of fiction, magazines, newspapers, and nonfiction books, and its genre balance stays roughly the same from decade to decade. Because of its size and its advanced architecture and interface, it allows researchers to examine an extremely wide range of changes, many of which could not be studied with a small 2–4 million word corpus. These include lexical change (the frequency of any word or phrase by decade, and mass comparison of all words in different periods), morphological shifts (via wildcards and pattern matching), syntactic shifts (made searchable by very accurate lemmatization and part-of-speech tagging), and semantic change (by comparing collocates over time, as well as searches that use data from the integrated thesaurus and customized word lists).
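As a rough illustration of what "frequency of a word by decade" means in practice, the sketch below computes a per-million-token rate for one word in each decade. It is not the corpus's own interface or architecture: the function names (`decade_of`, `per_million_by_decade`), the whitespace tokenizer, and the four toy sentences are all assumptions made up for this example, and the toy sentences are not corpus data.

```python
from collections import Counter


def decade_of(year: int) -> int:
    """Map a year to the first year of its decade, e.g. 1815 -> 1810."""
    return year - year % 10


def per_million_by_decade(docs, target):
    """Return {decade: occurrences of `target` per million tokens}.

    `docs` is an iterable of (year, text) pairs. Tokenization here is a
    naive lowercase whitespace split (an assumption for illustration only).
    """
    hits = Counter()    # occurrences of the target word per decade
    totals = Counter()  # total token count per decade
    for year, text in docs:
        tokens = text.lower().split()
        decade = decade_of(year)
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(target)
    # Normalize so decades of different sizes can be compared.
    return {
        decade: hits[decade] * 1_000_000 / totals[decade]
        for decade in sorted(totals)
        if totals[decade] > 0
    }


# Hypothetical toy input, invented for this example.
toy_docs = [
    (1815, "the stage coach left at dawn"),
    (1818, "a coach and a carriage"),
    (1995, "the coach praised the team"),
    (1999, "email the team today"),
]

result = per_million_by_decade(toy_docs, "coach")
for decade, rate in result.items():
    print(decade, round(rate, 1))
```

Normalizing by the decade's total token count is what makes counts comparable across periods of different size, and a genre balance that stays roughly constant from decade to decade helps ensure that differences in such rates reflect language change and not a shift in the mix of text types.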

