The TV and Movies corpora

Mark Davies

doi:10.1075/ijcl.00035.dav

Abstract

This paper discusses the creation and use of the TV Corpus (subtitles from 75,000 episodes, 325 million words, 6 English-speaking countries, 1950s–2010s) and the Movies Corpus (subtitles from 25,000 movies, 200 million words, 6 English-speaking countries, 1930s–2010s), which are available at English-Corpora.org. The corpora compare well to the BNC-Conversation data in terms of informality, lexis, phraseology, and syntax. But at 525 million words in total, they are more than 30 times as large as BNC-Conversation (BNC1994 and BNC2014 combined), which means they can be used to study a wide range of linguistic phenomena. The TV and Movies corpora also allow useful comparisons of very informal language across time (from the 1930s onward for movies and from the 1950s onward for TV shows) and between dialects of English (such as British and American English).


Journal: International Journal of Corpus Linguistics
Publication Date: Nov 17, 2020
Citations: 22

Similar Papers

Linear grammar as a possible stepping-stone in the evolution of language.
Ray Jackendoff ... Eva Wittenberg
Psychonomic Bulletin & Review | VOL. 24 | 01 Jul 2016

Improving Patient-Centered Communication in Aesthetic Surgery: A Patient Survey.
Alan T. Makhoul ... Lexy Kindt
Plastic and reconstructive surgery | VOL. 150 | 19 Jul 2022

A Quantitative Study of Chinese Learners’ Identities as Reflected in Their Attitudes Toward English Accents
Yan Huang ... Azirah Hashim
GEMA Online® Journal of Language Studies | VOL. 20 | 28 Feb 2020

Dialects of English: Newfoundland and Labrador English (review)
Gordon Alley-Young
The Canadian Journal of Linguistics / La revue canadienne de linguistique | VOL. 57 | 01 Jan 2012
