Abstract

This chapter describes both the process of creating a corpus and the methodological considerations that guide this process. It opens with a detailed discussion of the planning that went into building four different types of corpora: the British National Corpus (BNC), the Corpus of Contemporary American English (COCA), the Corpus of Early English Correspondence (CEEC), and the International Corpus of Learner English (ICLE). The structure of each of these corpora is also discussed: their length, the genres they contain (e.g. prose fiction, press reportage, blogs, spontaneous conversations, scripted speech), and other pertinent information. Subsequent sections discuss other topics relevant to building a corpus. These include defining exactly what a corpus is (can the web be considered a corpus?); determining the appropriate size of a corpus and the length of the individual texts it will contain (complete texts versus shorter samples from each text, e.g. 2,000 words); selecting the particular genres to be included in a corpus (e.g. press reportage, technical writing, spontaneous conversations, scripted speech); and ensuring that the writers or speakers whose writing or speech is included are balanced for such variables as gender, ethnicity, and age.