Abstract

This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012–2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this new corpus, we describe the main stages of the Spoken BNC2014’s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so, we aim to (i) encourage users of the corpus to approach the data with sensitivity to the many methodological issues we identified and attempted to overcome while compiling the Spoken BNC2014, and (ii) inform (future) compilers of spoken corpora of the innovations we implemented to attempt to make the construction of corpora representing spontaneous speech in informal contexts more tractable, both logistically and practically, than in the past.

Highlights

  • The ESRC Centre for Corpus Approaches to Social Science (CASS) at Lancaster University and Cambridge University Press have compiled a new, publicly accessible corpus of present-day spoken British English, gathered in informal contexts, known as the Spoken British National Corpus 2014 (Spoken BNC2014)

  • A new corpus of conversational British English is needed so that researchers can continue the kinds of research that the Spoken BNC1994 has fostered over the past two decades. The new corpus also makes it possible to turn the ageing of the Spoken BNC1994 into an advantage: if the older corpus can be compared with a comparable contemporary one, it could become a useful resource for exploring recent change in spoken English

  • We have presented a general overview of the design and compilation process of the Spoken BNC2014

Summary

Introduction

The ESRC Centre for Corpus Approaches to Social Science (CASS) at Lancaster University and Cambridge University Press have compiled a new, publicly accessible corpus of present-day spoken British English, gathered in informal contexts, known as the Spoken British National Corpus 2014 (Spoken BNC2014). The corpus design necessarily represents a compromise between an ideally representative corpus and the constraints of what is realistically possible.

Similar existing corpora – why do we need a new one?
The Spoken British National Corpus 1994
Other British English corpora containing spoken conversational data
Justification for the Spoken BNC2014
Corpus design and data collection
Opportunistic data collection
Recruitment of participants and audio recording
Metadata categories in the Spoken BNC2014
Higher professional occupations
Transcribing the Spoken BNC2014
Developing the transcription scheme
Speaker identification
Converting the transcripts
Findings
Conclusion
