Abstract

Large databases of transcribed speech, downloadable from the Internet, are a corpus linguist's dream. They turn into a corpus linguist's nightmare, however, when the transcriptions are not linguistically accurate. In this paper I assess the suitability of the Hansard parliamentary transcripts (200 million words, downloadable) as a corpus linguistic resource, comparing a sample of the official transcript to a transcript made from a recording of a House of Commons session. As could be expected from earlier research, the transcripts omit performance characteristics of spoken language, such as incomplete utterances or hesitations, as well as any type of extrafactual, contextual talk (e.g., about turn-taking). Beyond these omissions, however, the transcribers and editors also alter speakers' lexical and grammatical choices towards more conservative and formal variants. Linguists ought, therefore, to be cautious in their use of the Hansard transcripts and, more generally, in their use of transcriptions that have not been made for linguistic purposes.


