The Spoken BNC2014
Abstract: This paper introduces the Spoken British National Corpus 2014, an 11.5-million-word corpus of orthographically transcribed conversations among L1 speakers of British English from across the UK, recorded in the years 2012–2016. After showing that a survey of the recent history of corpora of spoken British English justifies the compilation of this new corpus, we describe the main stages of the Spoken BNC2014’s creation: design, data and metadata collection, transcription, XML encoding, and annotation. In doing so we aim to (i) encourage users of the corpus to approach the data with sensitivity to the many methodological issues we identified and attempted to overcome while compiling the Spoken BNC2014, and (ii) inform (future) compilers of spoken corpora of the innovations we implemented to attempt to make the construction of corpora representing spontaneous speech in informal contexts more tractable, both logistically and practically, than in the past.
- Abstract
56
- 10.1177/0023830918754598
- Feb 5, 2018
- Language and Speech
Children's ability to understand speakers with a wide range of dialects and accents is essential for efficient language development and communication in a global society. Here, the impact of regional dialect and foreign-accent variability on children's speech understanding was evaluated in both quiet and noisy conditions. Five- to seven-year-old children (n = 90) and adults (n = 96) repeated sentences produced by three speakers with different accents (American English, British English, and Japanese-accented English) in quiet or noisy conditions. Adults had no difficulty understanding any speaker in quiet conditions. Their performance declined for the nonnative speaker with a moderate amount of noise; their performance only substantially declined for the British English speaker (i.e., below 93% correct) when their understanding of the American English speaker was also impeded. In contrast, although children showed accurate word recognition for the American and British English speakers in quiet conditions, they had difficulty understanding the nonnative speaker even under ideal listening conditions. With a moderate amount of noise, their perception of British English speech declined substantially and their ability to understand the nonnative speaker was particularly poor. These results suggest that although school-aged children can understand unfamiliar native dialects under ideal listening conditions, their ability to recognize words in these dialects may be highly susceptible to the influence of environmental degradation. Fully adult-like word identification for speakers with unfamiliar accents and dialects may exhibit a protracted developmental trajectory.
- Supplementary Content
6
- 10.17635/lancaster/thesis/624
- Jan 1, 2019
- University of Lancaster
The ESRC-funded Centre for Corpus Approaches to Social Science at Lancaster University (CASS) and the English Language Teaching Group at Cambridge University Press (CUP) have collaborated to compile a new, publicly accessible corpus of contemporary written British English, known as the Written British National Corpus 2014 (Written BNC2014). The Written BNC2014 is an updated version of the Written British National Corpus (Written BNC1994), which was created in the 1990s. The Written BNC1994 is often used as a proxy for present-day British English, so the Written BNC2014 has been created to allow both for comparisons between the two corpora and for research on British English to be carried out using a state-of-the-art contemporary dataset. The Written BNC2014 contains approximately 90 million words of written British English, published between 2010 and 2018, from a wide variety of genres. The corpus will be publicly released in 2019. This thesis presents a detailed account of the design and compilation of the corpus, focusing on the many challenges which needed to be overcome in order to create the corpus, along with the solutions to these challenges which were devised. It also demonstrates the utility of the corpus, by presenting a diachronic comparison of academic writing in the 1990s and 2010s, with a focus on the theory of colloquialisation. This thesis, whilst not a Written BNC2014 user-guide, presents all of the decisions made in the design and creation of the corpus, and as such, will help to make the corpus as useful to as many people, for as many purposes, as possible.
- Research Article
- 10.35950/cbej.v31i132.13642
- Oct 4, 2025
- Journal of the College of Basic Education
This study examines differences between British English and American English in the mandative subjunctive. The preference for should, or the possibility of indicative forms, in British English versus the overwhelming preference for be-subjunctives in American English is confirmed by the present-day data. The mandative subjunctive is used to express suggestions, wishes, and demands, and its usage has become subject to variation across various English dialects. The research uses data from the British National Corpus (BNC) on one hand, and from the Corpus of Contemporary American English (COCA) on the other, searching for contexts with similar triggers, like "requested that," "demanded that," and "urged that." The findings show that there is still an obvious preference for should in British English and for synthetic forms in American English, even though this paper also reveals that there is a significant number of synthetic forms in British English, which contradicts the traditional view that differentiates the two varieties. In contrast, the use of the be-subjunctive is still strongly favored in American English. The results partially support statements regarding the distinctions between British and American English in terms of the mandative subjunctive.
- Research Article
199
- 10.1177/0075424206294369
- Dec 1, 2006
- Journal of English Linguistics
This large-scale corpus study charts differences between British English and American English as regards the use of “canonical” tag questions such as It's raining, isn't it?, It's not raining, is it?, or It's raining, is it? Several thousand instances of question tags were extracted from the British National Corpus and the Longman Spoken American Corpus, yielding nine times as many tag questions in colloquial British English as in colloquial American English (but also important register differences in British English). Polarity types and operators in tags also differ in the two varieties. Preliminary results concerning pragmatic functions point to a higher use of “facilitating” tags involving interlocutors in conversation in American English. Speaker age is important in both varieties, with older speakers using more canonical tag questions than younger speakers.
- Research Article
45
- 10.1177/0075424213511462
- Jan 9, 2014
- Journal of English Linguistics
The diachronic study of Philippine English (PhilE) has recently become possible through the compilation of a PhilE corpus (Phil-Brown) at De La Salle University. The period of time defined by Phil-Brown (whose sampling period was the late 1950s to the early 1960s) and ICE-Phil, the Philippines component of the International Corpus of English (comprising texts sampled in the early 1990s), covers most of the period of time over which there has been general recognition of the existence of PhilE as a World English. Based on a selection of texts from these two corpora, we examined recent changes in the use of a set of modals (may, might, must, ought to, shall, and should) and quasi-modals (be able to, be going to, be supposed to, have to, need to, and want to), investigating their frequency differences, genre variation, and semantic differentiation, and comparing the findings with those for British and American English of the same period. It was found that, in general, PhilE does not closely follow either British English or American English, with distinctive patterns identified both at the macro level of the overall rates of change for the modals and quasi-modals considered as two sets, and at the micro level of frequency changes of the individual items, thereby providing support for locating PhilE in the phase of “endonormative stabilization” of Schneider’s evolutionary scale. Nevertheless, there are certain areas where PhilE appears to have been striving to “catch up” with American English over the thirty-year period, suggesting that it may not yet be ready to completely renounce its exonormative allegiance to its postcolonial “parent.”
- Research Article
2
- 10.1017/s1360674324000546
- Jul 17, 2025
- English Language and Linguistics
English employs a variety of comparative formation strategies. Theoretical and corpus-based research has established that their distribution depends on a variety of factors. In this article, we take an experimental approach to test analytic, synthetic and double comparative forms in relation to register in American and British English. We report on a rating study investigating the appropriateness and interpretation in terms of evaluativity of the three comparative forms. Our findings confirm the hypothesis that the comparative variants are not considered equally appropriate, but the effect is not as strong as would be expected under the hypothesis that frequency of occurrence is directly related to linguistic judgments. The analytic and double comparative alternatives exhibit lower appropriateness levels than the synthetic comparative. Analytic and double comparative forms are rated as less appropriate in formal than in informal contexts, which did not show an effect on the synthetic form. Furthermore, the analytic variant shows a different behavior in terms of the interpretation than the other forms in that a stronger effect of evaluativity is detected. Limitations and future directions are discussed. Our study is the first to provide experimental evidence for certain hypotheses emerging from corpus-based research.
- Book Chapter
15
- 10.1163/9789004334113_012
- Jan 1, 2002
This large-scale corpus study documents the use of zero subject relative constructions in spoken American and British English. For this purpose, it makes extensive use of automated retrieval strategies. It shows that zero subject relatives are still present in spoken American and British English, as represented in the British National Corpus and the Longman Spoken American Corpus. Moreover, there is a sharp difference between American English, where 2.5% of subject relatives have a zero relativizer, and British English, where 13% do. Although zero subject relative constructions are frequently found with existentials and it-clefts, they are by no means limited to these constructions. The social variables of the study (most notably age) come from speaker annotation, which is used to provide the apparent time dimension.
- Book Chapter
21
- 10.1007/978-3-319-45510-5_29
- Jan 1, 2016
This paper describes the compilation of a social media corpus with Facebook posts and WhatsApp chats. Authentic messages were voluntarily donated by Dutch youths between 12 and 23 years old. Social media nowadays constitute a fundamental part of youths’ private lives, constantly connecting them to friends and family via computer-mediated communication (CMC). The social networking site Facebook and mobile phone chat application WhatsApp are currently quite popular in the Netherlands. Several relevant issues concerning corpus compilation are discussed, including website creation, promotion, metadata collection, and intellectual property rights/ethical approval. The application that was created for scraping Facebook posts from users’ timelines, of course with their consent, can serve as an example for future data collection. The Facebook and WhatsApp messages are collected for a sociolinguistic study into Dutch youths’ written CMC, of which a preliminary analysis is presented, but also present a valuable data source for further research.
- Research Article
7
- 10.1017/s004740450530011x
- Apr 1, 2005
- Language in Society
Mats Deutschmann, Apologising in British English. Umeå, Sweden: Umeå University, 2003. Pp. 262. One of the most significant problems in speech act research is doubtless the shortage of naturally occurring spoken language in the data under observation. Researchers have applied a battery of techniques to collect examples of speech acts, but the vast majority of the work has been characterized by elicited language, wherein the starting point for the research has been the function of the speech act itself and the aim has been to investigate ways in which it is realized linguistically. Mats Deutschmann's book marks a clear departure from this tradition. His research into apologizing in British English is based solely on data from the spoken section of the British National Corpus (BNC). As a result, his starting point is also different: the form (linguistic realization) of the speech act rather than its function. Furthermore, in addition to conducting a specific investigation of the speech act “apologizing,” he sets himself the more ambitious target of revealing “general characteristics of the use of politeness formulae in British English” (p. 13).
- Research Article
- 10.30564/fls.v7i5.9432
- May 16, 2025
- Forum for Linguistic Studies
This article investigates the relationship between the terms show up and turn up in the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). In the COCA, the terms exhibit a 25% similarity in ranking, while in the BNC, this similarity is 14.28%. In terms of genre-specific usage, show up and turn up are most distant in the fiction genre in the COCA, while in the BNC, they are furthest apart in the spoken genre. The closest similarity occurs in the TV/movie genre for the COCA and in the magazine genre for the BNC. Frequency analysis reveals significant national variation. In American English, show up shows greater fluctuation from the mean, while turn up displays more consistent frequency. In contrast, British English usage demonstrates a more stable frequency for show up, while turn up exhibits more erratic variation. The standard deviations for show up in the COCA (1,134) and the BNC (18) further highlight this disparity, as turn up frequencies in both corpora show opposite trends. Statistically, the COCA reveals a strong positive correlation (r = 0.7375) between the two terms, suggesting a significant relationship in American English. However, the BNC’s correlation coefficient (r = 0.0669) indicates no meaningful connection between the terms in British English. This comparison underscores notable national variations in the usage and relationship between show up and turn up across the two varieties of English.
- Research Article
3
- 10.1080/07268602.2020.1823817
- Jul 2, 2020
- Australian Journal of Linguistics
Most previous studies on discourse markers (DMs) have yielded a common finding, that is, native speakers of English (NSs) and non-native speakers of English (NNSs) use discourse markers in different ways, especially as regards frequency of occurrence, position and function. Although discourse markers, such as I think, I mean and you know, are sometimes syntactically peripheral and poor in semantic meaning, they are pragmatically indispensable in spoken discourse and serve a variety of pragmatic functions. This study focuses on one of the most frequently overused discourse markers by NNSs, I think, and presents a comparative analysis of I think as used by Hong Kong English (HKE) speakers and British English (BrE) speakers in terms of its frequency of occurrence, position, collocation patterns and pragmatic functions, based on two parallel corpora: the Hong Kong component (ICE–HK) and the British component (ICE–GB) of the International Corpus of English (ICE). By highlighting similarities and differences in the use of I think by HKE and BrE speakers, this study also examines possible reasons that may lead to these.
- Research Article
- 10.31332/lkw.v7i2.3166
- Dec 30, 2021
- Langkawi: Journal of The Association for Arabic and English
The try and V construction is prevalent in British and American English. This construction is found in both spoken and written English, although with different frequencies. The verb in this construction only appears in the base form. The lack of research on this verb formation leaves many aspects unexplored, one of which is the transitivity of the verb. Therefore, this study is intended to find out the number of arguments informed by this construction by matching the number of arguments to the verb try and the verb following it after the conjunction and. Two verbs were used to test this match, i.e., give and bring, which are three-place predicate verbs, and two other two-place predicate verbs, i.e., see and answer, were used to validate the finding. The British National Corpus (BNC) and the Corpus of Contemporary American English (COCA) were used to collect the data. The findings show that the number of arguments matched the verb following the conjunction and. Therefore, it can be concluded that the number of arguments in the try and V construction is not unique to this construction, but is similar to try to V, where V is the non-finite verb which selects the number of arguments. This result suggests that the try and V construction needs to be included in English grammar textbooks so that non-native speakers can use and understand this rare grammatical rule in appropriate contexts.
- Research Article
47
- 10.1111/j.1467-971x.2012.01754.x
- May 17, 2012
- World Englishes
ABSTRACT: This paper reports on a study into the reactions of ‘native’ speakers of British English to Dutch‐English pronunciations in the onset of a telephone sales talk. In an experiment 144 highly educated British professionals who were either familiar or not familiar with Dutch‐accented English responded to a slight Dutch English accent, a moderate Dutch English accent or a ‘Standard British English accent’ (BrE). These accents were rated on the personality traits status and affect, on their intelligibility (orthographic transcription), comprehensibility (identification of key words), and interpretability (paraphrasing the purpose of the message). Although British English was more intelligible and comprehensible than both Dutch English accents, all three accents were equally interpretable. The results indicated that a British English pronunciation evoked more status than both Dutch English accents, and both British English and the slight Dutch English accent commanded more affect than the moderate Dutch English accent.
- Research Article
- 10.17072/2073-6681-2022-3-26-33
- Jan 1, 2022
- Вестник Пермского университета. Российская и зарубежная филология
In modern research, works on peculiarities of various cultures and the connection between concepts and cultures are becoming more and more topical. The article deals with the concept of ‘insularity’ as part of British conceptual worldview. The purpose is to analyze the linguistic means representing the concept of ‘insularity’ in British English. British mentality inevitably finds its linguistic manifestation in British English. This statement is supported by examples from the British National Corpus (BNC) and the daily British newspaper The Guardian. In the course of research, the author analyzed dictionary definitions of lexemes that form an integral part of the concept in question. These lexical units are noted to carry negative connotations and be labeled ‘disapproving’ in authoritative monolingual dictionaries. The material was selected from the contexts presented in the BNC and The Guardian using the continuous sampling method. The main research methods employed are contextual, linguocognitive, and discourse analyses. The lexical items were taken from authoritative monolingual dictionaries, in which more than a half of the analyzed lexical units are marked as ‘disapproving’. Among these were words such as ‘complacent’, ‘philistine’, etc. Having analyzed the usage of the words ‘insular’, ‘insularity’, and other lexical units as used for characterization of the British, the author draws a conclusion about the role the concept of ‘insularity’ plays in shaping British conceptual worldview. Further research in the field would promote better understanding of British mentality and the peculiarities of British English.
- Research Article
16
- 10.1007/s41701-019-00059-8
- Jul 10, 2019
- Corpus Pragmatics
This paper examines the choice, frequency and stylistic variability of discourse markers in Nigerian English, using the International Corpus of English-Nigeria. Three types of discourse markers (elaborative, contrastive and inferential) were examined in the Nigerian corpus and compared with the International Corpus of English-Great Britain, from a variational pragmatic approach. The results were subjected to a log-likelihood test and a paired-sample t-test. The results indicate that there was a significant difference both in the overall frequency of discourse markers in Nigerian English and British English, and in the stylistic variability of these markers in the two corpora. Nigerian English speakers use elaborative and contrastive discourse markers less frequently than British English speakers, but utilise inferential discourse markers more frequently than British English speakers. Moreover, speakers of Nigerian English use a reduced inventory of discourse markers compared to British English speakers and exhibit distinct preference patterns for a few individual discourse markers. The paper also identifies the rise of a new discourse marker moreso/more so in Nigerian English which is used differently from its adverbial form. There were also differences and similarities in the stylistic variability of the discourse markers across the two varieties, which may be dependent on the status of Nigerian English as a second language and the influence of British English on Nigerian English.