A Corpora-based Analysis of You must and You have to
The ultimate goal of this paper is to provide an in-depth analysis of the frequency of you must and you have to in the Corpus of Contemporary American English (COCA), the British National Corpus (BNC), and the Corpus of Historical American English (COHA). The COCA clearly shows that you have to may be the preferable one for Americans. When it comes to the genre frequency of you must and you have to, you must is the most frequently used one in the TV/movie genre and you have to is the most commonly used one in the blog genre. The BNC indicates, on the other hand, that you have to may be preferred over you must by British people. The BNC clearly shows that in the fiction genre, you must is the most widely used one, whereas in the spoken genre, you have to is the most frequently used one. This paper argues that the expression you must know is the most preferred by Americans, followed by you must go, you must understand, you must think, and you must take, in that order. This paper further argues that the expression you have to go is the most preferred one in America, followed by you have to get, you have to say, you have to make, and you have to take, in that order. Additionally, the BNC shows that the expression you must know is the most preferred by British people, followed by you must provide, you must go, you must get, and you must take, in that order. The BNC indicates, on the other hand, that the expression you have to go is the most preferred by British people, followed by you have to pay, you have to get, you have to take, and you have to make, in that order. Finally, the COHA clearly shows that you have to may have been the most preferable one for Americans in 1930, whereas you have to may have been the most preferable one for Americans in 2000.
- Research Article
- 10.22158/selt.v8n4p48
- Nov 20, 2020
- Studies in English Language Teaching
The main goal of this paper is to provide a detailed frequency analysis of the five types it is imperative that, it is vital that, it is essential that, it is important that, and it is necessary that within the British National Corpus (100 million, British, 1980s-1993), the Corpus of Contemporary American English (1.0 billion, US, 1990-2019), the Corpus of Historical American English (400 million, US, 1810s-2000s), and the Hansard Corpus (1.6 billion, British Parliament). In this paper, we have examined the frequency of the five types and collected the data. A major point to note is that it is important that was the most preferred by British people, followed by it is essential that, it is vital that, it is imperative that, and it is necessary that, in that order. The BNC clearly shows, on the other hand, that it is important that was the most commonly used one in the spoken genre, magazine genre, newspaper genre, and academic genre. A further point to note is that it is important that was the most preferred by Americans, followed by it is imperative that, it is essential that, it is vital that, and it is necessary that, in that order. The COCA clearly indicates that it is important that was the most widely used one in the blog genre, web genre, spoken genre, fiction genre, magazine genre, newspaper genre, and academic genre. The reason why it is important that was the most preferred by Americans and British people in the academic genre may be that a moderate obligation is suitable for conveying factual information. With respect to the COHA, it is worth noting that it is necessary that was the most preferred by Americans from 1810 to 2000, followed by it is important that, it is essential that, it is imperative that, and it is vital that. As for the HC, it is important that was the most preferred by British politicians, followed by it is essential that, it is vital that, it is necessary that, and it is imperative that. 
It is worth noting that Americans and British politicians show a similar pattern in the ranking of the five types, in that Americans did not prefer a strong statement or the strongest statement, whereas British politicians did not prefer the strongest statement.
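The four corpora named above differ greatly in size (100 million to 1.6 billion words), so raw counts are not directly comparable across them. The abstract does not say whether raw or normalized counts were used; the following is only a minimal sketch of how a per-million-words normalization and a ranking could be computed. The corpus sizes are the ones quoted in the abstract, while the function names and all counts are hypothetical and are not data from the paper.

```python
# Corpus sizes in words, as quoted in the abstract above.
CORPUS_SIZES = {
    "BNC": 100_000_000,
    "COCA": 1_000_000_000,
    "COHA": 400_000_000,
    "Hansard": 1_600_000_000,
}


def per_million(raw_count, corpus):
    """Convert a raw hit count into a frequency per million words."""
    return raw_count * 1_000_000 / CORPUS_SIZES[corpus]


def rank_by_frequency(counts):
    """Return the expressions ordered from most to least frequent."""
    return sorted(counts, key=counts.get, reverse=True)


# Hypothetical raw counts for illustration only (NOT taken from any corpus).
example_counts = {
    "it is important that": 900,
    "it is essential that": 600,
    "it is vital that": 300,
}

ranking = rank_by_frequency(example_counts)
normalized = {expr: per_million(n, "BNC") for expr, n in example_counts.items()}
```

Within one corpus, normalization does not change the ranking; it matters only when counts from corpora of different sizes are set side by side.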
- Research Article
- 10.30564/fls.v7i5.9432
- May 16, 2025
- Forum for Linguistic Studies
This article investigates the relationship between the terms show up and turn up in the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). In the COCA, the terms exhibit a 25% similarity in ranking, while in the BNC, this similarity is 14.28%. In terms of genre-specific usage, show up and turn up are most distant in the fiction genre in the COCA, while in the BNC, they are furthest apart in the spoken genre. The closest similarity occurs in the TV/movie genre for the COCA and in the magazine genre for the BNC. Frequency analysis reveals significant national variation. In American English, show up shows greater fluctuation from the mean, while turn up displays more consistent frequency. In contrast, British English usage demonstrates a more stable frequency for show up, while turn up exhibits more erratic variation. The standard deviations for show up in the COCA (1,134) and the BNC (18) further highlight this disparity, as turn up frequencies in both corpora show opposite trends. Statistically, the COCA reveals a strong positive correlation (r = 0.7375) between the two terms, suggesting a significant relationship in American English. However, the BNC’s correlation coefficient (r = 0.0669) indicates no meaningful connection between the terms in British English. This comparison underscores notable national variations in the usage and relationship between show up and turn up across the two varieties of English.
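The two statistics reported above, the standard deviation of each term's frequencies and the correlation coefficient r between the two terms, can be illustrated with a short sketch. It assumes that r is a Pearson coefficient computed over paired per-genre frequencies, which the abstract does not state explicitly. The helper name `pearson_r` and all numbers are hypothetical and do not reproduce the paper's data.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-genre frequencies (NOT taken from the COCA or the BNC).
show_up = [120, 340, 560, 210, 430]
turn_up = [80, 150, 300, 90, 260]

r = pearson_r(show_up, turn_up)   # strength of the linear relationship
sd_show_up = stdev(show_up)       # sample standard deviation
sd_turn_up = stdev(turn_up)
```

A value of r near 1 indicates that the two terms rise and fall together across genres, while a value near 0, as reported for the BNC, indicates no linear relationship.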
- Book Chapter
- 10.1002/9781119518297.eowe00072
- Mar 11, 2025
There are many corpora of Inner Circle Englishes. Several corpora, especially the Brown family of corpora and the various national components of the International Corpus of English (ICE), cover their contemporary use in a balanced fashion. Larger register‐balanced corpora of contemporary usage cover British and American English well, but do not extend to the other Inner Circle varieties. Historical American English is covered very well from the early nineteenth century by the Corpus of Historical American English (COHA), and American and British English are covered from the seventeenth century in A Representative Corpus of Historical English Registers (ARCHER). Other Inner Circle varieties are covered from the late eighteenth and early nineteenth centuries in smaller historical corpora. Inner Circle varieties are particularly well represented in the very large online corpora such as the Global Web‐based English Corpus (GloWbE) and News on the Web Corpus (NOW). While there is some imbalance in the representation of American and British English versus the others, there are adequate resources to support extensive grammatical, lexical and stylistic research into Inner Circle varieties on a comparative basis.
- Single Book
5
- 10.4324/9780367815899
- Dec 29, 2020
This book uses corpus-based methodologies to investigate the wide variety of factors behind verb number agreement with complex collective noun phrases in English. The literature on collective nouns and their agreement patterns spans an array of disciplines and approaches. However, little of the research conducted to date has focused on the influence of of-dependents on verb number with relational collective nouns, as in examples such as a bunch of or a group of. Drawing on data from two case studies – one based on the Corpus of Historical American English (COHA), and the other on the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA) – Fernández-Pena uses statistical modelling to unpack the different morphological, syntactic, semantic and lexical dimensions of the variables affecting verb number agreement with complex collective noun phrases in English. This multidimensional analysis of the significance of of-dependents in the patterning and contemporary usage of collective nouns offers new insight into and understanding of both synchronic variation and diachronic change. This book is an essential read for scholars of English language variation and change, historical linguistics, corpus linguistics, and usage-based approaches to the study of language.
- Research Article
2
- 10.17507/tpls.1111.01
- Nov 2, 2021
- Theory and Practice in Language Studies
The goal of this paper is to compare the collocations of May as well and Might as well and to provide an in-depth analysis of the frequency of each expression in the Corpus of Contemporary American English (3 July 2021. Online. https://corpus.byu.edu/coca), the British National Corpus (3 July 2021. Online. https://corpus.byu.edu/bnc), and the Corpus of Historical American English (3 July 2021. Online. https://corpus.byu.edu/coha). With respect to the COCA, it is interesting to note that May as well go is the most preferred by Americans, followed by May as well get and May as well give, in descending order. It is also interesting to point out that Might as well get is the most preferred by Americans, followed by Might as well go and Might as well make, in descending order. With respect to the BNC, it is noteworthy that the collocation May as well go is the most preferred by the British, followed by May as well tell (May as well get), May as well make, and May as well use, in descending order. It is also worth noting that Might as well go is the most preferred by the British, followed by Might as well get, Might as well make (Might as well take), and Might as well give, in descending order. Finally, this paper argues that Might as well is preferred over May as well by both Americans and the British and that the former differs slightly from the latter in its use.
- Research Article
31
- 10.5755/j01.sal.40.1.30644
- Jul 13, 2022
- Studies about Languages
The paper represents a distinctive attempt to trace the development of the preposition and the adverb ‘on’ as the initial and transposed categories. The study focuses on their evolution throughout 16 historical time spans, from 850 up to the present time. The research is based on 7 954 Old English, 2 368 Middle English, and 4 251 Early Modern English examples, which have been obtained from the Helsinki Corpus of English Texts and analyzed without applying any corpus software; 174 581 examples of Late Modern English from the Corpus of Late Modern English Texts, which have been processed by means of the Lancsbox software tool; and the statistical data on 7 118 454 examples of Present-Day English retrieved from the Corpus of Historical American English and the British National Corpus. The paper attests that ‘on’ is formed at the first stage (before 850) of the Old English period as the preposition and at the next stage (850–950) is transposed into the category of the adverb, which is characterized by a further slight increase in the statistics and stabilization of its correlation with the preposition ‘on’. The correlation between the categories remained stable up to the Early Modern English period, when the category of the adverb started its sustained growth, which is still being observed in the English language. The paper proves that in Early Modern English the process of functional transposition is superseded by an utterly new stage of lexicalization which leads to the formation of phrasal verbs.
- Research Article
- 10.31332/lkw.v7i2.3166
- Dec 30, 2021
- Langkawi: Journal of The Association for Arabic and English
The try and V construction is prevalent in British and American English. This construction is found in both spoken and written English, although with different frequencies. The verb in this construction appears only in the base form. The lack of research on this verb formation leaves many aspects unexplored, one of which is the transitivity of the verb. Therefore, this study is intended to find out the number of arguments informed by this construction by matching the number of arguments to the verb try and the verb following it after the conjunction and. Two verbs were used to test this match, i.e., give and bring, which are three-place predicate verbs, and two other two-place predicate verbs, i.e., see and answer, were used to validate the finding. The British National Corpus (BNC) and the Corpus of Contemporary American English (COCA) were used to collect the data. The findings show that the number of arguments matched the verb following the conjunction and. Therefore, it can be concluded that the number of arguments in the try and V construction is not unique to this construction, but is similar to that in try to V, where V is the non-finite verb which selects the number of arguments. This result suggests that the try and V construction needs to be included in English grammar textbooks so that non-native speakers can use and understand this rare grammatical rule in appropriate contexts.
- Research Article
2
- 10.1080/09296174.2015.1037160
- Jul 3, 2015
- Journal of Quantitative Linguistics
Traditional grammarians generally hold that English absolute clauses are formal and infrequent. This article carries out corpus-based quantitative research on the genre and diachronic distributions of English absolute clauses. We hypothesize that the distribution of absolute clauses in English is significantly different across genres and that the diachronic distribution of each function type of absolute clauses in different genres is homogeneous. The British National Corpus (BNC)-based genre distribution research shows that absolute clauses are infrequent in both informal spoken texts and formal academic texts; rather, they are mostly used in the narrative texts of fiction. The Corpus of Historical American English (COHA)-based research shows that over the span of 200 years, the total number of absolute clauses tends to increase rather than decrease. This is especially true of absolute clauses of attendant circumstances. Although the number of absolute clauses of clausal adjuncts is decreasing, absolute clauses are by no means disappearing but have levelled off during the most recent five decades, for absolute clauses of clausal adjuncts have been becoming stereotyped expressions both grammatically and semantically.
- Research Article
9
- 10.1177/0963947019865445
- Jul 27, 2019
- Language and Literature: International Journal of Stylistics
A novel distinction is proposed between two types of closed similes: the standard and the non-standard. While the standard simile presents a ground that is a salient feature of the source term (e.g. meek as a lamb), the non-standard simile somewhat enigmatically supplies a non-salient ground (e.g. meek as milk). The latter thus violates a deep-seated norm of similes and presents interpreters with unexpected difficulty, whereby the concept set up to be an exemplar of a quality is actually less than ideal to fulfil this role. The main question addressed here is how these two simile types are relatively distributed across poetic and non-poetic corpora. We elaborate the criteria for what constitutes the non-standard simile, including separating it out from adjacent phenomena like the ironic simile (e.g. brave as a mouse), and go on to explain our operational criteria for salience. Then, we report culling 329 closed similes from an anthology of poetry and 350 closed similes from two corpora of non-poetic discourse, the Corpus of Historical American English and the British National Corpus. An independent judge rated the salience of each ground-and-source pair of each of the similes, presented in randomized order. Results show that while the standard simile is found in both types of discourse, the non-standard kind is only marginally present in the non-poetic corpora but makes up over 40% of the similes in the poetic corpus. We conclude by discussing the implications of these results for theories of poetic language and literariness.
- Book Chapter
- 10.46630/jkaj.2022.24
- May 10, 2022
Dysphemisms, expressions motivated by hatred, contempt, fear, or envy, appear when a neutrally or positively keyed expression is deliberately replaced with another with negative associations. The use of dysphemisms in mass media largely creates an image of society and social life. This language, being short, sharp and clear, adapted to and suitable for readership with diverse social status and sensibility, should not include dysphemisms for their negative character, although we infrequently come across them. We have presumed dysphemisms to be used in every kind of newspaper, at a different level and frequency. The research is based on identification, classification and analysis of dysphemisms used in British newspapers (The broadsheet papers - The Daily Telegraph, The Guardian and The Times, and the tabloids - The Sun, The Mirror and The Daily Mail). In order to show their frequency in everyday discourse, the examples found in the media have been cross-checked against the native language corpora – British National Corpus (BNC) and Corpus of Contemporary American English (COCA). The results show that all processed newspapers and magazines contain dysphemisms, depending on the type and format. Low quality tabloids and sensationalist press use them more frequently (with a higher level of offence) than the informative press with better quality content.
- Research Article
- 10.24256/foster-jelt.v2i4.55
- Sep 29, 2021
- FOSTER: Journal of English Language Teaching
The study aims to explore the usage of phrasal verbs in the lower secondary Bangladeshi ELT textbooks prescribed by the Government of Bangladesh at the national education level. The methodological approach is based on corpus tools and related analysis of phrasal verb usage in those textbooks. The phrasal verbs found in the textbooks are extracted, their frequency distributions are analyzed, and they are finally checked against the two most authentic corpora of the English language: the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). In addition, the top fifteen phrasal verbs in the textbook corpus are derived with their significance values, and their relative positions in the two reference corpora and other corpus-related scores are compared, with reference to the study of Liu (2011). The results show that the phrasal verbs used in the selected textbooks bear little relation to those in the two standard large corpora. Finally, based on the findings, some remarks and implications are offered for pedagogical purposes. The current study would be the first to examine the Bangladeshi lower secondary ELT textbooks using a corpus approach.
- Research Article
243
- 10.3366/cor.2012.0024
- Nov 1, 2012
- Corpora
The Corpus of Historical American English (COHA) contains 400 million words in more than 100,000 texts which date from the 1810s to the 2000s. The corpus contains texts from fiction, popular magazines, newspapers and non-fiction books, and is balanced by genre from decade to decade. It has been carefully lemmatised and tagged for part-of-speech, and uses the same architecture as the Corpus of Contemporary American English (COCA), BYU-BNC, the TIME Corpus and other corpora. COHA allows for a wide range of research on changes in lexis, morphology, syntax, semantics, and American culture and society (as viewed through language change), in ways that are probably not possible with any text archive (e.g., Google Books) or any other corpus of historical American English.
- Research Article
- 10.22158/selt.v9n4p21
- Aug 20, 2021
- Studies in English Language Teaching
The goal of this paper is to provide an in-depth analysis of the frequency of I was used to, I got used to, and I became used to in the Corpus of Contemporary American English and the British National Corpus. The COCA clearly shows that I was used to may be the most preferable one for Americans, followed by I got used to, and I became used to, in that order. When it comes to the genre frequency of the COCA, it is interesting to note that in the fiction genre, I was used to may be the most commonly used one. The BNC clearly indicates, on the other hand, that I was used to may be the most preferred by British people, followed by I got used to, and I became used to. With respect to the genre frequency of the BNC, it is interesting to note that in the fiction genre, I was used to may be the most widely used. When it comes to the frequency of was used to and nouns, the expression was used to measure is the most preferable one for Americans, followed by was used to people, was used to rate, was used to power, was used to fuel, was used to group, and was used to film. With respect to the frequency of was used to and gerunds, the expression was used to being is the most preferable one for Americans, followed by was used to seeing, was used to having, was used to getting, was used to doing, was used to doing, was used to working, was used to hearing, and was used to going, in that order. Additionally, the COCA shows that got used to life and got used to things are the most preferred ones in America, followed by got used to people, and got used to weapons (got used to walking, got used to violence, got used to name calling), in that order. The COCA also indicates that got used to being is the most preferable one for Americans, followed by got used to seeing, got used to having, got used to hearing, got used to wearing, got used to living, and got used to using (got used to doing). 
The COCA further shows that became used to seeing is the most preferred by Americans, followed by became used to writing (became used to tying, became used to talking).
- Research Article
- 10.30564/fls.v6i6.7378
- Dec 10, 2024
- Forum for Linguistic Studies
The differences that exist among near-synonyms seem to be a thorny issue for native and non-native speakers of English. This study aims to highlight the similarities and differences between four near-synonymous verbs: investigate, explore, scrutinize, and examine, with a focus on their dialectal variations, frequencies, genre distributions, and colligational patterns. Data were gathered from the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). The findings reveal that while these verbs are often considered near-synonyms, they are not fully interchangeable across contexts. Explore and examine have scored the highest frequencies across both corpora, especially in academic genres in American English. In contrast, British English exhibits more variation, with investigate and explore appearing more frequently in non-academic texts. Conversely, scrutinize has scored the lowest in both dialects and is primarily confined to academic contexts. Additionally, these verbs are seldom found in spoken genres. The analysis of colligational behavior (i.e., grammatical patterns) demonstrates that these verbs share many grammatical patterns, though subtle differences in their usage prevent complete interchangeability. The COCA provides a wider range of grammatical patterns than those in the BNC. These findings underscore the complexity of near-synonymous verbs and the importance of context in their usage.