Making the Web More Useful as a Source for Linguistic Corpora

Similar Papers
  • Research Article
  • 10.24256/foster-jelt.v2i4.55
Exploration of Phrasal Verbs in ELT Textbooks: A Corpus-Based Analysis of Lower Secondary Level Bangladeshi Books
  • Sep 29, 2021
  • FOSTER: Journal of English Language Teaching
  • Naywaz Sharif Shubha

The study explores the usage of phrasal verbs in the lower secondary Bangladeshi ELT textbooks prescribed by the Government of Bangladesh at the national level. The methodology relies on corpus tools and related analysis of phrasal verb usage in those textbooks. The phrasal verbs found in the textbooks are extracted, their frequency distributions are analyzed, and the results are checked against two of the most authoritative corpora of the English language: the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). In addition to deriving the top fifteen phrasal verbs in the textbook corpus with their significance values, the study compares their relative positions in the two reference corpora and other corpus-related scores, following Liu (2011). The results show that the phrasal verbs used in the selected textbooks bear little relation to those in the two large reference corpora. Finally, based on the findings, some remarks and implications are offered for pedagogical purposes. The current study would be the first to examine the Bangladeshi lower secondary ELT textbooks using a corpus approach.

  • Book Chapter
  • 10.46630/jkaj.2022.24
DYSPHEMISMS IN BRITISH PRINT MEDIA – INEVITABLE NEGATIVE ALTERNATIVE OR TREND?
  • May 10, 2022
  • Admir Gorčević

Dysphemisms, expressions motivated by hatred, contempt, fear, or envy, appear when a neutrally or positively keyed expression is deliberately replaced with another with negative associations. The use of dysphemisms in mass media largely creates an image of society and social life. This language, being short, sharp and clear, adapted to and suitable for readership with diverse social status and sensibility, should not include dysphemisms for their negative character, although we infrequently come across them. We have presumed dysphemisms to be used in every kind of newspaper, at a different level and frequency. The research is based on identification, classification and analysis of dysphemisms used in British newspapers (The broadsheet papers - The Daily Telegraph, The Guardian and The Times, and the tabloids - The Sun, The Mirror and The Daily Mail). In order to show their frequency in everyday discourse, the examples found in the media have been cross-checked against the native language corpora – British National Corpus (BNC) and Corpus of Contemporary American English (COCA). The results show that all processed newspapers and magazines contain dysphemisms, depending on the type and format. Low quality tabloids and sensationalist press use them more frequently (with a higher level of offence) than the informative press with better quality content.

  • Research Article
  • 10.31332/lkw.v7i2.3166
Transitivity of Try and V Construction in British and American English
  • Dec 30, 2021
  • Langkawi: Journal of The Association for Arabic and English
  • Faisal Mustafa + 1 more

The try and V construction is prevalent in British and American English. This construction is found in both spoken and written English, although with different frequencies. The verb in this construction appears only in the base form. The lack of research on this verb formation leaves many aspects unexplored, one of which is the transitivity of the verb. Therefore, this study is intended to find out the number of arguments taken by this construction by matching the number of arguments to the verb try and the verb following it after the conjunction and. Two verbs were used to test this match, i.e., give and bring, which are three-place predicate verbs, and two other verbs, i.e., see and answer, which are two-place predicate verbs, were used to validate the finding. The British National Corpus (BNC) and the Corpus of Contemporary American English (COCA) were used to collect the data. The findings show that the number of arguments matched the verb following the conjunction and. Therefore, it can be concluded that the number of arguments in the try and V construction is not unique to this construction, but is similar to try to V, where V is the non-finite verb which selects the number of arguments. This result suggests that the try and V construction needs to be included in English grammar textbooks so that non-native speakers can use and understand this rare grammatical rule in appropriate contexts.

  • Research Article
  • Cited by 3
  • 10.5539/ijel.v8n7p59
Modal Verbs Hedging: The Uses and Functions of “Will” and “Shall” in Nigerian Legal Discourse
  • Nov 27, 2018
  • International Journal of English Linguistics
  • Ibrahim Bashir + 2 more

This is a corpus-based study on the uses and functions of the modal verbs “will” and “shall” in Nigerian legal discourse. It aims to examine their pragmatic functions as hedges in legal discourse, and specifically to investigate how hedges are used in legal texts to indicate precision and uncertainty. To achieve these objectives, a specialised corpus was constructed, named the “Nigerian Law Corpus” (NLC). The NLC is compiled from Nigerian court proceedings and law reports and contains 546,313 word tokens. A reference corpus of law with 2.2 million word tokens, based on the British National Corpus (BNC), was retrieved for comparison with the NLC. Two concordance tools were used to analyse the data: “AntConc version 3.5”, a semi-automated computer-aided tool, and the web-based tool “Lextutor version 7”. Based on the frequency distribution, the results revealed that the modal verb “will” featured in 493 instances in the NLC and 7,711 instances in BNC Law, while “shall” occurred in 401 instances in the NLC and 1,348 instances in BNC Law. The results also indicated that “shall” was used more heavily in the NLC than in BNC Law, with standardised concordance hits per million of 734 (NLC) and 589 (BNC Law), while “will” was used far less in the NLC (902 instances per million) than in BNC Law (3,369 instances per million). The study also enumerated different semantic and pragmatic functions of “will” and “shall” in legal discourse, citing examples from both the target corpus (NLC) and the reference corpus (BNC Law). Some of the functions as hedges (conveying a truth value of a proposition) are epistemic meanings: politeness, obligation, precision, duty, intention, and permission. In a nutshell, the results indicated that “will” and “shall” are used by legal practitioners, especially lawyers in a courtroom, to achieve precision in their arguments and to persuade the court by showing the true value of commitment to the proposition.
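
The per-million figures in this abstract can be checked with a short calculation. The following is a minimal sketch, not the authors' procedure: the function name `per_million` is hypothetical, and the only inputs are the raw counts and the NLC size quoted in the abstract.

```python
def per_million(hits: int, corpus_tokens: int) -> float:
    """Normalise a raw hit count to occurrences per million word tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return hits / corpus_tokens * 1_000_000


# Corpus size and raw counts as quoted in the abstract above.
NLC_TOKENS = 546_313

will_nlc = per_million(493, NLC_TOKENS)
shall_nlc = per_million(401, NLC_TOKENS)

print(round(will_nlc), round(shall_nlc))  # prints: 902 734
```

Both values match the NLC rates reported in the abstract. The reported BNC Law rates (3,369 and 589 per million for 7,711 and 1,348 hits) are consistent with a reference corpus of roughly 2.29 million tokens, which the abstract gives as 2.2 million.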

  • Research Article
  • 10.30564/fls.v7i5.9432
Show Up and Turn Up in American English and British English: A Multi-Method Analysis Using COCA and BNC
  • May 16, 2025
  • Forum for Linguistic Studies
  • Namkil Kang + 1 more

This article investigates the relationship between the terms show up and turn up in the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). In the COCA, the terms exhibit a 25% similarity in ranking, while in the BNC, this similarity is 14.28%. In terms of genre-specific usage, show up and turn up are most distant in the fiction genre in the COCA, while in the BNC, they are furthest apart in the spoken genre. The closest similarity occurs in the TV/movie genre for the COCA and in the magazine genre for the BNC. Frequency analysis reveals significant national variation. In American English, show up shows greater fluctuation from the mean, while turn up displays more consistent frequency. In contrast, British English usage demonstrates a more stable frequency for show up, while turn up exhibits more erratic variation. The standard deviations for show up in the COCA (1,134) and the BNC (18) further highlight this disparity, as turn up frequencies in both corpora show opposite trends. Statistically, the COCA reveals a strong positive correlation (r = 0.7375) between the two terms, suggesting a significant relationship in American English. However, the BNC’s correlation coefficient (r = 0.0669) indicates no meaningful connection between the terms in British English. This comparison underscores notable national variations in the usage and relationship between show up and turn up across the two varieties of English.
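
The standard deviation and correlation coefficient mentioned in this abstract can be computed as sketched below. This is an illustration under stated assumptions: the per-genre frequencies are invented placeholders, not data from the paper, and the abstract does not say whether a sample or population standard deviation was used (the sample form is assumed here).

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# HYPOTHETICAL per-genre frequencies, for illustration only.
show_up = [120, 340, 210, 90, 60]
turn_up = [40, 95, 70, 35, 20]

print(f"std(show up) = {sample_std(show_up):.1f}")
print(f"r = {pearson_r(show_up, turn_up):.4f}")
```

A value of r near 1 means the two expressions rise and fall together across genres, while a value near 0, as reported for the BNC, means their genre frequencies are essentially unrelated.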

  • Research Article
  • Cited by 3
  • 10.3366/e174950320900029x
The making of a BNC customised spoken corpus for comparative purposes
  • Nov 1, 2009
  • Corpora
  • Phoenix Lam

Although the British National Corpus (BNC) is one of the most popular resources for linguistic research, only a small number of studies generate results from user-selected texts of the corpus, despite the fact that the BNC is designed with the intention of letting researchers choose specific texts to build their own sub-corpora (Burnage and Dunlop, 1993). This paper reports on the process of building a tailor-made spoken corpus out of the BNC for comparative purposes and attempts to address the practical issues involved in selecting specific texts from the BNC spoken sub-section for one's individual needs. It would, thus, be of particular interest and relevance to those who wish to use (parts of) existing corpora for contrastive research. It is hoped that the issues raised in this paper could serve as a note of caution for future sub-corpus compilation from existing corpora, especially with regard to the BNC.

  • Research Article
  • 10.32782/2710-4656/2024.5.1/15
UNDERSTANDING COLLOCATIONS IN ENGLISH: A STUDY OF WEATHER-RELATED WORDS (BASED ON THE BRITISH NATIONAL CORPUS)
  • Jan 1, 2024
  • "Scientific notes of V. I. Vernadsky Taurida National University", Series: "Philology. Journalism"
  • N I Bondarchuk + 1 more

  • Research Article
  • 10.22158/sll.v5n3p39
A Corpora-based Analysis of You must and You have to
  • Aug 26, 2021
  • Studies in Linguistics and Literature
  • Namkil Kang

The ultimate goal of this paper is to provide an in-depth analysis of the frequency of you must and you have to in the Corpus of Contemporary American English (COCA), the British National Corpus (BNC), and the Corpus of Historical American English (COHA). The COCA clearly shows that you have to may be the preferable one for Americans. When it comes to the genre frequency of you must and you have to, you must is the most frequently used one in the TV/movie genre and you have to is the most commonly used one in the blog genre. The BNC indicates, on the other hand, that you have to may be preferred over you must by British people. The BNC clearly shows that in the fiction genre, you must is the most widely used one, whereas in the spoken genre, you have to is the most frequently used one. This paper argues that the expression you must know is the most preferred by Americans, followed by you must go, you must understand, you must think, and you must take, in that order. This paper further argues that the expression you have to go is the most preferred one in America, followed by you have to get, you have to say, you have to make, and you have to take, in that order. Additionally, the BNC shows that the expression you must know is the most preferred by British people, followed by you must provide, you must go, you must get, and you must take, in that order. The BNC indicates, on the other hand, that the expression you have to go is the most preferred by British people, followed by you have to pay, you have to get, you have to take, and you have to make, in that order. Finally, the COHA clearly shows that you must may have been the most preferable one for Americans in 1930, whereas you have to may have been the most preferable one for Americans in 2000.

  • Research Article
  • 10.1111/j.1749-818x.2008.00106.x
Teaching & Learning Guide for: Corpus Linguistics in the UK: Resources for Sociolinguistic Research
  • Jan 1, 2009
  • Language and Linguistics Compass
  • Wendy Anderson

Linguistics has drawn on the large quantities of authentic data contained in language corpora for several decades now. While debates continue regarding the nature and interpretation of such data, it is generally accepted that corpus methodologies offer a valuable perspective on language, one that complements the introspective and elicited data used in different sub-fields of linguistics. Increasingly, language corpora can be searched or downloaded over the Internet, and are now therefore very readily accessible. Many also include demographic or textual metadata that make them invaluable as data for sociolinguistics. While existing corpora may have some drawbacks (e.g. where the corpus design is not ideally suited to the study in hand, or available corpora do not have appropriate mark-up), they offer great savings in time and effort compared to creating a new corpus. Moreover, especially given the increasing availability of spoken texts in corpora, they constitute excellent resources for students of different levels, for teachers looking for a quick way to demonstrate a feature of language, and for researchers testing linguistic hypotheses.

  • Single Book
  • Cited by 8
  • 10.1515/9780748628889
The BNC Handbook
  • Mar 31, 2020
  • Guy Aston + 1 more

This textbook is designed to provide a detailed understanding of the principles and practices underlying the use of large language corpora in exploratory learning and English language teaching and research. It focuses on the largest and most representative corpus of spoken and written data yet compiled - the British National Corpus - and on the search tool SARA (SGML Aware Retrieval Application). The method adopted is to provide a graded series of exercises, each introducing at the same time new features of the software and new techniques or applications for computer-assisted language learning. The book also includes an overview of previous work in corpus linguistics, a bibliography, and a reference manual for the SARA software.
* Graded self-paced tutorials
* Suggestions for further work
* Thorough coverage of corpus linguistics theories and practices
* State-of-the-art software
* Accessible non-specialist style

  • Research Article
  • 10.36232/interactionjournal.v11i2.51
The Implementation and Students Perceptions of Corpora Utilized in Teaching and Learning Agreement and Disagreement Expressions
  • Oct 11, 2024
  • INTERACTION: Jurnal Pendidikan Bahasa
  • Dian Evaliani + 2 more

This study explored the use of English corpora as a medium in the teaching and learning process. It aims to find out how the British National Corpus is used as a medium in the teaching and learning process and how students perceive it. The research was conducted at a junior high school in Karawang. The researcher used a case study approach with a questionnaire to collect the data. The data were presented descriptively and in tables. Based on the findings, using English corpora, especially the British National Corpus, is worthwhile as a medium for teaching and learning agreement and disagreement expressions, because it provides students with actual usage and examples of these expressions. The study also shows that using the British National Corpus can engage students in expanding their range of agreement and disagreement expressions. The questionnaires also showed that, after using the British National Corpus, students were satisfied, that the BNC increased students’ autonomous learning, and that it helped them learn agreement and disagreement expressions by increasing their comprehension.

  • Book Chapter
  • Cited by 1
  • 10.1163/9789401209748_011
Let’s tak a guid lang luik at SCOTS: A Corpus-based Comparison of Light Verb Constructions in SCOTS and the BNC
  • Jan 1, 2013
  • Silke Höche + 1 more

This chapter offers a usage-based description of Light Verb Constructions as represented in the SCOTS corpus. The investigation supports previous findings concerned with British and American English in so far as overarching constructional semantics are shared among varieties. The unhindered incorporation of typically Scots elements into LVCs clearly indicates that the constructions are a fully integrated element of this variety.

Keywords: light verb construction, usage-based, dialectal differences, semantic idiosyncrasies

Light Verb Constructions (LVCs) such as have a quick look, take a pleasant stroll, or give a loud roar are complex verbal structures that are used pervasively and are experiencing a steady increase in usage in contemporary English. These constructions have been shown to be grammaticalised or conventionalised pairings of form and meaning in that they zoom in on particular facets of event-structure (e.g. agentivity) and serve particular discourse functions (e.g. politeness) when compared to their simple verb counterparts, i.e. look quickly, stroll pleasantly, roar loudly.

Only a few extensive and coherent corpus studies of LVCs have been conducted focussing on standard varieties of English, and as a sad sequitur there are virtually no corpus descriptions of usage and formal and semantic idiosyncrasies of LVCs in non-standard varieties of English. The chapter at hand offers a usage-based description of the constructions as represented in the Scottish Corpus of Texts & Speech (SCOTS), contrasting these findings with data retrieved from the British National Corpus (BNC). Our investigation will be concerned with the following questions:

1. How frequent are complex verbal forms involving have, take, and give in SCOTS and what are their relative proportions?
2. What items occur in the flexible, post-verbal slot of the construction and to what semantic classes do they belong?
3. What typical Scottish elements are found in the constructions?
4. How frequent are modified verbal stems and what modifiers are used to elaborate event-description?

These research questions are motivated by several other studies we have conducted, mainly on the basis of two large corpora (the BNC and the Corpus of Contemporary American English) with the aim to present a usage-based portrait of the constructions and to prove or disprove claims found in the literature on these particular multiword units. Such claims have most often been arrived at solely on the basis of intuition or introspection, i.e. methods which may trigger research questions and hypotheses, but by no means allow for a complex analysis of actual linguistic usage, including dialectal variation and comparison of varieties. The study put forward here is a first attempt to compare the usage of the constructions in two different corpora, viz. the BNC and the SCOTS corpus, and to investigate the flexibility of a conventionalised pattern with respect to socio-geographically marked usage (i.e. Scots lexis and morphology). Before we present and discuss the findings of our corpus analysis, we will briefly describe the constructions we are interested in and make a few comments on our research methodology.

Taking a quick look at Light Verb Constructions

Being so ubiquitous, LVCs have, of course, received considerable attention from linguists with highly diverse theoretical backgrounds. The first successful attempt at a coherent description and systematisation of these constructions (the focus being on have a V) was made by Wierzbicka, whose definition we took as an orientation for our compilation and selection of data. Following her description, we think the points below to be of particular relevance for any linguistic discussion of LVCs:

1. The constructions consist of the crucial elements: light verb (e.g. have, take, give) + indefinite article a (+ modifying AdjP) + verbal stem.
2. As regards the notion 'light verb', this term was originally used by Jespersen, as noted, and is still applied with - at times - varying degrees of conviction or purpose. …

  • Research Article
  • 10.17507/tpls.1501.04
A Comparative Analysis of Think Over and Consider Through BNC, COCA, and ChatGPT
  • Jan 8, 2025
  • Theory and Practice in Language Studies
  • Namkil Kang

This article aims to provide an in-depth comparative analysis of think over and consider through the British National Corpus (BNC), the Corpus of Contemporary American English (COCA) and ChatGPT. It is important to note that consider and think over exhibit identical patterns only in the magazine genre and the miscellaneous genre of the BNC, whereas they share the same pattern only in the newspaper genre of the COCA. This can be taken as confirming evidence that in the BNC, think over and consider are 28.57% the same, whereas in the COCA, they are 14.28% the same. Simply put, think over and consider exhibit a low similarity in the BNC and the COCA. A further point to note is that consider is most similar to think over in the newspaper genre of the BNC, whereas the former is the closest to the latter in the TV/movie genre of the COCA. This, in turn, implies that in the newspaper genre of the BNC and the TV/movie genre of the COCA, think over and consider exhibit the highest degree of similarity. It is also worth noting that the standard deviation of think over and consider clearly shows American speakers’ preferences. Most importantly, 18 of 30 collocations of think over and consider are the same, which suggests that consider and think over share 60% of their collocations.
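
The percentages in this abstract are simple shares, which the sketch below reproduces. The helper name `percent_shared` is hypothetical, and the denominator of seven genre categories is an inference from the reported 28.57% (two matching BNC genres) and 14.28% (one matching COCA genre), not a figure stated in the abstract.

```python
def percent_shared(shared: int, total: int) -> float:
    """Share of `total` items that are common to both expressions, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return shared * 100 / total


# 18 of 30 collocations are the same (figures from the abstract).
print(percent_shared(18, 30))            # prints: 60.0

# ASSUMPTION: seven genre categories per corpus, inferred from the percentages.
print(round(percent_shared(2, 7), 2))    # prints: 28.57
print(round(percent_shared(1, 7), 2))    # prints: 14.29
```

The last value rounds to 14.29, whereas the abstract reports 14.28, which suggests the published figure was truncated rather than rounded.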

  • Book Chapter
  • 10.1285/i9788883051531p127
Disambiguating near synonyms in medical discourse: A multilayered corpus analysis of disease, illness and sickness in the British National Corpus
  • Apr 7, 2020
  • Aisberg (University of Bergamo)
  • Stefania Maria Maci + 4 more

This paper discusses the preliminary results of a corpus-based analysis of three basic health-related lexical items: 'disease', 'illness' and 'sickness' on the British National Corpus (henceforth BNC) CQP Web platform (2007 XML). Synonymous at first glance, the terms exhibit a certain degree of co-text and context semantic variation; therefore, the lexical items in question cannot be used interchangeably. This in turn may pose some difficulties in inter-lingual translation and language learning, mainly stemming from the lack of full equivalence (or, in some instances, zero equivalence) between the words and their counterparts in some other languages, such as German or Italian. The paper aims to demonstrate how collocational behaviour and semantic profiles can help disambiguate near synonyms along a cline between general and specialised discourse. To this end, the study employed corpus linguistic methods and analysed the BNC across all its text genres. The collocational patterns of the three selected lexical items were examined in the corpus and the semantic profiles of the lexical items were established. The findings suggest that the three health-related near synonyms exhibit markedly different collocational behaviours and semantic preferences. It is therefore suggested that the approach adopted in this study could be applied to help disambiguate the meanings of near synonyms appearing in any specialised discourse at both intra- and inter-linguistic levels. Future research will compare the findings resulting from a similar investigation to be carried out on COCA to see the extent to which, if any, (a) meanings can vary and (b) whether meaning variations associated with these items depend on the interactants (i.e. professionals/laymen).

  • Research Article
  • 10.37547/ijll/volume03issue05-27
THE STRUCTURE AND CONTENT OF TEXTS OF DIFFERENT GENRES
  • May 1, 2023
  • International Journal Of Literature And Languages
  • Kalimbetov Sharapat Maxsutbayevich + 1 more

The article discusses the problem of studying the structure and content of texts of different genres. Text is the main structure of language construction at the highest level. Metaphorical expressions of the English language were mainly selected from well-known electronic sources. The main one is, of course, the British National Corpus. This corpus contains samples of texts of various genres, but samples of poetic language are very rare. Literature Online (LO) was chosen as the second source. This electronic resource contains more than 2000 poems by 797 authors. A total of 554 metaphorical expressions on the subject of sadness were selected from the British National Corpus, while 518 expressions were found in Literature Online.
