Comparing Corpus-Driven and Corpus-Based Approaches to Diachronic Variation
Focusing on grammatical changes in Late Modern and Present-Day English, the author applies a corpus-driven method to texts from two diachronic corpora, A Representative Corpus of Historical English Registers (ARCHER) and the Corpus of Historical American English (COHA). He compares his findings with those returned by more conventional corpus-based methods, which can be characterized as hypothesis-driven. To this end, the study uses automated profiling of large feature sets, such as word- and POS-based mono-, bi- and trigrams, chunks, syntactic dependency labels, and measures of constituent order and length. The resulting feature profiles are combined in a supervised classification task with a given division of texts into earlier and later corpus subperiods to reveal patterns of over- and underuse. Structures that emerge as over- or under-represented in the diachronic subsections are then examined for grammatical changes that previous research may have missed. According to the author, an advantage of such approaches is that they are theory-neutral and can generate novel hypotheses for investigation, which may then serve as input to further corpus-based studies.
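The abstract does not say which classifier is used, so the sketch below swaps in a multinomial Naive Bayes model as an assumption, and profiles only word mono-, bi- and trigrams (not POS tags, chunks or dependency labels). It illustrates how a supervised earlier/later classification can also expose over- and underuse: each feature's smoothed log-probability ratio between the two subperiods is positive when the feature is over-represented in the later texts. All function names and the toy sentences are hypothetical.

```python
from collections import Counter
from math import log


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def profile(tokens, max_n=3):
    """Feature profile of one text: counts of word mono-, bi- and trigrams."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def train_nb(texts, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on n-gram profiles.

    Returns log priors per label and add-alpha smoothed log-probabilities
    of every feature in the shared vocabulary under each label.
    """
    totals = {lab: Counter() for lab in set(labels)}
    for tokens, lab in zip(texts, labels):
        totals[lab].update(profile(tokens))
    vocab = set().union(*totals.values())
    log_prior = {lab: log(labels.count(lab) / len(labels)) for lab in totals}
    log_prob = {}
    for lab, counts in totals.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        log_prob[lab] = {f: log((counts[f] + alpha) / denom) for f in vocab}
    return log_prior, log_prob


def classify(tokens, log_prior, log_prob):
    """Assign a text to the subperiod with the highest posterior score.

    Features never seen in training are ignored.
    """
    feats = profile(tokens)
    scores = {}
    for lab in log_prior:
        scores[lab] = log_prior[lab] + sum(
            c * log_prob[lab][f] for f, c in feats.items() if f in log_prob[lab])
    return max(scores, key=scores.get)


def log_ratios(log_prob, later="later", earlier="earlier"):
    """Per-feature log-ratio: > 0 means overused in the later subperiod,
    < 0 means underused there (i.e. overused in the earlier one)."""
    return {f: log_prob[later][f] - log_prob[earlier][f] for f in log_prob[later]}


if __name__ == "__main__":
    # Hypothetical toy texts standing in for corpus subperiods.
    texts = [s.split() for s in
             ["he shall go", "she shall stay", "he will go", "she will stay"]]
    labels = ["earlier", "earlier", "later", "later"]
    log_prior, log_prob = train_nb(texts, labels)
    ratios = log_ratios(log_prob)
    print(max(ratios, key=ratios.get))                        # ('will',)
    print(classify(["he", "will", "stay"], log_prior, log_prob))  # later
```

Naive Bayes is chosen here only because its per-feature weights read directly as over/underuse scores; a study of this kind could equally use another classifier and a separate feature-ranking step.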
- Book Chapter
- 10.1017/cbo9780511981395.006
- Oct 6, 2011
Introduction
In this chapter, we turn our attention to the issue of linguistic variation, and how corpora have been employed to study differences in the English language across time and across different contexts of language use. We can interpret variation in a number of different ways. One is change over time, or diachronic variation. In the two sections that follow, we will look at the use of corpora to study language change in pre-contemporary and contemporary English, respectively. Yet while corpus-based analysis of language change is a broad field, the study of synchronic variation is even more extensive. In exploring corpus-based approaches to synchronic variation, we will focus on two rather distinct approaches. One approach, touched on briefly in the previous chapter, is strongly associated with Douglas Biber and colleagues: the so-called multi-dimensional (MD) approach. The other is associated with variationist sociolinguistics. Although, as we will see, these approaches have certain commonalities, they differ in that the MD approach looks at variation across genre (or register), with the individual text as the unit of variation, whereas variationist sociolinguistics looks at variation across class, gender or other social categories, with the individual speaker as the unit of variation. We will discuss the MD approach in particular at some length, because it is methodologically highly distinctive and statistically sophisticated.
Diachronic change from Old English to Modern English
Looking at language change is an area of linguistics for which corpus data is particularly appropriate. No one now alive speaks Middle English as a native tongue, much less Old English; thus, even if we wish to rely on the judgements of a native speaker, we simply cannot.
Instead, for these and other extinct languages there is a fixed ‘corpus’ of surviving texts which will never grow any further, except in the rare circumstance that hitherto unknown texts are discovered. An electronic corpus composed of all of these surviving texts (or a sampled subset of them) is thus the ideal tool for taking into account as much data on these historical forms as possible in an analysis of how language has changed. The quantitative analyses enabled by corpus methods are also highly valuable for the study of language change. One quite consistent finding of research in historical linguistics is that one structure very rarely replaces another in a single, sudden change. Rather, new structures arise and are initially used infrequently, and then may later increase in frequency of use, perhaps in competition with some established structure (some examples are discussed in the following section). This kind of quantitative pattern is ideally tracked by a corpus sampling texts across time.
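Because subperiod samples in a diachronic corpus rarely contain the same number of words, a gradual rise of a new structure is usually traced with normalized frequencies rather than raw counts. The sketch below assumes the hits for an incoming structure and an established competitor have already been retrieved per period; it reports the new structure's rate per million words and its share of all tokens of the two variants. The function names, period labels and counts are hypothetical illustrations, not figures from any corpus.

```python
def per_million(hits, words):
    """Normalize a raw frequency to a rate per million words."""
    return hits * 1_000_000 / words


def track(periods):
    """Trace an incoming structure against an established competitor.

    periods: (label, new_hits, old_hits, words) tuples in chronological order.
    Returns (label, new per million words, share of the new variant) rows.
    """
    rows = []
    for label, new_hits, old_hits, words in periods:
        total = new_hits + old_hits
        share = new_hits / total if total else 0.0
        rows.append((label, per_million(new_hits, words), share))
    return rows


if __name__ == "__main__":
    # Invented toy counts for illustration only; not real corpus figures.
    toy = [("1700-1749", 10, 90, 1_000_000),
           ("1750-1799", 35, 65, 1_200_000),
           ("1800-1849", 60, 40, 2_000_000)]
    for label, pmw, share in track(toy):
        print(f"{label}: {pmw:.1f} per million words, {share:.0%} of variant tokens")
```

Reporting both measures is a design choice: the per-million rate shows whether the structure is becoming more frequent overall, while the share shows whether it is winning the competition with the older form.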
- Research Article
- 10.53555/kuey.v30i11.10905
- Jan 1, 2024
- Educational Administration: Theory and Practice
Metadiscourse analysis is significant because it provides a way to uncover the rhetorical patterns of a text. Metadiscourse is the way in which language is used by a speaker or writer to regulate the flow of communication, reinforce their message, and involve the audience. It is divided into two main types. Interactive metadiscourse guides the listener or reader through the text, using devices such as transitions (e.g., “however,” “in addition”) that help organize ideas and connect concepts. Interactional metadiscourse, by contrast, involves the listener or reader and shows the speaker’s or writer’s stance toward the topic, using devices such as hedges (e.g., “perhaps,” “maybe”) and engagement markers (e.g., “you,” “as we can see”). Hyland divides interactional metadiscourse into five major categories: hedges, boosters, attitude markers, engagement markers, and self-mentions. According to Hyland (2005), metadiscourse is used in language analysis and language education to account for how writers relate to their readers and speakers to their audiences. Metadiscourse is therefore a way of understanding how a speaker or writer intends to communicate with a listener or reader. According to Hyland, transition markers are mainly conjunctions and adverbs that help the reader build and interpret the semantic context and meaning of the content. The current study therefore employs Hyland’s (2005) metadiscourse framework to investigate variation in academic writing across three disciplines. It explores diachronic variation in interactional metadiscourse in the doctoral dissertations of Pakistani university students over the last three decades (1990 to 2020), examining the prominent textual features and the patterns of change involved.
To this end, 180 PhD dissertations were collected from three major disciplines (humanities, social sciences and sciences), yielding a corpus of 10 million words. All metadiscursive devices were identified using a corpus-based approach and then analyzed qualitatively. The results show that Pakistani research writers use interactional markers to make their writing more persuasive and unified.
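A corpus-based first pass over Hyland’s five interactional categories is often a dictionary lookup followed by normalization, so that dissertations of different lengths can be compared across decades. The sketch below assumes single-word markers, a rate per 10,000 words, and pooling texts by decade; the marker lists are hypothetical and heavily abridged, not Hyland’s inventory or the study’s. Multi-word markers such as “as we can see” are not handled, and a word like “we” is simply counted as a self-mention, which is exactly why hits need the qualitative checking in context that the abstract describes.

```python
import re
from collections import Counter

# Hypothetical, heavily abridged marker lists for illustration only;
# Hyland (2005) gives much fuller inventories, and each hit needs checking in context.
MARKERS = {
    "hedges": {"perhaps", "maybe", "might", "possibly"},
    "boosters": {"clearly", "certainly", "indeed"},
    "attitude_markers": {"unfortunately", "interestingly", "surprisingly"},
    "engagement_markers": {"you", "your", "consider", "note"},
    "self_mentions": {"i", "we", "my", "our", "me"},
}


def tokenize(text):
    """Lowercase word tokens (ASCII letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def interactional_profile(text, per=10_000):
    """Frequency of each interactional category per `per` words of text."""
    tokens = tokenize(text)
    counts = Counter()
    for tok in tokens:
        for category, words in MARKERS.items():
            if tok in words:
                counts[category] += 1
    n = len(tokens)
    return {cat: (counts[cat] * per / n if n else 0.0) for cat in MARKERS}


def decade_profiles(dissertations, per=10_000):
    """Pool (year, text) pairs by decade and profile each decade."""
    pooled = {}
    for year, text in dissertations:
        pooled.setdefault(year // 10 * 10, []).append(text)
    return {decade: interactional_profile(" ".join(texts), per)
            for decade, texts in sorted(pooled.items())}
```

For example, `decade_profiles([(1995, text_a), (2012, text_b)])` would return one normalized profile for the 1990s and one for the 2010s, where `text_a` and `text_b` are placeholder dissertation texts.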
- Research Article
- 10.1075/alal.23005.uns
- Jul 5, 2024
- Asian Languages and Linguistics
This paper aims to identify which archaic words and word groups are still known and used, both by speakers and in the Turkish National Corpus (TNC), as an indication of lexical change in Turkish from 1900 to 2020. The study explores diachronic lexical change in Turkish by combining the corpus-based variationist sociolinguistic approach with the perspective of historical sociolinguistics. Words and collocations considered outdated were selected from the original version of the novel “Eylül”, written in 1900, and randomly subsampled using a computer-based randomization algorithm. A survey was then constructed from these outdated words and collocations, presented in context. The results indicated that demographic variables did not affect word knowledge and that the archaic words were uniformly unfamiliar to all participants. An overall comparison of the survey and TNC results showed similar patterns: the words and collocations that participants reported as most and least frequently used were also the most and least frequent in the TNC.
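The abstract names neither the randomization procedure nor a statistic for comparing the survey with the TNC, so the sketch below swaps in two common choices as assumptions: a seeded uniform draw with Python’s `random.Random.sample`, and Spearman’s rank correlation, which tests the same intuition the abstract reports (items ranked high in one source also rank high in the other). Function names and inputs are hypothetical; in use, `x` might hold per-item survey familiarity scores and `y` the matching TNC frequencies.

```python
import random


def subsample(items, k, seed=None):
    """Draw k distinct items uniformly at random (reproducible with a seed)."""
    return random.Random(seed).sample(list(items), k)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Fixing the seed makes the subsample reproducible, which matters when the selected items have to be reported alongside the survey.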