How stable are multivariate findings about register variation across varieties of English? On the replicability of Geometric Multivariate Analysis
Abstract: Registers reflect the constraints of systematically recurring situational contexts and are therefore embedded in the lingua-culture in which these situations occur. Consequently, when a language – such as English – is used in widely differing cultural contexts, the question arises whether registers in different varieties of the language might not actually reflect cultural differences between similar types of situations. Previous studies have shown that varieties of English fall into different clusters and that informal spoken texts in particular reflect differences between the varieties. With a focus on register variation across varieties of English, Neumann & Evert (2021) suggest that register-related patterns of variation are much more pronounced than differences between varieties. However, they also observe divergence between texts in the same register from different varieties. The generality of both findings is limited, though, because their analysis was based on only three varieties of English. Our paper aims at exploring these questions more thoroughly by drawing on a larger set of nine components of the International Corpus of English (ICE) preprocessed for comparability (Lehmann & Schneider 2012) and by focusing the interpretation on registers that are expected to be more strongly affected by cultural differences. To this end, we extract the same set of 41 lexico-grammatical features from the ICE components as Neumann & Evert (2021), building on the corpus queries made available in their online supplement. In three steps, we first reproduce the geometric multivariate analysis (GMA) of Neumann & Evert (2021) and then replicate it in two increasingly different approaches. These methodological variations allow us to explore to what extent the results of Neumann & Evert (2021) depended on their specific choice of three ICE components and how stable the results of the exploratory analysis with the chosen multivariate approach are.
- Research Article
2
- 10.2478/v10122-012-0004-2
- Oct 1, 2012
- Lingua Posnaniensis
Natalia Budohoska. Characteristic Morphological and Syntactic Features of English in Kenya: A Corpus Study (ICE). Lingua Posnaniensis, vol. LIV (1)/2012. The Poznań Society for the Advancement of the Arts and Sciences. PL ISSN 0079-4740, ISBN 978-83-7654-103-7, pp. 45-56. This study discusses characteristic morphological and syntactic features of English in Kenya on the basis of the International Corpus of English (ICE) for Kenya. It contains a list of typical traits compiled following the universal criteria for describing varieties of English set up by Kortmann (2008: xxv-xxix). The features found were compared against the Szmrecsanyi & Kortmann (2009: 68) concept of the inherent simplification of the new varieties of English. Finally, the amount of variation found in the ICE was placed into a wider context of other postcolonial varieties of English. The results of this analysis add to the discussion of recognizing English in Kenya as an emancipated variety of English (Budohoska 2011a, b). This study documents a large number of characteristic features of English in Kenya, many of them shared with other recognized varieties of English. It also reveals tendencies of simplification common to New Englishes. The estimated frequencies of these features, however, are too low to reveal stigmatized forms of Kenyan English.
- Research Article
79
- 10.1075/eww.00011.kru
- May 31, 2018
- English World-Wide
Previous research suggests there are register differences between native and non-native varieties of English, as well as translated English. This article reports on a multidimensional (MD) analysis of register variation in the published written registers of 16 varieties of English, and tests expectations for register variation in contact varieties evident from existing research. The study finds that the effects of variety and register are largely independent of each other, indicating that overall, registers pattern in similar ways across varieties. Register is the strongest factor accounting for variance in the data, but variety also contributes significantly to variation. Non-native varieties before phase four in the Dynamic Model (Schneider 2007) and translations draw more extensively on markers of formality than non-native varieties at phase four and native varieties. Contact varieties display fewer involvement features than native varieties. Persuasive strategies and reported speech are variable across varieties, suggesting local stylistic and cultural differences.
- Research Article
- 10.1111/j.1749-818x.2008.00106.x
- Jan 1, 2009
- Language and Linguistics Compass
Linguistics has drawn on the large quantities of authentic data contained in language corpora for several decades now. While debates continue regarding the nature and interpretation of such data, it is generally accepted that corpus methodologies offer a valuable perspective on language, one that complements the introspective and elicited data used in different sub-fields of linguistics. Increasingly, language corpora can be searched or downloaded over the Internet, and are now therefore very readily accessible. Many also include demographic or textual metadata that make them invaluable as data for sociolinguistics. While existing corpora may have some drawbacks (e.g. where the corpus design is not ideally suited to the study in hand, or available corpora do not have appropriate mark-up), they offer great savings in time and effort compared to creating a new corpus. Moreover, especially given the increasing availability of spoken texts in corpora, they constitute excellent resources for students of different levels, for teachers looking for a quick way to demonstrate a feature of language, and for researchers testing linguistic hypotheses.
- Research Article
17
- 10.1177/0075424217740938
- Dec 4, 2017
- Journal of English Linguistics
This paper examines voice alternation, that is, variation between the active and passive voice in academic Englishes. The focus is on differences regarding degrees of author involvement. A previous study on the use of be-passives in fifteen varieties of academic English (Hundt, Schneider & Seoane 2016) found voice alternation to be very similar in both contact and native (ENL) varieties of English, with only American English showing a pronounced tendency towards a more frequent use of actives. A more fine-grained analysis, however, revealed highly significant interdisciplinary variation: whereas in the hard sciences the default option to express a transitive event is the passive voice, in the soft sciences, preference is often given to the active. In this paper we do not compare varieties of English but concentrate on ENL data from the entire academic sections of ICE corpora (International Corpus of English) as a whole in order to uncover the functional role of actives and passives across disciplinary areas with regard to authorial presence. The results indicate that the differences attested do not correlate with differences in authorial involvement (We discovered this versus This was discovered) since texts remain equally impersonal. Other factors, such as the increasing informalization observed in various genres, will have to be contemplated in any comprehensive study of the rhetoric of science.
- Research Article
- 10.1111/weng.12740
- Apr 17, 2025
- World Englishes
For several decades the use of the modal shall has been reported to be inexorably declining, especially in Inner Circle varieties of English. Some authors have even talked of its demise outside legal texts. This article set out to argue that inexorable decline and demise do not apply to the fortunes of shall in three East African varieties of English, namely, Kenyan, Tanzanian and Ugandan English. The literature reports that the decline of shall has benefitted its main competitor, will. Because of this, will is taken as the reference in the statistical models used in the present study to establish the usage of shall over the last three decades in the three East African varieties, with British English, their common colonial ancestor, being taken as the reference in the same statistical analyses. These were based on data drawn from the International Corpus of English and the Corpus of Global Web‐Based English. They involved chi‐square tests of the overall frequencies of occurrence of shall and will and binomial logistic regression tests of models with fixed effects and those with interactions. Statistically significant findings indicate that shall was proportionally more frequent in East African English than in British English, with the main source of difference in usage being the much greater use of shall with third‐person subjects in the three East African varieties. This usage was corroborated even further by qualitative data analysis of a selection of legal texts, newspaper articles and official administrative texts. With all these three types of texts being more recent than the two corpora, this article concludes that even though corpus data do indeed point to a diachronic decline of shall even in East African English, shall is still alive and well in certain registers of it.
- Single Book
150
- 10.1075/scl.44
- May 9, 2011
The articles in this volume are intended to bridge what Sridhar and Sridhar (1986) have called the 'paradigm gap' between traditional SLA research on the one hand and research into institutionalised second-language varieties in former colonial territories on the other. Since both learner Englishes and second-language varieties are typically non-native forms of English that emerge in language contact situations, it is high time that they are described and compared on an empirical basis in order to draw conceptual and theoretical conclusions with regard to their form, function and acquisition. The present collection of articles places special emphasis on empirical evidence obtained from large-scale analyses of computerised corpora of learner Englishes (such as the International Corpus of Learner English) and of second-language varieties of English (such as the International Corpus of English). It addresses questions such as ‘Are the phenomena we find in ESL and EFL varieties features or errors?’ or ‘How common and wide-spread are features across contact varieties of English?’
- Research Article
41
- 10.1111/j.1467-971x.1990.tb00264.x
- Jul 1, 1990
- World Englishes
ABSTRACT: This article derives from the internal discussions of a project that has just been launched and which may provide a useful example of modern comparative linguistics: the International Corpus of English (ICE). It concentrates on the problems which arise when the principles of corpus compilation, which were developed in native communities (ENL corpora) in the pre‐sociolinguistic age, are applied to non‐native communities (ESL corpora) such as Africa. In my opinion this reveals a crucial difficulty in corpus compilation that has been neglected in most corpus‐linguistic work: the contrast and relationship between variation according to use and that according to user, or between stylistic sampling categories based on text types and sociolinguistic ones based on speaker/writer identity. Examples of such problems will be derived from the second‐language corpus I am primarily concerned with, the Corpus of East African English, but the principles of socio‐stylistic variation in native and non‐native varieties of English go far beyond this immediate context. They aim at combining two modern quantitatively oriented linguistic subdisciplines to their mutual benefit. After a brief introduction to the ICE project the following points are dealt with: first, the uses of computer‐readable corpora for modern grammars and dictionaries in general (Section 2) and for applied (Section 3) and theoretical (Section 4) research on non‐native varieties of English in particular, then the text type approach applied in ENL corpora so far (Section 5) and the sociolinguistic dimension with its relationship to stylistic variation (Section 6), followed by practical considerations for Third World Englishes (Section 7), and finally a multidimensional approach to socio‐stylistic variation (Section 8) which may be necessary for transferring the ENL‐based methodology of corpus compilation to ESL varieties.
- Research Article
- 10.5539/ells.v11n3p55
- Aug 1, 2021
- English Language and Literature Studies
Anger, as one of the basic emotions, has attracted much attention. In the construction “Anger adjectives + prepositions”, the temporal duration of the Anger adjectives is closely related to their prepositional collocates. Differences in the use of the Anger adjectives and their prepositional collocates might be captured across world English varieties. The corpora used in this study cover eight varieties of English. The five varieties of English used in Canada, the Philippines, Singapore, India and Nigeria are from the International Corpus of English (ICE). The China English corpus (ChiE) consists of news texts crawled from six Chinese English media. American English is taken from the Corpus of Contemporary American English (COCA) and British English is taken from the British National Corpus (BNC). By investigating the use of the Anger adjectives and their prepositional collocates in the eight varieties of English, this paper finds that, on the continuums of the temporal duration of Anger adjectives, most varieties of English are closer to American English, whereas only Singapore English is close to British English. The distribution of Anger adjectives in the English varieties is largely in accordance with the Concentric Circles of world Englishes whereas the continuums of the temporal duration of emotions present a new insight into their relations.
- Research Article
- 10.14198/raei.2017.30.04
- Dec 15, 2017
- Revista Alicantina de Estudios Ingleses
The variety of English used in Gibraltar has been in contact with a number of European languages, such as Spanish, Italian, Hebrew and Arabic (Moyer, 1998: 216; Suárez-Gómez, 2012: 1746), for more than 300 years. Studies of this variety have traditionally been based on interviews and observation (e.g. Moyer, 1993, 1998; Cal Varela, 1996; Levey, 2008, 2015; Weston, 2011, 2013, etc.), and a detailed morphosyntactic description is yet to be published. In this context, the compilation of a reliable Gibraltar corpus using the standards of the International Corpus of English (ICE) will constitute a landmark in the analysis of this lesser known variety of English. In the present paper we describe the ICE project and the current state of the compilation of ICE-GBR. In addition, we present a detailed comparison between the section on press news reports of ICE-GB (standard British English) and ICE-GBR, with the aim of identifying morphosyntactic features that reveal the influence of language contact with Spanish in this territory. We explore variables such as the choice of relativizer (assuming a higher preference for that in GBR, in agreement with Spanish que, the most frequent relativizer, Brucart, 1999: 490), the use of titles and pseudo-titles preceding proper names (which, as shown by Hundt and Kabatek, 2015, are very frequent in English journalese and extremely infrequent in Spanish), and the frequency of the passive voice (expected to be lower in ICE-GBR), among others. A preliminary analysis of these variables reveals that the influence of Spanish on the variety of English used in the Gibraltarian press, at the morphosyntactic level, is almost non-existent, limited to occasional cases of code-switching between the two varieties. We hypothesize that a possible explanation for this strong exonormative allegiance to British English, at least in press news reports, can be found in a strong editorial pressure to reflect the prestigious parent-variety.
- Research Article
20
- 10.1075/eww.37.1.03sch
- Mar 3, 2016
- English World-Wide
The noun phrase (NP) is at the heart of several studies investigating regional variation in varieties of English. While so far the bulk of research has focused on isolated structural features, the present study is a comparative analysis of NP complexity across varieties of English. NP complexity is compared across five regional varieties and four text categories, based on data from the International Corpus of English. The study adopts a multinomial regression approach, which takes into consideration the interaction of three potential predictors: syntactic function, text type, and variety. The results underline the need for text-type-sensitive studies and add to an understanding of syntactic contact phenomena in varieties of English. More specifically, we find marked differences in the predictive power of the variables and illustrate how focusing on the interaction of syntactic functions, text type and regional variety contributes to a systematic description of variation in the NP in world Englishes.
- Book Chapter
27
- 10.1075/scl.103.06neu
- Nov 9, 2021
This chapter reports an exploration of dimensions of register variation across varieties of English. We analyse 2,844 texts from the Hong Kong, Jamaica and New Zealand components of the International Corpus of English, using its text categorization scheme as a frame of reference. We apply Geometric Multivariate Analysis, an interactive procedure for exploring latent structure in language variation, based on the frequencies of 41 lexico-grammatical features informed by systemic functional register theory. Visual inspection of the distribution of texts across the multidimensional space reveals continuities between groups of texts as well as dimensions of variation that can be related to theoretical register constructs. We also observe differences between the three ICE components (and their text categories) in register space.
- Book Chapter
5
- 10.1093/oso/9780198235828.003.0002
- Aug 1, 1996
Though there is no general agreement on the exact figures, everybody now recognizes that there are now more non-native speakers of English in the world than native speakers. McArthur (1992: 355) speaks of ‘a 2-to-1 ratio of non-natives to natives’. In this context, a project such as The International Corpus of English (ICE) is particularly welcome, as in addition to featuring different native varieties of English, it gives non-native varieties of English the place they deserve. However, ICE only covers institutionalized varieties of non-native English such as Indian English or Nigerian English. It leaves out a sizeable (arguably the largest) group of non-native users of English in the world, i.e. foreign learners of English. It was to do justice to this rapidly expanding group of English speakers that I put forward a proposal to complement ICE with a corpus of learner English, a suggestion which was welcomed by Sidney Greenbaum. This resulted in the launch of The International Corpus of Learner English (ICLE) in late 1990. In this paper I will first situate the corpus within the other non-native varieties of English. Then I will describe the corpus in detail, paying particular attention to issues of methodology. I will briefly illustrate the insights to be derived from a computer-based investigation of learner lexis, grammar, and discourse features. Finally, I will highlight the pedagogical advantages of a corpus-based approach to EFL.
- Book Chapter
- 10.4324/9781003025078-3
- Nov 15, 2021
This chapter examines the use of the so-called were-subjunctive in hypothetical counterfactual conditional clauses in three varieties of English: British, Irish and Indian English. Some recent studies have suggested that the were-subjunctive has declined in frequency in the last half-century or so and is being replaced by indicative was. Another change, possibly connected with the decline of the were-subjunctive, is the increasing use of would in the subordinate clause of conditional sentences. The database of this study consists of the written and spoken components of the International Corpus of English (ICE) corpora from British, Irish and Indian English. Existing studies of American, Australian and some South Asian Englishes are used as further points of comparison. The results show that in British English the distribution between subjunctive were and indicative was is fairly even, although there is a slight majority for the was option. Indian English behaves much like British English, retaining a firm place for the were-subjunctive in its grammar. In Irish English, by contrast, the were-subjunctive is a clear minority choice as compared with British English or Indian English.
- Research Article
- 10.59324/ejtas.2023.1(6).113
- Nov 1, 2023
- European Journal of Theoretical and Applied Sciences
This research is an analysis of the translation equivalents in Nigerian and Ghanaian Englishes. Translation equivalents refer to manifestations of mother-tongue interference in which lexical items are substituted literally from local languages into English. This study discusses the data from ICE Nigeria and ICE Ghana that reflect mother-tongue interference. All the data were purposively drawn from the International Corpus of English (ICE) Nigeria and Ghana components. A total of thirty-nine expressions constitute the data for analysis in this study. An eclectic framework of language interference, transfer, and language variation and change is used for analysis. The analyses are at three levels: sociolinguistic, semantic and corpus-based. This study identifies some distinctive NE and GhE lexical items from ICE Nigeria and Ghana with their meanings. Examples include “raise voice and no light” (NE) and “feel the rain and kill time” (GhE). The translation equivalents in NE are largely the result of the influence of Nigerian indigenous languages: Igbo, Yoruba and Hausa, among others. Those in GhE are greatly influenced by the Akan, Ewe and Ga languages. The study reveals that translation equivalents in both varieties of English are quite related.
- Book Chapter
12
- 10.1163/9789042025981_022
- Jan 1, 2009
Discourse markers are a feature of everyday conversation – they signal attitudes and beliefs to their interlocutors beyond the base utterance. One particular type of discourse marker is the invariant tag (InT), for example New Zealand and Canadian eh. Previous studies of InTs have clearly described InT uses in individual language varieties. Such studies have focused on sociolinguistic features and on sociolinguistic functions of single markers. However, InTs as a class have not yet been fully described, and the variety of approaches taken (corpus- as well as survey-based) means that cross-varietal or cross-linguistic comparison cannot be conducted with the results thus far. This study investigates InTs in five varieties of English from a corpus-based approach. It lists the utterance-final InTs available in NZ, British, Indian, Singapore and Hong Kong English through their occurrences in their respective International Corpus of English (ICE) corpora, and compares frequency of usage across the varieties. The quantitative analysis offers a clearer overview of the InT class for descriptive grammars, and clarifies some usage aspects for ESL/EFL pedagogy. Finally, the results offer an insight into the global status of InTs in English.