- Research Article
- 10.1075/ijlcr.25009.edm
- Jan 15, 2026
- International Journal of Learner Corpus Research
- Amanda Edmonds + 1 more
Abstract This study explores whether and how phraseological use patterns change over a five-year period for 14 learners of second-language (L2) Spanish. This period covers an academic year spent in a target-language environment, followed by a four-year attrition period. In addition to documenting potential change in usage patterns, we examine how peak attainment and continued L2 contact during the attrition period influence phraseological competence. The analysis focuses on one type of word combination, namely noun/adjective pairs, and measures change in terms of the frequency of noun/adjective sequences and the strength of association between the two words. Results point to stability in phraseological competence, with no significant patterns of attrition uncovered. These findings are interpreted against the backdrop of the small body of research on L2 lexical and, specifically, phraseological attrition, contributing to what is known about long-term learning trajectories in the lexical domain.
- Research Article
- 10.1075/ijlcr.24028.wed
- Nov 4, 2025
- International Journal of Learner Corpus Research
- Helena Wedig + 4 more
Abstract The present study provides a comparative corpus-based analysis of summaries written by three groups: first-language (L1) German writers, second-language (L2) German writers with L1 Dutch, and L2 German writers with other L1s. The aim is to determine whether L1 and L2 writers differ in their use of connectives in summary writing and whether there are differences specific to L1 Dutch. The results show that L2 German writers with non-Dutch L1s use fewer connectives than L1 German writers, whereas L2 German writers with L1 Dutch use more connectives, especially expansion and contingency connectives. In addition, L2 German writers prefer certain connectives (e.g., und 'and', weil 'because'), and L2 German writers with L1 Dutch favour aber 'but'. Overall, this study highlights the importance of (contrastively) analysing summary writing and of considering under-researched language pairs such as German and Dutch.
- Research Article
- 10.1075/ijlcr.24023.yan
- Oct 7, 2025
- International Journal of Learner Corpus Research
- Linsey C Yang + 3 more
Abstract Natural language processing (NLP) tools, primarily trained on L1 written English, have achieved remarkable performance but are rarely applied to L2 learner data. This study leverages a rule-based segmenter to automatically segment spoken English discourse produced by both L1 speakers and learners, presenting novel preparatory data-cleaning steps that combine a state-of-the-art disfluency detector with additional rules to improve segmentation performance. In three successive segmentation tests on data from the Louvain Corpus of Native English Conversation (LOCNEC; De Cock, 2004) and the Louvain International Database of Spoken English Interlanguage (LINDSEI; Gilquin et al., 2010), we achieve improved segmentation performance that is similar for the L1 and L2 data (.84). Our approach highlights the effectiveness of leveraging existing NLP tools to process disfluent L2 spoken transcripts, facilitating automatic discourse analysis in Learner Corpus Research (LCR). The code for executing our pipeline is publicly available for future research.
- Research Article
- 10.1075/ijlcr.24027.pau
- Aug 21, 2025
- International Journal of Learner Corpus Research
- Tobias Pauls
Abstract This report presents the Corpus of Secondary School English as a Foreign Language (EFL) Exams (SEEFLEX). In Germany, upper secondary school EFL exams feature recurring tasks targeting diverse text types. The SEEFLEX was developed to investigate how students complete these tasks linguistically and whether they meet the curricular requirements. The corpus contains data from 575 transcribed authentic curriculum-based examinations (1,979 texts, ~625,000 words). The metadata include standardized receptive vocabulary assessments, a cognition scale, and information on the participants' reading habits, social background, language experience, and proficiency. Extensive XML mark-up was added to investigate the influence of, inter alia, source material, structural text features, and selected language mistakes. An online repository provides full-text access as well as ample additional resources, including an interactive Shiny application for investigating register variation in the corpus.
- Journal Issue
- 10.1075/ijlcr.11.2
- May 15, 2025
- International Journal of Learner Corpus Research
- Research Article
- 10.1075/ijlcr.00054.rev
- May 15, 2025
- International Journal of Learner Corpus Research
- Research Article
- 10.1075/ijlcr.24033.mas
- Apr 1, 2025
- International Journal of Learner Corpus Research
- Arianna Masciolini + 29 more
Abstract This paper introduces MultiGEC, a dataset for multilingual Grammatical Error Correction (GEC) in twelve European languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian. MultiGEC distinguishes itself from previous GEC datasets in that it covers several underrepresented languages, which we argue should be included in resources used to train models for Natural Language Processing tasks that, like GEC itself, have implications for Learner Corpus Research and Second Language Acquisition. Aside from multilingualism, the novelty of the MultiGEC dataset is that it consists of full texts (typically learner essays) rather than individual sentences, making it possible to train systems that take a broader context into account. The dataset was built for MultiGEC-2025, the first shared task in multilingual text-level GEC, but it remains accessible after the task's competitive phase, serving as a resource for training new error correction systems and conducting cross-lingual GEC studies.
- Research Article
- 10.1075/ijlcr.00053.har
- Mar 10, 2025
- International Journal of Learner Corpus Research
- Jack A Hardy
- Journal Issue
- 10.1075/ijlcr.11.1
- Feb 3, 2025
- International Journal of Learner Corpus Research
- Research Article
- 10.1075/ijlcr.23038.hol
- Jan 13, 2025
- International Journal of Learner Corpus Research
- Christian Holmberg Sjöling
Abstract The present study concerns the effect of lexical complexity on the grading of Swedish EFL learners' texts in high-stakes exams. A learner corpus consisting of 142 texts graded by expert raters and 175 texts graded by teachers was analysed to establish whether the latter graded in agreement with the former, as intended by the Swedish National Agency for Education (SNAE). Four indices of lexical complexity available in TAALED and TAALES were chosen to explore whether this is the case. The method uses ordinal regression with interaction terms to determine the effect of the independent variables on grade and whether these variables have the same effect in texts graded by teachers and by expert raters. The findings reveal a discrepancy between expert raters and teachers, as the two groups appear to weigh lexical complexity to different extents. It was also found that expert raters and teachers agreed more closely on source-based writing tasks than on independent writing tasks.