  • Research Article
  • 10.1075/ijlcr.25009.edm
L2 phraseological use during an attrition period
  • Jan 15, 2026
  • International Journal of Learner Corpus Research
  • Amanda Edmonds + 1 more

Abstract This study explores if and how phraseological use patterns change over a five-year period for 14 learners of second-language (L2) Spanish. This period covers an academic year spent in a target-language environment, followed by a four-year attrition period. In addition to documenting potential change in usage patterns, we examine how peak attainment and continued L2 contact during the attrition period influence phraseological competence. The analysis focuses on one type of word combination, namely noun/adjective pairs, and measures change by looking at the frequency of noun/adjective sequences and the strength of the association between the two words. Results point to stability in phraseological competence, with no significant patterns of attrition being uncovered. These findings are interpreted against the backdrop of the small body of research on L2 lexical and, specifically, phraseological attrition, contributing to what is known about long-term learning trajectories in the lexical domain.

  • Open Access
  • Research Article
  • 10.1075/ijlcr.24028.wed
The influence of L1 Dutch on connective use in L2 German academic writing
  • Nov 4, 2025
  • International Journal of Learner Corpus Research
  • Helena Wedig + 4 more

Abstract The present study provides a comparative corpus-based analysis of summaries written by three groups: first-language (L1) German writers, second-language (L2) German writers with L1 Dutch, and L2 German writers with other L1s. The aim is to determine whether there are differences in connective use between L1 and L2 writers in summary writing and whether there are L1 Dutch-specific differences. The results show that L2 German writers with non-Dutch L1s use fewer connectives than L1 German writers, whereas L2 German writers with L1 Dutch use more connectives, especially expansion and contingency connectives. In addition, L2 German writers prefer certain connectives (e.g., und (and), weil (because)), and L2 German writers with L1 Dutch prefer aber (but). Overall, this study highlights the importance of (contrastively) analysing summary writing as well as considering under-researched language pairs such as German and Dutch.

  • Open Access
  • Research Article
  • 10.1075/ijlcr.24023.yan
Automatic discourse segmentation of L1 and L2 spoken English transcripts
  • Oct 7, 2025
  • International Journal of Learner Corpus Research
  • Linsey C Yang + 3 more

Abstract Natural language processing (NLP) tools, primarily trained on L1 written English, have achieved remarkable performance, but are rarely used on L2 learner data. This study leverages a rule-based segmenter to automatically segment spoken English discourse by both L1 speakers and learners, presenting novel preparatory data-cleaning steps that combine a state-of-the-art disfluency detector and additional rules to improve segmentation performance. In three successive segmentation tests on data from the Louvain Corpus of Native English Conversation (LOCNEC; De Cock, 2004) and the Louvain International Database of Spoken English Interlanguage (LINDSEI; Gilquin et al., 2010), we achieve an enhanced segmentation performance that is similar for both the L1 and L2 data (.84). Our approach highlights the effectiveness of leveraging existing NLP tools to process disfluent L2 spoken transcripts, facilitating automatic discourse analysis in Learner Corpus Research (LCR). The code for executing our pipeline is publicly available for future research.

  • Research Article
  • 10.1075/ijlcr.24027.pau
SEEFLEX
  • Aug 21, 2025
  • International Journal of Learner Corpus Research
  • Tobias Pauls

Abstract This report presents the Corpus of Secondary School English as a Foreign Language (EFL) Exams (SEEFLEX). In Germany, upper secondary school EFL exams feature recurring tasks targeting diverse text types. The SEEFLEX was developed to investigate how students complete these tasks linguistically and whether they meet the curricular requirements. The corpus contains data from 575 transcribed authentic curriculum-based examinations (1,979 texts, ~625,000 words). The metadata include standardized receptive vocabulary assessments, a cognition scale, the participants’ reading habits, social background, and their language experience and proficiency. Extensive XML mark-up was added to investigate the influence of, inter alia, source material, structural text features, and selected language mistakes. An online repository provides full-text access as well as ample additional resources, including an interactive Shiny application to investigate register variation in the corpus.

  • Journal Issue
  • 10.1075/ijlcr.11.2
  • May 15, 2025
  • International Journal of Learner Corpus Research

  • Research Article
  • 10.1075/ijlcr.00054.rev
Referees in 2024
  • May 15, 2025
  • International Journal of Learner Corpus Research

  • Open Access
  • Research Article
  • Citations: 1
  • 10.1075/ijlcr.24033.mas
Towards better language representation in Natural Language Processing
  • Apr 1, 2025
  • International Journal of Learner Corpus Research
  • Arianna Masciolini + 29 more

Abstract This paper introduces MultiGEC, a dataset for multilingual Grammatical Error Correction (GEC) in twelve European languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian. MultiGEC distinguishes itself from previous GEC datasets in that it covers several underrepresented languages, which we argue should be included in resources used to train models for Natural Language Processing tasks which, as GEC itself, have implications for Learner Corpus Research and Second Language Acquisition. Aside from multilingualism, the novelty of the MultiGEC dataset is that it consists of full texts — typically learner essays — rather than individual sentences, making it possible to train systems that take a broader context into account. The dataset was built for MultiGEC-2025, the first shared task in multilingual text-level GEC, but it remains accessible after its competitive phase, serving as a resource to train new error correction systems and perform cross-lingual GEC studies.

  • Research Article
  • 10.1075/ijlcr.00053.har
Review of Goulart (2024): Variation in University Student Writing: A Communicative Text Type Approach
  • Mar 10, 2025
  • International Journal of Learner Corpus Research
  • Jack A Hardy

  • Journal Issue
  • 10.1075/ijlcr.11.1
Cumulative Knowledge Building in Learner Corpus Research
  • Feb 3, 2025
  • International Journal of Learner Corpus Research

  • Research Article
  • Citations: 1
  • 10.1075/ijlcr.23038.hol
The effect of lexical complexity on grading of Swedish EFL learners’ texts during high-stakes exams
  • Jan 13, 2025
  • International Journal of Learner Corpus Research
  • Christian Holmberg Sjöling

Abstract The present study concerns the effect of lexical complexity on grading of Swedish EFL learners’ texts during high-stakes exams. A learner corpus consisting of 142 texts graded by expert raters and 175 texts graded by teachers was analysed to establish if the latter graded in agreement with the former as intended by the Swedish National Agency for Education (SNAE). Four indices of lexical complexity available in TAALED and TAALES were chosen to explore if this is the case. The method includes conducting ordinal regression with interactions to determine the effect of the independent variables on grade and if these variables have the same effect in texts graded by teachers and expert raters. The findings reveal a discrepancy between expert raters and teachers as they appear to consider lexical complexity to a different extent. It was also found that expert raters and teachers graded more in agreement during source-based writing tasks compared to independent writing tasks.