Exploring Reusability and Reproducibility for a Research Infrastructure for L1 and L2 Learner Corpora

Alexander König, Jennifer-Carmen Frey, Egon W Stemle

doi:10.3390/info12050199

Abstract

To date, research in various educational and linguistic domains, such as learner corpus research, writing research, or second language acquisition, has produced a substantial amount of research data in the form of L1 and L2 learner corpora. However, the multitude of individual solutions, combined with domain-inherent obstacles in data sharing, has so far hampered the comparability, reusability and reproducibility of data and research results. In this article, we present our work on creating a digital infrastructure for L1 and L2 learner corpora and populating it with data collected in the past. We embed our infrastructure efforts in the broader field of infrastructures for scientific research, drawing on technical solutions and frameworks from research data management, among them the FAIR guiding principles for data stewardship. We share our experiences from integrating some L1 and L2 learner corpora from concluded projects into the infrastructure while trying to ensure compliance with the FAIR principles and the standards we established for reproducibility, and we discuss how far research data collected in the past can be made comparable, reusable and reproducible. Our results show that some basic needs for providing comparable and reusable data are covered by existing general infrastructure solutions and can be exploited for domain-specific infrastructures such as the one presented in this article. Other aspects need genuinely domain-driven approaches. The solutions found for the corpora in the presented infrastructure can only be a preliminary attempt, and further community involvement would be needed to provide templates and models acknowledged and promoted by the community. Furthermore, forward-looking data management would be needed from the very beginning of new corpus creation projects to ensure that all requirements for FAIR data can be met.

Highlights

Various fields in educational research and applied linguistics work with language data produced by writers or speakers who are still acquiring language competence in the language or language variety they use to express themselves
This regards language produced by non-native speakers, i.e., language learners, where fields such as second language acquisition, learner corpus research, and educational research in language learning and teaching have a long-standing tradition
The research community as a whole widely agrees that data produced during scientific research is a very valuable resource, and making it available following the FAIR principles should be seen as the ideal towards which all researchers should strive within their projects

Summary

Introduction

Various fields in educational research and applied linguistics work with language data produced by writers or speakers who are still acquiring language competence in the language or language variety they use to express themselves. Mostly, this regards language produced by non-native speakers, i.e., language learners, where fields such as second language acquisition, learner corpus research, and educational research in language learning and teaching have a long-standing tradition. Corpora depicting language production by non-native speakers in a second (or third) language (L2) are typically called learner corpora, with an entire research field, learner corpus research [1], specializing in the use of these resources to investigate the dynamics and outcomes of language learning processes on empirical data (cf. [2]). Available tools tailored explicitly for the use in learner corpus research, such as EXMARaLDA [3], the Sketch Engine for Language

Journal: Information
Publication Date: Apr 30, 2021
Citations: 2
License type: CC BY 4.0
