Abstract

Native Language Identification (NLI) is the task of automatically determining an author's native language (L1) from texts written in a language that is not native to the author. NLI has been studied in detail for English, including two shared tasks held in 2013 [1] and 2017 [2], which used TOEFL essays and, in 2017, also spoken-response samples as data. A small number of studies address NLI for other languages, but Russian has not yet been investigated. This paper applies approaches that proved effective in the NLI Shared Tasks of 2013 and 2017 to two problems: identifying the author's native language, and identifying the type of speaker, i.e. distinguishing learners of Russian from heritage speakers of Russian. The classifier presented in this paper is a support vector machine (SVM) over TF-IDF-weighted features. The study is data-driven and is made possible by the Russian Learner Corpus developed by the HSE Learner Russian Research Group [3], on which all experiments are conducted.
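The abstract names only "SVM with TF-IDF", so the following is a minimal, dependency-free sketch of that general technique under stated assumptions, not the authors' implementation. Assumed here (not from the paper): character trigrams as features, scikit-learn-style smoothed IDF, a bias-free linear SVM trained with Pegasos-style stochastic subgradient descent, one-vs-rest over L1 labels, and the hypothetical names `Tfidf`, `train_linear_svm`, `OneVsRestSVM`, `lam` and `epochs`. In practice a library such as scikit-learn (`TfidfVectorizer`, `LinearSVC`) would normally be used instead.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-grams of the lowercased, space-padded text (assumed feature type)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class Tfidf:
    """Minimal TF-IDF: raw n-gram counts times smoothed IDF, then L2-normalised."""

    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(char_ngrams(doc)))  # document frequency of each n-gram
        n_docs = len(docs)
        # Smoothed IDF in the style of scikit-learn's default: ln((1 + N) / (1 + df)) + 1
        self.idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
        return self

    def transform(self, doc):
        # n-grams never seen during fit are ignored
        counts = Counter(g for g in char_ngrams(doc) if g in self.idf)
        vec = {g: c * self.idf[g] for g, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {g: v / norm for g, v in vec.items()}


def dot(w, x):
    """Dot product of a weight dict and a sparse feature dict."""
    return sum(w.get(g, 0.0) * v for g, v in x.items())


def train_linear_svm(xs, ys, lam=0.1, epochs=20, seed=0):
    """Linear SVM (L2-regularised hinge loss, no bias term) via Pegasos-style SGD."""
    rng = random.Random(seed)
    w, step = {}, 0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            margin = ys[i] * dot(w, xs[i])  # uses the weights before this step's update
            for g in w:  # gradient of the L2 term: shrink every weight
                w[g] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: step towards the correct side
                for g, v in xs[i].items():
                    w[g] = w.get(g, 0.0) + eta * ys[i] * v
    return w


class OneVsRestSVM:
    """One binary SVM per label (e.g. per L1); predict the highest-scoring label."""

    def fit(self, docs, labels):
        self.tfidf = Tfidf().fit(docs)
        xs = [self.tfidf.transform(d) for d in docs]
        self.weights = {}
        for label in sorted(set(labels)):
            ys = [1 if y == label else -1 for y in labels]
            self.weights[label] = train_linear_svm(xs, ys)
        return self

    def predict(self, doc):
        x = self.tfidf.transform(doc)
        return max(self.weights, key=lambda label: dot(self.weights[label], x))
```

The same class covers both tasks mentioned above: with L1 labels it performs native-language identification, and with two labels (learner vs. heritage speaker) it reduces to a single binary decision. One-vs-rest is chosen here only because it is the simplest way to extend a binary SVM to many classes.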
