Abstract

This article describes the creation of a corpus-based list of the most relevant multi-word expressions for learners of Russian as a second language (L2), distributed across the levels of the Common European Framework of Reference for Languages (CEFR) from A1 to C1. Modern linguistic and cognitive research shows that speech is patterned and consists largely of stable segments. This is consistent with the linguodidactic idea of teaching not isolated language units but combinations of them of various kinds. However, selecting and ranking multi-word expressions by proficiency level is hampered by the difficulty of extracting them automatically from a text corpus and estimating their frequency, as well as by disagreements over the boundaries, linguistic nature, and terminology of multi-word expressions. The list of the most valuable fixed-type multi-word expressions was compiled from several sources: two types of existing CEFR-graded vocabulary lists for Russian L2 learners, namely the lexical minimums for the TORFL (Test of Russian as a Foreign Language) system and Russian KELLY (KEywords for Language Learning for Young and adults alike); the most frequent n-grams from RuFoLa, a corpus of Russian L2 textbooks, and from the Russian Web corpus of internet texts; and the list of discourse formulas from the «Pragmaticon» project. The CEFR level of each multi-word expression is predicted using the frequency-based Max Delta measure, and the effectiveness of this measure is then validated against annotation by multiple experts. The resulting list contains 1645 multi-word expressions from levels A1 to C1. It has been integrated into an automated text analysis system for learners of Russian as a Foreign Language and can be useful to a wide range of professionals preparing educational content for foreign language learners.
The proposed Max Delta measure showed a high degree of agreement with expert evaluations for proficiency levels A1-B1. This points to the value of further exploring its potential for related practical tasks and for selecting language learning content for other languages.
