Text Corpus Research Articles

This article describes the creation of a corpus-based list of the most relevant multi-word expressions for Russian L2 learners, distributed across the levels of the Common European Framework of Reference for Languages (CEFR) from A1 to C1. Modern linguistic and cognitive research shows that speech is patterned and consists largely of stable segments. This is in line with the linguodidactic idea of teaching not isolated language units but combinations of them of various kinds. However, selecting multi-word expressions and ranking them by proficiency level is hampered by the difficulty of automatically extracting them from a text corpus and estimating their frequency, as well as by disagreements over their boundaries, linguistic nature, and terminology. The list of the most valuable fixed-type multi-word expressions was compiled from several sources: two types of existing CEFR-graded vocabulary lists for Russian L2 learners, namely the lexical minimums for the TORFL (Test of Russian as a Foreign Language) system and Russian KELLY (KEywords for Language Learning for Young and adults alike); the most frequent n-grams from RuFoLa, a corpus of Russian L2 textbooks, and from the Russian Web corpus of internet texts; and a list of discourse formulas from the “Pragmaticon” project. The CEFR level of each multi-word expression is predicted using the frequency-based Max Delta measure, and the effectiveness of the measure is then validated through annotation by multiple experts. The resulting list contains 1645 entries from levels A1 to C1. It has been implemented in an automated text analysis system for learners of Russian as a foreign language and can be useful to a wide range of professionals preparing educational content for foreign language learners.
The proposed Max Delta measure showed a high degree of agreement with expert evaluations for proficiency levels A1-B1. This suggests that its potential is worth exploring further, both for related practical tasks and for selecting language learning content for other languages.

The need to support science, technology, engineering, and mathematics (STEM) learning in secondary education is reflected in the ongoing investigation of innovative pedagogical practices, including game-based learning (GBL). Using a word co-occurrence analysis of scholarly publications, this study aimed to identify the main research themes that the scholarly community has addressed over the past decade concerning game-based teaching and learning solutions for STEM education in secondary schools, how those themes evolved over time, and the key issues addressed in recent years. After a systematic selection, the titles and abstracts of the publications were collected in a text corpus and analyzed using T-LAB software version 7.2.1.4 (2022). A preliminary visual exploration of the keywords provided an overall view of the issues addressed by the research. Specificity analysis was then applied to each subset of the corpus, defined by years of publication, to trace the evolution of themes as reflected in changes in the frequency of lemma use. Finally, to explore the most recent topics, the main thematic clusters of publications from the last three years were identified (thematic analysis of elementary contexts). The results suggest some changes in the issues addressed over the past decade, such as a shift in focus from the specific technologies and competitive elements of games to understanding how GBL can support engagement, motivation, and understanding of complex scientific concepts. The five key thematic clusters identified (“Experience”, “Application”, “Validation”, “Emotion”, and “Programming”) also indicate that the latest publications place a stronger emphasis on the experiential and emotional components of learning, the need for empirical studies, and the integration of computational thinking and coding into GBL.
Overall, this study indicates that GBL has the potential to become an integrated component of STEM education, evolving with pedagogical and technological innovations.

Articles published on Text Corpus

Research on the Training and Application Methods of a Lightweight Agricultural Domain-Specific Large Language Model Supporting Mandarin Chinese and Uyghur

Multi-word expressions for Russian L2 learners: corpora-based selection with expert verification

“... but be sure you let it settle”: Late Modern Authors’ Presence in English Scientific Texts

Where and how machine learning plays a role in climate finance research

The Words for “Blue” in Old Frisian

What do Contemporary Publications Report about the Generation of Urban Solid Waste (MSW) and/or Consumption from the Perspective of Chemistry Teaching?

Fraseología dialectal en la obra literaria de B. Pérez Galdós [Dialectal phraseology in the literary work of B. Pérez Galdós]

Фонетические процессы в сочетаниях согласных (на материале осетинского языка) [Phonetic processes in consonant clusters (based on the Ossetian language)]

PERCEPTION OF SALVADOR DALÍ’S CREATION AND ITS INTERPRETATION IN THE TEXTS OF THE 20TH-21ST CENTURIES

Navigating the Evolution of Game-Based Educational Approaches in Secondary STEM Education: A Decade of Innovations and Challenges

Empirical application of sentiment analysis and emotions in Spanish: A post-cognitivist approach

Defending the “Backward Civilization”: The Resurrection of a Forgotten 17th Century Text in 20th Century Intellectual Discourse on Islam

Social representations of health-disease-care, security, and food sovereignty of indigenous students from Insikiran

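Wybory translatorskie XIX-wiecznego słownikarza rosyjskiego jako impuls do badań chronologizacyjnych polskiej leksyki zapożyczonej [Translation choices of a 19th-century Russian lexicographer as an impulse for chronologization research on Polish borrowed vocabulary]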

DAVOMLI HOZIRGI ZAMON GRAMMATIK SHAKLLARI ANNOTATSIYASI [Annotation of the grammatical forms of the present continuous tense]

A materials terminology knowledge graph automatically constructed from text corpus

Challenges and prospects for teaching ocean literacy in Brazilian schools

What should we call mental ill health? Historical shifts in the popularity of generic terms

An embedded diachronic sense change model with a case study from ancient Greek

Conceptual Analysis through Corpus Text Patterns: Researching W.E.B. Du Bois’ Concept of Democratic Despotism via Regular Expressions
