Abstract
The article describes corpus resources for the languages of Russia and their use in linguistic research. The country's linguistic diversity is substantial: 155 languages are currently identified as languages of Russia. Many of them are threatened with extinction, which makes corpus creation particularly relevant as a tool for language preservation. For this study we surveyed the staff of the Institute of Linguistics of the Russian Academy of Sciences and other colleagues, which allowed us to collect data on 73 corpus resources representing various languages and dialects of Russia. The sample covers both major languages and languages with relatively few speakers, including unwritten languages. The article examines the parameters along which corpora may differ and offers examples of research based on corpus materials. The final part of the article discusses the organizational aspects of creating and maintaining corpus resources. The results suggest that corpus resources not only play an important role in preserving the linguistic diversity of Russia but also serve as a valuable tool for a range of research tasks and for creating other language resources.