Abstract
Corpus linguistics is currently one of the most active branches of linguistics. Most of the world's major languages already have digital corpora of tens or hundreds of millions of word tokens. Recently, special attention has also been paid to creating text corpora for the languages of the peoples of Russia: on the one hand, corpus research makes it possible to view the structure of a language from a completely different perspective; on the other hand, a corpus is itself a way of storing language data. This article describes the Udmurt National Corpus, which has been under development since the end of 2019 by the staff of the Philological Research Department of the Udmurt Institute of History, Language and Literature of the Udmurt Federal Research Center of the Ural Branch of the Russian Academy of Sciences. It describes in detail the capabilities of the information and reference system currently being built, as well as the prospects for using the corpus in research, in preparing dictionaries, and in creating various programs in the Udmurt language. The article also discusses the Hunspell-based Udmurt spell checker developed by Grigory Grigoriev, which plays an important role in expanding the Udmurt National Corpus. Before new texts are uploaded to the site, all of them undergo a mandatory check for spelling errors that may have survived proofreading. This extension for text editors relies on a vocabulary database linked to an affix file that covers all possible morphological variants of the lexemes in the main dictionary; it identifies spelling errors in the text, so that only thoroughly checked texts are uploaded to the website of the Udmurt National Corpus.