FROM ABARMA TO YASHCHICHISHKA: DEVELOPING THE LEXICOGRAPHIC COMPONENT OF THE TOMSK DIALECT CORPUS

Svetlana S Zemicheva

doi:10.17223/22274200/18/5

Abstract

One of the most important trends in modern dialectology is the creation of new electronic resources. The article gives an overview of Russian resources of this kind, among which dialect corpora hold a special place. The author focuses on the Tomsk Dialect Corpus, which currently includes more than 1,700,000 tokens. This resource is unparalleled in Russian scientific practice. It is designed as a universal information retrieval system with three modules: 1) textual, 2) grammatical, 3) lexicographic. The aim of the lexicographic component is to provide definitions of dialect lexemes. For this purpose, the author proposes using the Dictionary of Russian Old-Timers’ Dialects of the Middle Part of the River Ob Basin (1964–1967), edited by V.V. Palagina, and its two supplements (1975, 1983–1986). The article describes the phases of integrating the lexicographic module into the Tomsk Dialect Corpus. The first phase was the automatic recognition of the above-mentioned paper dictionary. The second phase is editing the dictionary. The principles for editing the source material follow from the fact that the lexicographic component is treated as part of a universal electronic system. The two basic editing principles are that a word can be processed automatically and that each dictionary entry functions autonomously. The vocabulary and the structure of the dictionary entry were formed in accordance with these principles. When the vocabulary was formed, some dictionary entries (for example, two-word ones) were discarded. The dictionary entry has three main fields: headword, definition, and contexts. One of the main editing tasks is to combine dictionary entries from different volumes of the dictionary into one. The combined items are marked either as homonyms or as meanings of one word. The article presents examples of dictionary entries before and after editing.
To date, about half of the original vocabulary has been processed (letters A to M, 12,450 entries). The final version of the electronic dictionary, as part of the Tomsk Dialect Corpus, is planned to be presented on the website of the Laboratory of General and Siberian Lexicography (http://losl.tsu.ru/) by June 2021. The prospects of the project include, first, expanding the vocabulary and, second, adding search by dictionary labels (diminutive, augmentative, etc.) to the corpus. The presented solutions can be used in the development of other dialect corpora.
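The entry structure and the merging step described in the abstract can be illustrated with a minimal sketch. It is not the project's actual implementation: the class names, the `homonym` number (assumed here to be assigned by a human editor), the `source` labels, and the placeholder headwords are all hypothetical. The sketch only shows entries with a headword, definition, and contexts being combined across volumes, with two-word headwords discarded.

```python
from dataclasses import dataclass, field


@dataclass
class RawEntry:
    """One entry as recognized from a single volume (hypothetical layout)."""
    headword: str
    definition: str
    contexts: list[str]
    source: str        # which volume or supplement the entry came from
    homonym: int = 1   # assumed to be assigned by an editor, not computed


@dataclass
class Sense:
    """One meaning of a word, with its illustrative contexts."""
    definition: str
    contexts: list[str]
    source: str


@dataclass
class MergedEntry:
    """All material for one headword, grouped by homonym number."""
    headword: str
    homonyms: dict[int, list[Sense]] = field(default_factory=dict)


def build_vocabulary(raw_entries: list[RawEntry]) -> dict[str, MergedEntry]:
    """Combine entries from different volumes under a single headword."""
    merged: dict[str, MergedEntry] = {}
    for raw in raw_entries:
        # Entries whose headword has more than one word are discarded.
        if len(raw.headword.split()) > 1:
            continue
        entry = merged.setdefault(raw.headword, MergedEntry(raw.headword))
        sense = Sense(raw.definition, raw.contexts, raw.source)
        # Same homonym number: another meaning of the same word.
        # Different homonym number: a separate homonym.
        entry.homonyms.setdefault(raw.homonym, []).append(sense)
    return merged


# Placeholder data only; these are not real dictionary entries.
raw = [
    RawEntry("alpha", "first meaning", ["context 1"], "main dictionary"),
    RawEntry("alpha", "second meaning", [], "supplement 1"),
    RawEntry("alpha", "unrelated word", [], "supplement 2", homonym=2),
    RawEntry("beta gamma", "two-word entry", [], "main dictionary"),
]
vocabulary = build_vocabulary(raw)
```

Keeping each merged entry self-contained (headword plus all of its homonyms and meanings) matches the stated principle that every dictionary entry should function autonomously.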

Journal: Voprosy leksikografii	Publication Date: Jan 1, 2020
Citations: 1
