Abstract

This paper presents the first digital corpus of texts in the Koraput Munda languages (Sora, Gutob, Bonda), which became available online in spring 2020. The Koraput Munda languages are spoken in India along the border between the states of Odisha and Andhra Pradesh, and all of them are endangered to varying degrees. Texts in these languages were collected during four expeditions to the state of Odisha in 2016–2018. Koraput Munda speakers live in communities that differ in religion, traditional occupation, and dialect, and that are influenced by different official languages depending on the state. For example, Sora speakers belong to more than six religious communities and use four writing systems. Therefore, one of the main aims of the corpus is to present texts of various genres that reflect different social contexts of language use. At present, the corpus includes oral and written texts, poetry and prose, and religious, folklore, and traditional everyday content. Oral texts are presented both in phonological transcription and as audio and video recordings. The sub-corpus of written texts, presented in various scripts, contains both texts belonging to a particular handwritten genre and samples of printed materials. The texts are provided with morphological markup and translations into Russian and English. Each text is accompanied by detailed sociolinguistic and genre-specific information. One of the most distinctive features of the corpus is its system of tags, which covers text format, speaker's gender, script, genre, topic, religion, and so on. The project is intended not only to make linguistic materials in the Koraput Munda languages accessible for linguistic and anthropological research worldwide, but also to support teaching and the preservation of cultural heritage, in particular within the framework of the Multi-Language Education government program.
