Abstract

The article presents a solution to one of the problems of special linguistic markup in RuTuBiC, the Russian Speech Corpus of Russian-Turkic Bilinguals: error annotation at the lexical level. The corpus includes three subcorpora representing the Russian speech of Shor-Russian, Tatar-Russian and Khakass-Russian bilinguals. The solutions were developed on the basis of all three subcorpora; the illustrative contexts are drawn from the Shor-Russian subcorpus, which comprises recordings of interviews with 14 respondents (about 20 hours of audio). The recordings were made during expeditions to Shoria in 2017–2019. The respondents' bilingualism is defined as early natural bilingualism with dominance of the second language, Russian; their mother tongues are family heritage languages. The theoretical basis of the research is work on language contact at the lexical level. The proposed solutions rest on differentiating lexemes fully assimilated into the system of standard Russian from units that have the status of borrowings from other subsystems of the national language and from other languages. In the latter case, linguistic and contextual features are identified that distinguish lexical borrowing from code-switching. The typical errors singled out at the lexical level are tagged as follows: [LexId] for idiomatic expressions that are not fixed in the standard language (dialectal, vernacular, slang, etc.) and that may also be Turkic calques; [LexSem] for general Russian words used in meanings different from those recorded in normative sources; [LexSemAgr] for violations of the norms of lexical and semantic agreement. The units borrowed from the respondents' mother tongue are placed on a scale of transitions from nuclear to borderline.
The nuclear units, marked with the [Lex] tag, are dialectal units, common words and other usages that fall outside the standard, as well as borrowings from the Turkic languages that are not included in dictionaries of standard Russian. On the border “to the left” are borrowings assimilated to different degrees. On the border “to the right” are non-assimilated borrowings and code-switches. The [CodeSw] tag marks code-switching, that is, the insertion of mother-tongue elements into Russian speech. The author treats the insertion of whole utterances as nuclear cases of code-switching and single lexical insertions as transitional cases. Code-switching is evidenced by metatextual indicators and by properly linguistic, primarily phonetic, ones. Both lexical borrowings and cases of code-switching are few in the speech of the RuTuBiC respondents, a scarcity that depends on the type of bilingualism. How typical the metatextual marking of borrowings and code-switches is depends on the discursive, genre and thematic limitations of the corpus.

