The article presents a new version of the electronic corpus of the Tatar language, updated on the basis of a linguistic knowledge graph model for Turkic languages. By representing linguistic data as knowledge graphs, the new version describes information at multiple linguistic levels: morphonological, syntactic, and semantic. This extends the functionality of the corpus, enabling searches that incorporate syntactic and semantic information. A distinctive feature of the implementation is that the underlying model aligns closely with the structural and functional characteristics of Turkic languages and serves as a foundation for developing software products for semantic text processing in these languages. These products include the linguistic portal "Turkic Morpheme" and the new version of the Tatar language electronic corpus, "Tugan Tel".