A NEW COMPUTATIONAL MODEL FOR TURKIC LANGUAGES MORPHOLOGY AND PROCESSING

Ualsher Tukeyev

doi:10.26577/jpcsit.2023.v1.i1.07

Abstract

Effective communication between people of different nations has become a pressing problem in the modern global world. Artificial intelligence tools, and natural language processing components in particular, can contribute considerably to its solution. In this direction, this article proposes the development and use of a new computational morphology model for Turkic languages based on a complete set of endings (the CSE-model). Based on the CSE-model of morphology, a methodology has been developed for creating and using universal, data-driven programs for natural language processing tasks, including word stemming, text segmentation and morphological analysis. One advantage of the proposed methodology is that it is oriented towards linguists, who only have to prepare i) a list of the complete set of endings for a new language according to the described method, and ii) a list of stop words that do not have endings. The developed universal programs for stemming, segmentation and morphological analysis are then applied using the prepared lists. Experiments carried out for the Kazakh, Kyrgyz and Uzbek languages show a high efficiency of the proposed morphology model, algorithms and tools.
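To illustrate the data-driven idea, the following is a minimal Python sketch of stemming driven only by two prepared lists: a set of endings and a set of stop words. It is not the article's algorithm. The longest-match strategy, the function name `stem`, the `min_stem_len` parameter and the tiny Kazakh lists are all assumptions made for illustration; a real CSE list for a language would be far larger.

```python
# Hypothetical toy data: a real complete set of endings would be much larger.
ENDINGS = {"лар", "ларға", "ға", "де", "да", "тар"}
STOP_WORDS = {"және"}  # words treated as having no endings

# Sort once so that the longest candidate ending is always tried first.
SORTED_ENDINGS = sorted(ENDINGS, key=len, reverse=True)


def stem(word, endings=SORTED_ENDINGS, stop_words=STOP_WORDS, min_stem_len=2):
    """Split a word into (stem, ending) by longest matching ending.

    Assumption: longest match wins, and the remaining stem must keep at
    least `min_stem_len` characters. Stop words are returned unchanged.
    """
    w = word.lower()
    if w in stop_words:
        return w, ""
    for ending in endings:
        if w.endswith(ending) and len(w) - len(ending) >= min_stem_len:
            return w[: -len(ending)], ending
    return w, ""  # no listed ending matched


if __name__ == "__main__":
    for token in ["балаларға", "үйде", "және", "кітап"]:
        print(token, "->", stem(token))
```

Because the language-specific knowledge lives entirely in the two lists, the same function could in principle be reused for another language by swapping the data, which is the property the methodology emphasises.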
