The article discusses current problems of Kazakh writing and the importance of its role and development in the context of mass digitization based on artificial intelligence technologies and methods of computational linguistics. It shows that the current Cyrillic-based Kazakh alphabet is incorrect because it includes Cyrillic letters denoting phonemes that are not part of the Kazakh sound system, and it substantiates the need to reform Kazakh writing by replacing this alphabet. Errors and contradictions are identified both in the approved Latin-based version of the Kazakh alphabet and in the alphabet proposed to replace it, which repeats some of the earlier errors. In neither case was the sound system of the Kazakh language, which is the basis of any alphabet, analyzed and clarified. In this study, to clarify the sound system of the Kazakh language, experiments were carried out to determine the articulatory and acoustic features of Kazakh sounds with the help of computer programs used for many natural languages. In the articulatory analysis, special attention was paid to vowels, which give rise to various contradictions in Kazakh writing. A new classification of vowels according to four binary features is proposed in place of the traditional classification according to three binary features. The acoustic analysis uses formant analysis, which aims to identify particular formants in the spectrogram. Formants are obtained using a spectrograph. Quantitatively, formants correspond to maxima in the speech spectrum and usually appear on spectrograms as horizontal bands.
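The statement that formants correspond to maxima in the speech spectrum can be illustrated with a minimal sketch. It does not reproduce the programs used in the study: the signal is a synthetic sum of two sinusoids, the peak frequencies (700 Hz and 1200 Hz), the sample rate and the function names are all assumptions chosen for illustration, and real formant estimation on recorded speech would normally work on the spectral envelope rather than on raw peaks.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short toy signals)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def spectral_peaks(samples, sample_rate, rel_threshold=0.1):
    """Return frequencies (Hz) of local maxima above a relative threshold."""
    mags = magnitude_spectrum(samples)
    floor = rel_threshold * max(mags)
    n = len(samples)
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            peaks.append(k * sample_rate / n)
    return peaks


# Assumed, illustrative values; not measurements of Kazakh vowels.
SAMPLE_RATE = 8000
N = 400  # gives a bin spacing of 8000 / 400 = 20 Hz
ASSUMED_PEAKS_HZ = (700.0, 1200.0)

signal = [
    sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f in ASSUMED_PEAKS_HZ)
    for t in range(N)
]
peaks = spectral_peaks(signal, SAMPLE_RATE)
print(peaks)
```

On a spectrogram, the same maxima tracked over successive short time windows would appear as the horizontal bands mentioned above.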
After the composition and classification of the Kazakh sound system are determined, two variants of a Latin-based alphabet are proposed: the first is based on the Turkish alphabet and uses diacritical marks; the second is based on the English alphabet and uses digraphs. For the second variant, ways of solving the problems that arise when using digraphs are offered. In conclusion, information is provided on ongoing work in Kazakhstan on creating smart systems for the Kazakh language based on the methods and technologies of artificial intelligence and computational linguistics, the results of which are reflected in the list of sources.
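The abstract does not say which digraph problems are addressed. One difficulty that digraph-based alphabets commonly face is segmentation ambiguity, where a digraph is spelled the same as a sequence of two single letters. The sketch below illustrates only that general issue. The letter mapping, the function name and the apostrophe separator are hypothetical and are not the alphabet or the solution proposed in the article; the input strings are arbitrary letter sequences, not claimed to be Kazakh words.

```python
# Hypothetical mapping for illustration only; NOT the alphabet from the article.
HYPOTHETICAL_MAP = {"а": "a", "с": "s", "һ": "h", "ш": "sh"}


def transliterate(text, mapping=HYPOTHETICAL_MAP, separator=""):
    """Map each source letter to Latin; optionally separate accidental digraphs."""
    digraphs = {latin for latin in mapping.values() if len(latin) > 1}
    out = []
    for ch in text:
        latin = mapping[ch]
        # If the previous output character plus the first character of this
        # letter would spell a digraph, insert the separator (when one is given).
        if separator and out and (out[-1][-1] + latin[0]) in digraphs:
            out.append(separator)
        out.append(latin)
    return "".join(out)


plain_digraph = transliterate("аша")
plain_sequence = transliterate("асһа")
separated_sequence = transliterate("асһа", separator="'")

print(plain_digraph, plain_sequence, separated_sequence)
```

Without a separator, both inputs produce the same Latin string, so the original spelling cannot be recovered. A separator is one possible workaround, shown here only as an assumption.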