Abstract

This paper reviews current methods and software approaches for semantic analysis. The review shows that machine learning is the most widely used approach to the semantic analysis of text resources. We present an algorithm for the semantic analysis of text in the Kazakh language, together with a software implementation in the Python programming language. Vector representations of words were obtained by machine learning on a corpus of 1 million sentences in the Kazakh language. The implementation uses well-known libraries, including gensim, matplotlib, sklearn, and numpy. An ontology for a specific document, formed during the operation of a neural network, is built from a set of semantically related word pairs. The experimental results are presented graphically as sets of words. The novelty of the proposed approach lies in identifying semantically close words in Kazakh-language texts. This work contributes to solving problems in machine translation, information retrieval, and text analysis and processing systems for the Kazakh language.
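The step from word vectors to semantically related word pairs can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors are invented toy values standing in for embeddings learned from a corpus, the 0.8 similarity threshold is an assumption, and the names `cosine`, `related_pairs`, and `build_graph` are hypothetical helpers. Only the Python standard library is used so that the example stays self-contained.

```python
import math
from itertools import combinations

# Toy vectors, invented for illustration only. In a real pipeline these
# would be embeddings learned from a corpus, not hand-written values.
vectors = {
    "кітап": [0.9, 0.1, 0.0],   # "book"
    "дәптер": [0.8, 0.2, 0.1],  # "notebook"
    "алма": [0.0, 0.9, 0.3],    # "apple"
    "алмұрт": [0.1, 0.8, 0.4],  # "pear"
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def related_pairs(vecs, threshold=0.8):
    """Return (word1, word2, similarity) for every pair at or above threshold."""
    pairs = []
    for w1, w2 in combinations(vecs, 2):
        sim = cosine(vecs[w1], vecs[w2])
        if sim >= threshold:
            pairs.append((w1, w2, sim))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def build_graph(pairs):
    """Turn related pairs into an undirected adjacency map (word -> neighbours)."""
    graph = {}
    for w1, w2, _ in pairs:
        graph.setdefault(w1, set()).add(w2)
        graph.setdefault(w2, set()).add(w1)
    return graph


pairs = related_pairs(vectors)
graph = build_graph(pairs)
for w1, w2, sim in pairs:
    print(f"{w1} ~ {w2}: {sim:.3f}")
```

The adjacency map is only a stand-in for a document-level ontology: it records which words are linked, not the type of relation between them. The threshold controls how dense the resulting graph is and would need tuning on real embeddings.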
