Abstract

This article proposes an approach to automatic summarization (abstracting) of text resources and documents in the Kazakh language. The text data, compiled by the authors' research team, was normalized with software tools for Kazakh-language text and prepared for further processing. Summarization is based on keywords and key phrases, which are extracted from the Kazakh texts with the TF-IDF algorithm. A machine learning approach is applied to solve the problem. To determine how similar sentences are, the cosine similarity between their representations is calculated, and the semantic content of the text is determined from these values. The length of the generated summary (annotation) depends on the size of the document. Summarization of Kazakh-language texts is a pressing task for text classification, text clustering, and information retrieval. The paper presents the results of experimental calculations for several approaches. In these experiments, the presented approach gave the best results for extracting summaries from texts in the Kazakh language.
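
The pipeline outlined in the abstract (TF-IDF weighting, cosine similarity between sentences, and a summary length that scales with document size) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the normalization step, the machine learning model, or the scoring rule, so the function names (`tokenize`, `tfidf_vectors`, `cosine`, `summarize`), the smoothed IDF formula, the `ratio` parameter, and the choice to rank each sentence by its summed similarity to the others are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Hypothetical stand-in for Kazakh text normalization:
    # lowercase and split on Unicode word characters.
    return re.findall(r"\w+", sentence.lower())


def tfidf_vectors(sentences):
    # Treat each sentence as a "document" and build sparse TF-IDF vectors.
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for d in docs for term in d)
    vectors = []
    for d in docs:
        total = sum(d.values()) or 1
        vectors.append({
            # Smoothed IDF (an assumption; the paper's exact variant is not given).
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in d.items()
        })
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, ratio=0.3):
    # Summary length grows with the number of sentences in the document.
    if not sentences:
        return []
    vectors = tfidf_vectors(sentences)
    # Assumed scoring rule: a sentence is central if it is similar to many others.
    scores = [
        sum(cosine(v, u) for j, u in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]
    k = max(1, math.ceil(len(sentences) * ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, ratio=0.5)` on a list of already split sentences returns roughly half of them, in document order. A real system for Kazakh would likely replace `tokenize` with proper morphological normalization, since Kazakh is agglutinative and surface forms of the same word vary widely.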
