Abstract
In this article, the authors propose an approach to abstracting (automatic summarization of) text resources and documents in the Kazakh language. A text corpus compiled by the authors' research team was first normalized with software tools for Kazakh text and prepared for further processing. Abstracting is based on keywords and phrases, which are extracted from Kazakh-language texts with the TF-IDF algorithm. The task is solved with a machine-learning-based approach. To measure sentence similarity, the cosine similarity between sentence representations is computed, and the semantic content of the text is determined from these values. When annotations are generated, the size of the text is taken into account: the length of the annotation depends on the length of the document. Abstracting Kazakh-language texts is a pressing task relevant to text classification, text clustering, and information retrieval. The paper presents experimental results for several approaches. The results indicate that the presented approach performed best among the approaches compared for extracting annotations from texts in the Kazakh language.
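The pipeline described in the abstract (TF-IDF weighting, cosine similarity between sentences, and an annotation whose length scales with the document) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each sentence is treated as a separate document for IDF, that sentences are ranked by cosine similarity to the sum of all sentence vectors, and that a hypothetical `ratio` parameter sets the annotation length. The tokenizer is a naive stand-in for the Kazakh normalization step, and no machine-learning model is involved. All function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Naive lowercase word tokenizer (assumption). For str patterns, \w is
    # Unicode-aware, so Cyrillic letters used in Kazakh are kept.
    return re.findall(r"\w+", sentence.lower())


def tfidf_vectors(sentences):
    # Assumption: each sentence counts as one "document" when computing IDF.
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Term frequency times a smoothed IDF (one common variant).
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, ratio=0.3):
    # `ratio` is a hypothetical parameter: the annotation grows with the
    # number of sentences in the document, with a minimum of one sentence.
    if not sentences:
        return []
    vectors = tfidf_vectors(sentences)
    centroid = {}
    for vec in vectors:
        for term, weight in vec.items():
            centroid[term] = centroid.get(term, 0.0) + weight
    scores = [cosine(vec, centroid) for vec in vectors]
    k = max(1, math.ceil(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    # Placeholder sentences; real input would be normalized Kazakh text.
    document = [
        "The library stores documents in the Kazakh language.",
        "Each document is normalized before processing.",
        "Keywords are extracted from each document.",
        "The weather was pleasant yesterday.",
    ]
    for sentence in summarize(document, ratio=0.5):
        print(sentence)
```

Ranking against the summed vector favors sentences that share vocabulary with the rest of the document. Other scoring schemes, such as pairwise sentence similarity graphs, fit the same description in the abstract equally well.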