Abstract

Semantic text analysis occupies a special place in computational linguistics. Researchers in the field are increasingly interested in algorithms that improve the quality of text corpus processing and the probabilistic determination of text content. A study of the methods, approaches and algorithms for semantic text analysis used in international and Kazakhstani research led to the development of a keyword search algorithm for Kazakh text. The first step of the algorithm is to compile a reference dictionary of keywords for the Kazakh language text corpus. This is done by applying the Porter stemming algorithm to the corpus: the stemmer extracts the unique word stems, which form the reference dictionary, and the dictionary is then indexed. The next step is to collect training data from the text corpus. To calculate the degree of semantic proximity between words, each word is assigned a vector of the corresponding word forms from the reference dictionary, which yields pairs of a keyword and a vector. The last step of the algorithm is neural network learning. Learning uses the error backpropagation method, which makes it possible to analyze the text corpus semantically and to obtain a probabilistic number of words close to the expected number of keywords. This process automates the processing of text material by creating digital learning models of keywords. The algorithm is used to develop a neurocomputer system that will automatically check the written work of online learners. The novelty of the keyword search algorithm lies in the use of neural network learning for texts in the Kazakh language.
In Kazakhstan, researchers in computational linguistics have conducted a number of studies based on morphological analysis, lemmatization and other approaches, and have implemented linguistic tools (mainly translation dictionaries). The use of neural network learning for parsing the Kazakh language remains an open issue in Kazakhstani science. The developed algorithm addresses one of the problems of effective semantic analysis of text in the Kazakh language.
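The first two steps described in the abstract (stemming, indexing the reference dictionary, and turning text into vectors) can be illustrated with a minimal sketch. This is not the authors' Porter adaptation: the suffix list `SUFFIXES`, the minimum stem length, and the count-based vector are all assumptions chosen for illustration, and a usable Kazakh stemmer would need a far richer rule set.

```python
# Hypothetical, deliberately tiny list of Kazakh suffixes (plural, genitive,
# accusative, locative). Longest suffixes are tried first.
SUFFIXES = sorted(
    ["лар", "лер", "дар", "дер", "тар", "тер", "ның", "нің",
     "ды", "ді", "да", "де", "та", "те"],
    key=len,
    reverse=True,
)


def stem(word, min_stem_len=3):
    """Strip known suffixes repeatedly, never shortening below min_stem_len."""
    word = word.lower()
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
                word = word[: -len(suffix)]
                changed = True
                break
    return word


def build_dictionary(tokens):
    """Map each unique stem to an integer index (the indexed reference dictionary)."""
    index = {}
    for token in tokens:
        s = stem(token)
        if s not in index:
            index[s] = len(index)
    return index


def to_vector(tokens, index):
    """Count how often each dictionary stem occurs among the tokens."""
    vector = [0] * len(index)
    for token in tokens:
        position = index.get(stem(token))
        if position is not None:
            vector[position] += 1
    return vector


# Toy corpus: two inflected forms of "кітап" (book) and two of "мектеп" (school).
tokens = ["кітаптар", "кітаптарды", "мектепте", "мектеп"]
dictionary = build_dictionary(tokens)
vector = to_vector(tokens, dictionary)
```

On this toy input the four word forms collapse to two stems, so the dictionary has two entries and the vector counts two occurrences of each. Plain suffix stripping over-stems easily, which is why the sketch enforces a minimum stem length.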

Highlights

  • In modern research [1,2,3] on computational linguistics using artificial intelligence, the development of methods and tools for automated text processing occupies a special place.

  • A learning set of text corpora with known keywords is used; during error minimization, the difference between the output values of the neural network and the keywords supplied as input is determined.

  • Using the Porter stemmer, a dictionary for keyword search in the Kazakh language was created; it includes a base of Kazakh word stems and a terminological dictionary for neural network learning.
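The learning step mentioned in the highlights (a learning set with known keywords, with the error minimized by backpropagation) can be sketched as a small feed-forward network in plain Python. Everything concrete here is an assumption made for illustration: the single hidden layer of four sigmoid units, the squared-error loss, the learning rate, and the three toy samples, whose inputs stand in for hypothetical binary vectors over reference dictionary entries and whose targets mark whether the word is a known keyword. The source does not specify the authors' architecture.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_in, n_hidden, seed=0):
    """One hidden layer and one output unit, with small seeded random weights."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return hidden activations and the output (keyword probability)."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(net["w1"], net["b1"])
    ]
    out = sigmoid(sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"])
    return hidden, out


def train_step(net, x, target, lr):
    """One backpropagation update for the squared error 0.5 * (out - target)**2."""
    hidden, out = forward(net, x)
    # Error signal at the output unit (derivative of loss through the sigmoid).
    delta_out = (out - target) * out * (1.0 - out)
    # Propagate the error back to the hidden units before changing w2.
    delta_hidden = [
        delta_out * w * h * (1.0 - h) for w, h in zip(net["w2"], hidden)
    ]
    for j, h in enumerate(hidden):
        net["w2"][j] -= lr * delta_out * h
    net["b2"] -= lr * delta_out
    for j, d in enumerate(delta_hidden):
        for i, xi in enumerate(x):
            net["w1"][j][i] -= lr * d * xi
        net["b1"][j] -= lr * d


def mean_error(net, samples):
    return sum(0.5 * (forward(net, x)[1] - t) ** 2 for x, t in samples) / len(samples)


# Hypothetical learning set: (vector, 1.0 if the word is a known keyword else 0.0).
samples = [([1, 0, 0], 1.0), ([0, 1, 0], 0.0), ([0, 0, 1], 1.0)]
net = init_network(n_in=3, n_hidden=4)
error_before = mean_error(net, samples)
for _ in range(2000):
    for x, target in samples:
        train_step(net, x, target, lr=0.5)
error_after = mean_error(net, samples)
```

Comparing `error_before` with `error_after` shows how training narrows the gap between the network's outputs and the known keyword labels. The output of `forward` can then be read as a keyword probability for a new vector.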

Summary

Introduction

In modern research [1,2,3] on computational linguistics using artificial intelligence, the development of methods and tools for automated text processing occupies a special place. Computational linguistics studies the creation and use of electronic text corpora, the creation of information technology electronic dictionaries, thesauruses and ontologies, machine translation, information extraction from texts, automatic abstracting and the construction of knowledge management systems. The problems of computational linguistics in Kazakh language text processing remain insufficiently studied, which is one of the reasons for starting research on a keyword search algorithm. The use of deep neural network learning for keyword search and full-text search in Kazakh language text corpora underlines the relevance of the topic in computational linguistics.

Literature review and keyword search problem statement
The aim and objectives of the study
Bringing text corpora into machine-readable form
Neural network learning
Result of the algorithm for keyword search in the Kazakh language text corpus
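Result excerpt (Kazakh terms with approximate English glosses): нейрожелі (neural network) 8; сандық шама (numerical quantity) 2; тәжірибе (experiment) 9; жақын (close) 2; заңдылық (regularity); бірігу күштері (joining forces)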
Conclusions
