Purpose of research. The goal of the research is to develop an intelligent processing algorithm for classifying text information. Because the volume of information grows every day, significant content must be separated from unimportant content quickly and efficiently, which makes the development of such an algorithm an urgent task.

Methods. A method is proposed for classifying text information presented in one or more natural languages. It consists of five key stages: task input, accumulation of a task queue, task processing, generation of the processing result, and output of the result. An input task arrives as an HTTP request whose body contains a file object. If the intensity of the input stream exceeds the processing speed, tasks accumulate in a queue. The active task is selected on a FIFO basis and then processed. The received data is first decoded into a string using UTF-8 encoding. Processing here means categorization, in which the string is searched for patterns. When categorization is complete, the result for the selected task is generated. From the accumulated result, a response to the original HTTP request is formed, and its body contains the list of categories found.

Results. A method and an algorithm for processing text data have been developed to determine the topics present in an input data set. The algorithm, implemented in software, works with text data in various languages.

Conclusion. The text data classification algorithm was implemented in the C++ programming language using the Qt libraries, version 5.11. This implementation showed a throughput of 1-5 MB per second on a homogeneous input text data set. The algorithm also correctly processes files with a damaged format.
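The five stages described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the C++17 standard library rather than Qt, omits the HTTP layer, and stands in for the pattern search with plain substring matching. The names `Task`, `Rules`, `classify`, and `Classifier` are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical task: the bytes of an uploaded file, as they would be taken
// from the body of a request. The bytes are assumed to be UTF-8 text.
struct Task {
    int id;
    std::string bytes;
};

// Hypothetical rule set: category name -> patterns that signal that category.
using Rules = std::map<std::string, std::vector<std::string>>;

// Return every category with at least one pattern occurring in the text.
// Byte-wise substring search is an assumption made for this sketch; for valid
// UTF-8 text and valid UTF-8 patterns it only matches at character boundaries.
std::vector<std::string> classify(const std::string& text, const Rules& rules) {
    std::vector<std::string> found;
    for (const auto& [category, patterns] : rules) {
        for (const auto& pattern : patterns) {
            if (!pattern.empty() && text.find(pattern) != std::string::npos) {
                found.push_back(category);
                break;  // one matching pattern is enough for this category
            }
        }
    }
    return found;  // ordered by category name, because std::map is sorted
}

class Classifier {
public:
    explicit Classifier(Rules rules) : rules_(std::move(rules)) {}

    // Stages 1 and 2: accept a task and append it to the queue.
    void submit(Task task) { pending_.push(std::move(task)); }

    bool has_pending() const { return !pending_.empty(); }

    // Stages 3 to 5: take the oldest task (FIFO), categorize its text, and
    // return the task id together with the list of categories found.
    std::pair<int, std::vector<std::string>> process_next() {
        assert(has_pending());
        Task task = std::move(pending_.front());
        pending_.pop();
        return {task.id, classify(task.bytes, rules_)};
    }

private:
    Rules rules_;
    std::queue<Task> pending_;
};
```

In this sketch a caller would `submit` tasks as they arrive and call `process_next` while `has_pending()` is true. A real service would run the two sides concurrently and protect the queue accordingly, which the sketch leaves out.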