Abstract
Purpose of research. The goal of this research is to develop an intelligent processing algorithm for classifying text information. Because the amount of information grows every day, significant content must be separated from unimportant content quickly and efficiently. Developing such an algorithm is therefore an urgent task.

Methods. A method is proposed for classifying text information presented in one or more natural languages. It consists of five key stages: task input, accumulation of a task queue, task processing, generation of the processing result, and output of the result. The input task is an HTTP request whose body contains a file object. If the intensity of the input stream exceeds the processing speed, tasks accumulate in a queue. The active task is selected on the FIFO principle and then processed. After the necessary transformations, the received data is decoded into a string using UTF-8 encoding. Processing here means categorization (rubrication), that is, searching the string for patterns. When rubrication is complete, the result for the selected task is generated. A response to the original HTTP request is then formed from the accumulated result, and its body contains the list of categories found.

Results. A method and an algorithm for processing text data have been developed to determine the topics present in an input data set. The software implementation of the algorithm can work with text data in various languages.

Conclusion. The text data classification algorithm was implemented in C++ using the Qt 5.11 libraries. This implementation achieved a throughput of 1 to 5 MB per second on a homogeneous input text data set. The algorithm also correctly processes damaged files.
Proceedings of the Southwest State University. Series: IT Management, Computer Science, Computer Engineering. Medical Equipment Engineering