Abstract
The article describes an approach to formalizing the basic processes of a system for collecting and analyzing data from electronic media and to building a mathematical model of that system. As part of a research project, the authors are developing such a system, including new algorithms, methods, and approaches for collecting and analyzing textual information from Internet news sources. The study focuses on applying text mining methods based on artificial neural networks, together with natural language processing, machine learning, and big data processing. Purpose of the study. To develop a formalized description of a model of a system for monitoring and analyzing the text published by electronic news media, using the methods of mathematical modeling. Research methods and tools. The study combines the mathematical modeling toolkit with system analysis methods, namely abstraction, formalization, composition and decomposition, structuring and restructuring, modeling, recognition, and identification. The system is treated as a formalized model of an automatic classifier and clusterer for a set of natural-language text documents, expressed as an algebraic system. Machine learning methods based on neural networks are proposed for text classification and clustering. The structure of the system, its constituent processes, and the external processes that interact with it are given a formalized mathematical description. Results. The resulting formal description of the system model makes explicit how the system components are interconnected and how its internal processes relate to one another.
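The abstract names neural-network classification of news texts without giving details. As a minimal sketch of the idea, assuming a bag-of-words text representation and a single-layer perceptron (the simplest neural network, swapped in here for illustration and not the authors' model), a binary news classifier can be trained as below. The toy headlines, labels, and all function names are hypothetical.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace; a real system would use full NLP preprocessing."""
    return text.lower().split()


def build_vocab(texts: list[str]) -> dict[str, int]:
    """Map every distinct training token to a feature index."""
    vocab: dict[str, int] = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text: str, vocab: dict[str, int]) -> list[float]:
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec


def train_perceptron(xs, ys, epochs: int = 20, lr: float = 1.0):
    """Classic perceptron rule: labels are +1/-1, weights change only on mistakes."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x) -> int:
    """Return +1 or -1 depending on the sign of the neuron's activation."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


# Hypothetical toy corpus: +1 = defense-industry news, -1 = other news.
train_texts = [
    "plant delivers new aircraft to the defense ministry",
    "defense enterprise signs contract for radar systems",
    "football club wins the national cup",
    "new cafe opens in the city centre",
]
train_labels = [1, 1, -1, -1]

vocab = build_vocab(train_texts)
X = [vectorize(t, vocab) for t in train_texts]
w, b = train_perceptron(X, train_labels)

print(predict(w, b, vectorize("defense plant builds radar", vocab)))  # prints 1
print(predict(w, b, vectorize("football match in the city", vocab)))  # prints -1
```

A perceptron only separates classes linearly; multi-class news rubrics and the clustering task mentioned above would need richer models, so this block illustrates the principle rather than the system described in the article.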
The approach allows the system to be described in detail by decomposing it into subsystems and modules. This, in turn, makes it possible to order the stages of system development and split them into separate work stages. Conclusion. The results allow the authors to proceed to the next stage of the life cycle of the information system under development: its software development.
Introduction
Earlier, as part of dissertation research, the authors examined in [1, 2] the impact of modern electronic online news sources on society, and in particular on the country's defense-industry enterprises.
Keywords: media information monitoring, data analysis, data monitoring and analysis system, text analysis, mathematical model of a system, data mining, neural network methods, system analysis, text classification, text clustering.
Formalization of Basic Processes and a Mathematical Model of a System for Monitoring and Analyzing Electronic Media Publications
The formalized model of an automatic text-data classifier that supports the classification methods used in this work can be described as an algebraic system, namely the tuple [7]:

$$R = \langle C, T, F, R_{CF}, f \rangle \qquad (1.1)$$
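The excerpt does not define the components of tuple (1.1). Assuming, purely for illustration, that C is a set of class labels, T a set of texts, F a set of features, R_CF a relation linking classes to their characteristic features (a subset of C × F), and f a classification map from T to C, the algebraic system can be sketched as an immutable data structure that enforces these constraints. Every interpretation, class name, and helper below (`ClassifierModel`, `make_keyword_classifier`) is hypothetical and not the authors' definition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class ClassifierModel:
    """Hypothetical reading of the tuple R = <C, T, F, R_CF, f> from (1.1)."""
    C: FrozenSet[str]                  # assumed: set of class labels
    T: FrozenSet[str]                  # assumed: set of text documents
    F: FrozenSet[str]                  # assumed: set of features (keywords here)
    R_CF: FrozenSet[Tuple[str, str]]   # assumed: relation R_CF, a subset of C x F
    f: Callable[[str], str]            # assumed: classification map f: T -> C

    def __post_init__(self) -> None:
        # Enforce the algebraic constraint that R_CF is a subset of C x F.
        for c, feat in self.R_CF:
            if c not in self.C or feat not in self.F:
                raise ValueError(f"pair ({c!r}, {feat!r}) is not in C x F")

    def classify_all(self) -> Dict[str, str]:
        """Apply f to every text in T, checking that each result lies in C."""
        labels: Dict[str, str] = {}
        for text in self.T:
            label = self.f(text)
            if label not in self.C:
                raise ValueError(f"f returned {label!r}, which is not in C")
            labels[text] = label
        return labels


def make_keyword_classifier(R_CF, default: str) -> Callable[[str], str]:
    """Build f from R_CF: choose the class with the most related features in the text."""
    def f(text: str) -> str:
        tokens = set(text.lower().split())
        scores: Dict[str, int] = {}
        for c, feat in R_CF:
            if feat in tokens:
                scores[c] = scores.get(c, 0) + 1
        if not scores:
            return default  # no known feature found: fall back to a catch-all class
        # Highest score wins; ties go to the alphabetically first class.
        return max(sorted(scores), key=lambda c: scores[c])
    return f


# Hypothetical toy instance of the model.
C = frozenset({"defense", "sport", "other"})
F = frozenset({"defense", "radar", "football", "cup"})
R_CF = frozenset({("defense", "defense"), ("defense", "radar"),
                  ("sport", "football"), ("sport", "cup")})
T = frozenset({"radar contract for defense plant", "football cup final",
               "city weather today"})

model = ClassifierModel(C, T, F, R_CF, make_keyword_classifier(R_CF, "other"))
labels = model.classify_all()
```

Here `labels` maps "radar contract for defense plant" to "defense", "football cup final" to "sport", and "city weather today" to the fallback class "other". Freezing the dataclass mirrors the idea that the tuple is a fixed mathematical object; in the actual system, f would presumably be the trained neural classifier rather than a keyword rule.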
Source: Bulletin of the South Ural State University. Ser. Computer Technologies, Automatic Control & Radioelectronics