With the rapid development of social networks, the volume of short electronic text messages being exchanged is constantly growing. The tone of these messages can serve as a sensitive indicator of public mood and of important social phenomena, which is of interest to sociologists, politicians, economists, and specialists in other fields. The task of automating the processing of such natural language messages is therefore of significant scientific and practical interest.

The object of this study is the sentiment of user publications in the Twitter social network. Because the network is very popular and its many user messages are short, it is convenient to determine the mood of user posts and to combine them into clusters according to the given parameters of the intelligent system.

The subject of the study is methods and algorithms for analysing the sentiment of large arrays of messages that contain the necessary keywords and relate to a specific topic; determining the factors and distributions of message sentiment from the input data array; dividing messages into main groups and assigning estimates within defined limits to each group; clustering according to the obtained search point; and displaying the results in the desired format.

The purpose of the work is to implement an intelligent system of sentiment analysis and clustering of publications based on a long short-term memory (LSTM) recurrent neural network and the k-means clustering algorithm.

The following main tasks are solved in the work:
1. To analyse the most widely used and the newest algorithms, methods, approaches and tools for sentiment analysis and clustering of publications in social networks.
2. To develop a conceptual structure of an intelligent system of sentiment analysis and clustering of publications.
3. To formulate the functional tasks for the key modules of the intelligent system of sentiment analysis and clustering of publications in the Twitter social network.
4. To implement an intelligent system of sentiment analysis and clustering of publications based on a recurrent neural network and the k-means clustering algorithm, and to verify it experimentally.

The methods used are the long short-term memory recurrent neural network and the k-means clustering algorithm.

The following results were obtained: the general structure of the intelligent system of sentiment analysis and clustering of publications was analysed, designed and built. The main task in creating the system was to improve the long short-term memory recurrent neural network, which, thanks to the improved algorithm, significantly facilitates natural language processing of text data of a given size. In parallel, the k-means clustering algorithm was used, which made it possible to adapt the general approach to clustering and to forming the final clusters to the results produced by the recurrent neural network.

Conclusions: applying a combination of an LSTM neural network and the k-means clustering algorithm made it possible to speed up sentiment analysis and clustering of posts in the Twitter social network by 10 to 15% compared with similar approaches based on convolutional neural networks and hierarchical clustering.
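
The clustering stage described above can be illustrated with a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the implementation used in the work. The sentiment scores are hypothetical values standing in for the output of a trained LSTM classifier, and the function names, the choice of three clusters and the fixed initial centroids are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points: Sequence[Point],
           initial_centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(dim) / len(members)
                                     for dim in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:  # no centroid moved, so the result is stable
            break
        centroids = updated
    # Final labels are computed against the final centroids.
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Hypothetical sentiment scores in [0, 1] (0 = negative, 1 = positive),
# one per post; in the described system an LSTM would produce such values.
scores = [(0.05,), (0.10,), (0.12,), (0.48,), (0.50,),
          (0.55,), (0.90,), (0.93,), (0.97,)]

# Assumed starting centroids for "negative", "neutral" and "positive".
labels, centroids = kmeans(scores, initial_centroids=[(0.0,), (0.5,), (1.0,)])
print(labels)
print(centroids)
```

Fixed initial centroids keep the sketch deterministic. With random initialisation, k-means can converge to a poorer local optimum, so practical implementations usually run several restarts or use a seeding scheme such as k-means++.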