Assessing the quality of life of the population is an important and relevant sociological task. Machine learning, used as a tool for classifying the digital traces of social network users, makes it possible to build a basis for calculating a subjective quality-of-life index. The article reviews, step by step, all stages of applying machine learning algorithms to assess the quality of life of the population in the regions of the Russian Federation, and addresses the issues of improving neural network accuracy. To train the neural network, the authors assembled a labelled dataset extracted from regional communities of the social network “VKontakte”. Various approaches to text vectorisation, publicly available neural network models pre-trained on large Russian-language text corpora, and metrics for evaluating the algorithms' results were analysed. Computational experiments with different algorithms were carried out, and on the basis of their results the Rubert-tiny model was selected for its high training and classification speed. After tuning the model parameters, an f1-macro score of 0.545 was achieved. The computational experiments were carried out using Python scripts. Typical errors that the neural network makes during automatic content classification were considered. The results of the study can be used to calculate an online activity index for VKontakte users from various Russian regions, on the basis of which the subjective quality-of-life index will be calculated in the future. Improving the neural network accuracy will make it possible to obtain more reliable data for assessing the quality of life in Russian regions based on users’ digital traces.
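For readers unfamiliar with the f1-macro metric reported above, the following minimal Python sketch shows how a macro-averaged F1 score is computed: the F1 score is calculated separately for each class and the per-class scores are then averaged with equal weight, so rare classes count as much as frequent ones. The function name `f1_macro` and the category labels in the example are hypothetical and chosen for illustration only; they are not the authors' code or their label set.

```python
from collections import Counter


def f1_macro(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, predicted_label in zip(y_true, y_pred):
        if true_label == predicted_label:
            tp[true_label] += 1
        else:
            fp[predicted_label] += 1  # predicted class gains a false positive
            fn[true_label] += 1       # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denominator = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for six posts (illustrative only, not from the study)
y_true = ["health", "health", "housing", "other", "other", "other"]
y_pred = ["health", "other", "housing", "other", "other", "housing"]
score = f1_macro(y_true, y_pred)
print(round(score, 3))
```

Because every class contributes equally to the average, a model that performs poorly on small categories receives a low macro score even when its overall share of correct answers is high.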