Subject. The article addresses the selection of determinants that are significant for assessing the socio-economic situation and development potential of Russian regions. Objectives. The aim is to study machine learning algorithms for selecting determinants (predictors of the socio-economic situation and development potential of Russian regions) and to build models that classify regions by the level of their socio-economic situation, using various machine learning algorithms. Methods. To build the classification models, I used data from the Federal State Statistics Service, the Institute of Scientific Communications, the RIA Novosti news agency, and the TAdviser Internet portal. Data classification, model parameter estimation, selection of significant determinants, and visualization of results were performed using the basic functions of the PyCaret library. Cohen's kappa and the Matthews correlation coefficient were used as the priority metrics for evaluating model performance. The algorithms for selecting determinants were implemented in the Google Colab analytical environment. Results. I constructed multiclass classification models based on simple and ensemble machine learning algorithms. The simple classification algorithms (logistic regression, ridge regression, naive Bayes, decision tree, support vector machine, and k-nearest neighbors) are characterized by accuracy values at the level of 77%; however, Cohen's kappa and the Matthews correlation coefficient show only a satisfactory relationship between the actual and predicted class of a region. The ensemble algorithms (random forest, gradient boosting, and extreme gradient boosting) are characterized by a close relationship between the actual and predicted classes, at a level of more than 70%. Conclusions. The random forest algorithm is recognized as the most effective classification model. The gross regional product and investments in fixed assets are informative determinants for measuring the socio-economic status.
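The two priority metrics named in the Methods can be illustrated with a minimal sketch. It is not the article's PyCaret pipeline: the function names `cohen_kappa` and `multiclass_mcc`, the class labels, and the six toy predictions are all assumptions made for illustration, and the formulas are the standard multiclass definitions written in plain Python. It shows why both metrics are stricter than accuracy: each subtracts the agreement expected by chance from the observed agreement.

```python
from collections import Counter
import math


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Share of regions whose predicted class equals the actual class.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected if predictions were independent of the actual classes.
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / n**2
    if expected == 1.0:
        return 0.0  # degenerate single-class case; returning 0 is a convention assumed here
    return (observed - expected) / (1.0 - expected)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient."""
    n = len(y_true)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    cross = sum(true_counts[k] * pred_counts[k] for k in true_counts)
    denom = math.sqrt(
        (n**2 - sum(v * v for v in pred_counts.values()))
        * (n**2 - sum(v * v for v in true_counts.values()))
    )
    if denom == 0:
        return 0.0  # undefined when one side has a single class; 0 assumed here
    return (correct * n - cross) / denom


# Hypothetical region classes, invented for illustration only.
actual = ["high", "high", "mid", "mid", "low", "low"]
predicted = ["high", "mid", "mid", "mid", "low", "low"]

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
kappa = cohen_kappa(actual, predicted)
mcc = multiclass_mcc(actual, predicted)
print(f"accuracy={accuracy:.3f} kappa={kappa:.3f} mcc={mcc:.3f}")
```

On this toy sample both chance-corrected metrics come out below the raw accuracy, which is the same pattern the abstract reports for the simple classifiers.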