The purpose of the study is to develop data modeling methods for projecting recommender algorithms using doubly stochastic autoregressive models of random processes and checking their adequacy by applying machine learning algorithms to cluster users in a simulated data set and predict probabilities of interest. Research methods. The article discusses the methods used in the construction of recommender systems. At the same time, the problem of modeling user behavior using a doubly stochastic model is considered. This model is proposed for generating artificial data. The doubly stochastic model allows generating non-stationary processes, thus creating users with different probabilistic properties in different groups of objects of interest. After that, artificially created users (and their activity) are clustered based on a modified K-means algorithm. The main modification is the need for automatic pre-estimation of the number of clusters, and not its choice by a person. Next, the behavior of representatives of each user group for new events is modeled. Based on the generated information and training data, the problem of predictiing and ranking the services offered is solved. At the same time, at the first stage, the use of regression models is sufficient to assign users to a group and form offers for this user. Results of the study. On the training data in 2 clusters, high determination indices were achieved, which indicates approximately 90% of the explained variance when using the proposed doubly stochastic model. Particular attention is paid to the work of modern recommender systems on the example of the Disco system developed by Yandex. In addition, pre-processing and preliminary analysis of data from the real sector was performed, namely, the data of a telecommunications company are being studied. For the purpose of issuing relevant proposals for communication services, a test recommender system has been developed. Conclusion. Thus, the main results of the work include a mathematical model that simulates the reaction of users to various services, as well as a logistic regression model used to predict the probability of a user's interest in a new service. Based on predicted probabilities, it is not difficult to rank new proposals. Approbation on the synthesized data showed the high efficiency of the model.
Read full abstract