Abstract

Recommender systems play an important role in online services by suggesting the items that best fit users' needs. Traditional recommender systems treat recommendation as a static process, but they cannot capture user preferences well, because recommendation is in essence a sequential decision process. Therefore, some researchers apply reinforcement learning, a useful tool for optimizing sequential decision processes, to learn the recommender system. However, the dependency among the sampled data in a recommender system is often uncertain and the probability distribution of these data usually varies, so it is hard for reinforcement learning to model the states and actions of a recommender system in an integrated way. To solve this problem, we integrate the recommender system into the ACP framework through parallel reinforcement learning, which can effectively explore the uncertain dependency and the probability distribution of the sampled data in a real-world system. The real environment of the recommender system, namely the users, is used to model a parallel environment. The reinforcement learning process can be viewed as the computational experiments that analyze how to generate recommendations. The learned parallel environment provides predicted states for learning and generating recommendations in the trained model. A predicted state in the parallel environment is learned from the probability distribution of the sampled data in the real environment and carries less uncertainty than a state observed directly in that environment; it is therefore more effective to generate recommendations from predicted states in the parallel environment. These results suggest that an effective algorithm for learning a recommender system can be constructed by integrating it into the ACP framework.
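To make the idea above concrete, here is a minimal sketch of the general pattern described in the abstract, not the authors' implementation: a parallel environment model is fit to logged user-interaction data, and an RL agent is then trained on the predicted (smoothed) states it produces rather than on the noisier real-environment states. All names, model forms, and hyperparameters (ParallelEnvModel, RecAgent, the linear transition and Q models, learning rates) are illustrative assumptions.

```python
import numpy as np

class ParallelEnvModel:
    """Hypothetical parallel environment: fits a simple linear transition model
    to logged (state, action, next_state) data and returns predicted states
    that summarize the learned data distribution."""

    def __init__(self, state_dim, n_items, lr=0.01):
        self.W = np.zeros((state_dim + n_items, state_dim))  # linear transition weights
        self.lr = lr

    def fit(self, states, actions_onehot, next_states, epochs=20):
        X = np.hstack([states, actions_onehot])
        for _ in range(epochs):
            pred = X @ self.W
            grad = X.T @ (pred - next_states) / len(X)
            self.W -= self.lr * grad

    def predict_state(self, state, action_onehot):
        x = np.concatenate([state, action_onehot])
        return x @ self.W  # predicted next user state


class RecAgent:
    """Hypothetical Q-learning agent trained on predicted states from the
    parallel environment instead of raw real-environment states."""

    def __init__(self, state_dim, n_items, lr=0.05, gamma=0.9, eps=0.1):
        self.Q = np.zeros((n_items, state_dim))  # one linear Q vector per item
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def act(self, state):
        if np.random.rand() < self.eps:
            return int(np.random.randint(len(self.Q)))
        return int(np.argmax(self.Q @ state))  # recommend the highest-value item

    def update(self, state, action, reward, next_state):
        target = reward + self.gamma * np.max(self.Q @ next_state)
        td_error = target - self.Q[action] @ state
        self.Q[action] += self.lr * td_error * state


# Usage sketch with synthetic data standing in for logged user interactions.
state_dim, n_items = 8, 20
rng = np.random.default_rng(0)
states = rng.normal(size=(1000, state_dim))
actions = rng.integers(0, n_items, size=1000)
actions_onehot = np.eye(n_items)[actions]
next_states = 0.9 * states + rng.normal(scale=0.1, size=states.shape)
rewards = rng.random(1000)

env = ParallelEnvModel(state_dim, n_items)
env.fit(states, actions_onehot, next_states)

agent = RecAgent(state_dim, n_items)
for s, a, r in zip(states, actions, rewards):
    s_pred = env.predict_state(s, np.eye(n_items)[a])  # predicted, less uncertain state
    agent.update(s, a, r, s_pred)

# Generating a recommendation from a predicted state in the parallel environment.
recommended_item = agent.act(env.predict_state(states[0], actions_onehot[0]))
```

The point of the sketch is only the division of roles: the parallel environment absorbs the uncertainty in the sampled real-environment data, and the recommendation policy is learned against its predictions.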
