Deep Reinforcement Factorization Machines: A Deep Reinforcement Learning Model with Random Exploration Strategy and High Deployment Efficiency

Huaidong Yu, Jian Yin

doi:10.3390/app12115314

Abstract

In recent years, recommendation systems and robot learning have been two of the most popular application fields of machine learning. The core algorithms supporting them are, respectively, deep learning, which is based on perception, and reinforcement learning, which is based on exploration. Combining the two to advance machine learning as a whole is a goal shared by many researchers. The Deep Reinforcement Network (DRN) model successfully embedded reinforcement learning into a recommendation system and provided a useful starting point for subsequent work. Its disadvantages are also clear: DRN was built for news recommendation and is therefore not transferable, a defect shared by many current recommendation system models, and its agent learning method is primitive and inefficient. Using the recently proposed notion of deployment efficiency to measure the overall quality of the models and algorithms that have emerged in recent years, we found that few of them address both efficiency and performance. To fill this gap in deployment efficiency and to build a reinforcement learning agent with stronger performance, we first completed our research on the Gate Attentional Factorization Machines (GAFM) model and then integrated GAFM with reinforcement learning. The Deep Reinforcement Factorization Machines (DRFM) model proposed in this paper combines deep learning, with its strong perception ability, and reinforcement learning, with its strong exploration ability, and centers on improving the deployment efficiency and learning performance of the model. The GAFM model is modified and upgraded using multidisciplinary techniques, and a new model-based random exploration strategy is proposed to update and optimize the recommendation list efficiently.
Parallel comparison experiments on various datasets show that the DRFM model surpasses traditional recommendation system models in all aspects. The DRFM model is far superior to the other models in performance and robustness, and its deployment efficiency is also significantly improved. We also conduct a comparative analysis against the latest deep reinforcement learning algorithm, which demonstrates the unique advantages of the DRFM model.

Full Text