Commodity recommendation plays an important role in people's daily lives. In this context, deep reinforcement learning methods have demonstrated substantial efficacy in improving the performance of recommender systems. Nevertheless, many recommender systems directly use raw feature information as the basis for decision-making, which is simplistic and inefficient. Furthermore, the incorporation of sequential decision-making adds complexity to the recommendation task. To maximize the long-term sequential returns of recommender systems, this study introduces a novel architecture named the User Information Separating Architecture (UISA). The framework is designed to align with classic reinforcement learning algorithms and extracts the user's interest value by processing static and dynamic user information separately. Integrated with deep reinforcement learning, the architecture targets long-term profit maximization and applies to sequential recommendation scenarios. We evaluate the proposed architecture in combination with the proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG) algorithms. The results show marked improvements in commodity recommendation, with gains of approximately 5% to 40% in both reward and click-through rate across a self-constructed JDEnv environment and the Virtual Taobao environment. In further comparison experiments, the UISA-based models demonstrate competitive performance.
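The abstract does not specify how the static and dynamic user information is separated and fused, so the following is only a minimal illustrative sketch of that idea: a hypothetical `UserInfoSeparatingEncoder` encodes a static profile vector with an MLP and a dynamic behavior sequence with a GRU, then fuses the two into a state representation that a PPO or DDPG agent could consume. All module names, layer choices, and dimensions here are assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class UserInfoSeparatingEncoder(nn.Module):
    """Hypothetical sketch: encode static and dynamic user features separately,
    then fuse them into a single state vector for a downstream RL policy."""

    def __init__(self, static_dim: int, dynamic_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Static profile features (e.g. demographics) pass through a small MLP.
        self.static_encoder = nn.Sequential(
            nn.Linear(static_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Dynamic behavior features (e.g. recent clicks) pass through a GRU.
        self.dynamic_encoder = nn.GRU(dynamic_dim, hidden_dim, batch_first=True)
        # Fused representation to be fed to the policy/value heads of PPO or DDPG.
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, static_feats: torch.Tensor, dynamic_seq: torch.Tensor) -> torch.Tensor:
        s = self.static_encoder(static_feats)        # (B, hidden_dim)
        _, h = self.dynamic_encoder(dynamic_seq)     # h: (1, B, hidden_dim)
        d = h.squeeze(0)                             # (B, hidden_dim)
        return torch.relu(self.fuse(torch.cat([s, d], dim=-1)))


# Example usage with toy shapes (all dimensions are illustrative).
encoder = UserInfoSeparatingEncoder(static_dim=8, dynamic_dim=16)
state = encoder(torch.randn(4, 8), torch.randn(4, 10, 16))  # state: (4, 64)
```

In such a setup, the fused state vector would replace the raw feature input of a standard PPO or DDPG agent; the specific encoders and fusion rule used in UISA may differ from this sketch.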