Abstract

Deep reinforcement learning (RL) has been widely applied to personalized recommender systems (PRSs) because it can capture user preferences progressively. Among RL-based techniques, the deep Q-network (DQN) stands out as the most popular choice due to its simple update strategy and strong performance. Many recommendation scenarios operate under a diminishing action space, where the set of available actions shrinks over time to avoid recommending duplicate items. However, existing DQN-based recommender systems inherently grapple with a discrepancy between the fixed full action space assumed by the Q-network and the diminishing action space actually available during recommendation. This article elucidates how this discrepancy induces an issue, termed the action diminishing error, in the vanilla temporal difference (TD) operator. Because of this discrepancy, standard DQN methods cannot learn accurate value estimates, rendering them ineffective under a diminishing action space. To mitigate this issue, we propose the Q-learning-based action diminishing error reduction (Q-ADER) algorithm, which corrects the value estimation error at each step. In practice, Q-ADER augments standard TD learning with an error reduction term that is straightforward to implement on top of existing DQN algorithms. Experiments on four real-world datasets verify the effectiveness of the proposed algorithm.
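For intuition, the sketch below contrasts the vanilla TD target, which maximizes over the fixed full action space, with a target whose maximization is restricted to the items still available in the session. The names `q_net` and `recommended_items` are illustrative assumptions, and the abstract does not specify the exact form of Q-ADER's error reduction term, so this is only a minimal PyTorch sketch of the discrepancy the method corrects, not the authors' implementation.

```python
import torch

def td_target_full_action_space(q_net, reward, next_state, gamma=0.99):
    """Vanilla TD target: max over the fixed full action space.
    Ignores that already-recommended items are no longer available,
    which is the source of the action diminishing error."""
    with torch.no_grad():
        next_q = q_net(next_state)            # shape: [num_items]
        return reward + gamma * next_q.max()

def td_target_diminishing(q_net, reward, next_state, recommended_items, gamma=0.99):
    """TD target over the diminishing available action space:
    items already recommended in this session are masked out before the max.
    Illustrative only; not the exact Q-ADER correction term."""
    with torch.no_grad():
        next_q = q_net(next_state).clone()    # shape: [num_items]
        next_q[list(recommended_items)] = float("-inf")  # exclude duplicates
        return reward + gamma * next_q.max()

def action_diminishing_error(q_net, reward, next_state, recommended_items, gamma=0.99):
    """Per-step gap between the two targets; an error reduction term would
    subtract an estimate of this gap from the vanilla target during TD learning."""
    return (td_target_full_action_space(q_net, reward, next_state, gamma)
            - td_target_diminishing(q_net, reward, next_state, recommended_items, gamma))
```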
