Abstract

Reinforcement learning (RL) has shown great promise in optimizing long-term user interest in recommender systems. However, existing RL-based recommendation methods require a large number of interactions per user to learn a good recommendation policy. This challenge becomes especially acute when recommending to new users, who have only a limited number of interactions. To that end, in this paper we address the cold-start challenge in RL-based recommender systems by proposing a novel context-aware offline meta-level model-based reinforcement learning approach for user adaptation. Our approach learns to infer each user's preference through a user context variable, enabling the recommender system to better adapt to new users with limited contextual information. To improve adaptation efficiency, it learns to recover the user choice function and reward from limited contextual information via an inverse reinforcement learning method, which in turn assists the training of a meta-level recommendation agent. To avoid the need for online interaction, the proposed method is trained on historically collected offline data. Moreover, to tackle the challenge of offline policy training, we introduce a mutual information constraint between the user model and the recommendation agent. Evaluation results show the superiority of our offline policy learning method when adapting to new users with limited contextual information. In addition, we provide a theoretical analysis of the recommendation performance bound.
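
To make the mutual information constraint between the user model and the recommendation agent concrete, the sketch below shows one plausible instantiation. It is a minimal illustration, not the paper's actual architecture or objective: the `ContextEncoder`, `Policy`, loss weights, and the InfoNCE-style estimator are assumptions introduced here for exposition, with InfoNCE serving as one standard variational lower bound on mutual information.

```python
# Minimal sketch (not the paper's implementation): a user-context encoder,
# a context-conditioned recommendation policy, and an InfoNCE-style lower
# bound used as a mutual information regularizer between the two.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextEncoder(nn.Module):
    """Infers a user context variable z from a few logged interactions."""
    def __init__(self, interaction_dim: int, z_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(interaction_dim, 64), nn.ReLU(),
                                 nn.Linear(64, z_dim))

    def forward(self, interactions: torch.Tensor) -> torch.Tensor:
        # interactions: (batch, n_interactions, interaction_dim)
        # Mean pooling keeps the encoder permutation-invariant.
        return self.net(interactions).mean(dim=1)

class Policy(nn.Module):
    """Recommendation agent conditioned on the state and the context z."""
    def __init__(self, state_dim: int, z_dim: int, n_items: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim + z_dim, 128), nn.ReLU())
        self.head = nn.Linear(128, n_items)   # item logits
        self.proj = nn.Linear(128, z_dim)     # representation for the MI term

    def forward(self, state, z):
        h = self.body(torch.cat([state, z], dim=-1))
        return self.head(h), self.proj(h)

def info_nce(z: torch.Tensor, h: torch.Tensor, temperature: float = 0.1):
    """InfoNCE loss; minimizing it maximizes a lower bound on I(z; h)."""
    z, h = F.normalize(z, dim=-1), F.normalize(h, dim=-1)
    logits = z @ h.t() / temperature          # (batch, batch) similarities
    labels = torch.arange(z.size(0))          # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Offline training step on logged data (behavior cloning stands in for
# whatever offline policy objective is actually used).
enc, pi = ContextEncoder(8, 16), Policy(32, 16, 100)
interactions = torch.randn(4, 5, 8)           # 4 users, 5 interactions each
state = torch.randn(4, 32)
logged_actions = torch.randint(0, 100, (4,))  # items chosen in the log

z = enc(interactions)
item_logits, h = pi(state, z)
loss = F.cross_entropy(item_logits, logged_actions) + 0.1 * info_nce(z, h)
loss.backward()
```

In this toy setup the InfoNCE term keeps the agent's internal representation predictive of the inferred user context, which is the intuition behind constraining mutual information between the user model and the agent; the 0.1 weight and the behavior-cloning surrogate are arbitrary placeholders.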
