Abstract

Recommender systems play a crucial role in several entertainment scenarios by making personalised recommendations and guiding users throughout their journey, starting from the first interaction. Recent works have framed this problem as a Contextual Bandit, providing a sequential decision model that explores items not yet tried (or not tried enough) or exploits the best options learned so far. However, we observe that current algorithms fall back on naive, non-personalised strategies in a new user's first interactions, offering random or most-popular items. Through experiments in three domains, we show that these first choices have a negative impact: bandit performance is directly related to the choices made in the first trials. We then propose a new approach that balances exploration and exploitation in the first interactions to address these drawbacks. The approach draws on Active Learning theory to gather more information about new users and improve their long-term experience. Our idea is to explore items with high potential information gain that are also likely to please the user. We name this method Warm-Starting Contextual Bandits; in the long run, it statistically outperforms 10 benchmarks from the literature.
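
The abstract describes the method only at a high level. As one way to picture the exploration/exploitation balance it refers to, below is a minimal, hypothetical Python sketch assuming a LinUCB-style linear reward model, where the uncertainty bonus sqrt(x^T A^{-1} x) stands in as a proxy for an item's potential information gain about a new user. All names and parameters here (select_item, lam, the synthetic feedback) are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5                                   # item feature dimension
items = rng.normal(size=(20, d))        # candidate item feature vectors

# Sufficient statistics of a ridge-regression reward model, as in LinUCB:
# A accumulates feature outer products, b accumulates reward-weighted features.
A = np.eye(d)
b = np.zeros(d)

def select_item(items, A, b, lam=1.0):
    """Score each item by estimated reward plus an uncertainty bonus.

    The bonus sqrt(x^T A^{-1} x) measures how uncertain the model still is
    about item x, so choosing high-bonus items that also score well on
    predicted reward favours options that both inform the model and are
    likely to please the user.
    """
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b                   # current estimate of user preferences
    bonus = np.sqrt(np.einsum("ij,jk,ik->i", items, A_inv, items))
    scores = items @ theta + lam * bonus
    return int(np.argmax(scores))

# First-interaction loop for a cold-start user: each recommendation both
# aims to please and maximally informs the preference model.
for t in range(10):
    i = select_item(items, A, b)
    x = items[i]
    reward = rng.binomial(1, 0.5)       # stand-in for real user feedback
    A += np.outer(x, x)
    b += reward * x
```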
