Abstract

Music streaming services rely heavily upon recommender systems to acquire, engage, and retain users. One notable component of these services is playlists, which can be generated dynamically and sequentially based on the user’s feedback during a listening session. Online learning to rank approaches have recently been shown to be effective at leveraging such feedback to learn users’ preferences in the space of song features. Nevertheless, these approaches can suffer from slow convergence as a result of their random exploration component and their session-agnostic exploitation component. To overcome these limitations, we propose a novel online learning to rank approach which efficiently explores the space of candidate recommendation models by restricting itself to the orthogonal complement of the subspace spanned by previous underperforming exploration directions. Moreover, we propose a session-aware exploitation component which leverages the momentum of the current best model during updates. Our thorough evaluation using simulated listening sessions from two large Last.fm datasets demonstrates substantial improvements over state-of-the-art approaches in terms of early-stage performance, which results in an improved user experience during online learning. In addition, we demonstrate that long-term convergence can be further enhanced by adaptively relaxing exploration constraints along the way.
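To illustrate the two ideas summarized above, the following is a minimal sketch, not the paper's actual implementation: exploration directions are drawn from the orthogonal complement of the subspace spanned by previously underperforming directions, and exploitation applies a momentum-style update toward the winning model. Function names (`sample_exploration_direction`, `momentum_update`) and the step-size and momentum parameters are illustrative assumptions.

```python
import numpy as np

def sample_exploration_direction(dim, bad_directions, rng):
    """Sample a unit exploration direction restricted to the orthogonal
    complement of the subspace spanned by previously underperforming
    directions (stored as columns of `bad_directions`)."""
    d = rng.standard_normal(dim)
    if bad_directions is not None and bad_directions.size:
        # Orthonormal basis of the underperforming subspace.
        q, _ = np.linalg.qr(bad_directions)
        # Project out the component lying inside that subspace.
        d -= q @ (q.T @ d)
    return d / np.linalg.norm(d)

def momentum_update(theta, velocity, winning_delta, lr=0.1, beta=0.9):
    """Momentum-style exploitation step: blend the update direction that
    won the current session with the accumulated momentum of the best model."""
    velocity = beta * velocity + lr * winning_delta
    return theta + velocity, velocity

# Hypothetical usage within one simulated listening session:
rng = np.random.default_rng(0)
dim = 32                                  # dimensionality of song features
theta = np.zeros(dim)                     # current ranking model
velocity = np.zeros(dim)                  # momentum accumulator
bad_dirs = rng.standard_normal((dim, 4))  # previously underperforming directions
d = sample_exploration_direction(dim, bad_dirs, rng)
theta, velocity = momentum_update(theta, velocity, d)
```

In this sketch, directions that lose the comparison against the current best model would be appended as new columns of `bad_directions`, shrinking the explored subspace over time; adaptively relaxing the constraint (e.g., periodically dropping old columns) corresponds to the long-term convergence enhancement mentioned in the abstract.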
