Reinforcement Learning to Rank

Maarten De Rijke

doi:10.1145/3289600.3291605

Abstract

Interactive systems such as search engines and recommender systems are increasingly moving away from single-turn exchanges with users. Instead, series of exchanges between the user and the system are becoming mainstream, especially when users have complex needs or when the system struggles to understand the user's intent. Standard machine learning has served us well in the single-turn paradigm, where we use it to predict intent, relevance, user satisfaction, and similar quantities. When we think of search or recommendation as a series of exchanges, we need to turn to bandit algorithms to determine which action the system should take next, or to reinforcement learning to not only determine the next action but also plan future actions and estimate their potential pay-off. The use of reinforcement learning for search and recommendation comes with a number of challenges because of the very large action spaces, the large number of potential contexts, and the noisy feedback signals characteristic of this domain. This presentation will survey some recent success stories of reinforcement learning for search, recommendation, and conversations, and will identify promising future research directions for reinforcement learning for search and recommendation.

Full Text