Abstract

Recommendation systems have benefited significantly from contextual bandits. Despite the many successful applications and recent advances of contextual bandits in online personalized recommendation, existing approaches ignore a basic fact of real-world recommendation systems: each exploration incurs a cost, and the total cost is limited by a finite budget. In this paper, we propose a linear UCB (with hybrid estimator) based adaptive linear programming algorithm, LinUCB-Hybrid-ALP. The algorithm combines adaptive linear programming (ALP) with LinUCB to approximate the oracle of the corresponding constrained contextual bandit problem. LinUCB-Hybrid-ALP consists of two main parts: first, it uses LinUCB with a hybrid model to estimate the expected reward of each arm; then, it pulls an arm according to a probability distribution determined by ALP under the limited budget. Finally, we conduct extensive experiments to demonstrate the effectiveness of LinUCB-Hybrid-ALP on both synthetic and real-world recommendation datasets. Results show that the proposed LinUCB-Hybrid-ALP significantly outperforms state-of-the-art bandit algorithms.
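
As a rough illustration of the two-part procedure above, the sketch below pairs a hybrid LinUCB estimator (shared coefficients plus per-arm coefficients, in the style of Li et al.'s hybrid model) with a deliberately simplified budget-aware pulling rule. The class and function names, the skip action, and the pulling probability min(1, budget / rounds_left) are illustrative assumptions; the paper's ALP component solves a linear program to set this distribution, which the sketch does not reproduce.

# Minimal sketch (not the paper's exact algorithm): hybrid LinUCB estimation
# plus a simplified budget-aware pulling rule. The probability rule
# min(1, budget / rounds_left) is an illustrative stand-in for ALP.
import numpy as np

class HybridLinUCB:
    """Hybrid LinUCB in the style of Li et al. (2010), Algorithm 2:
    reward ~ z^T beta (shared coefficients) + x_a^T theta_a (per-arm)."""

    def __init__(self, n_arms, d, k, alpha=1.0):
        self.alpha = alpha                                  # confidence width
        self.A0 = np.eye(k)                                 # shared design matrix
        self.b0 = np.zeros(k)                               # shared response vector
        self.A = [np.eye(d) for _ in range(n_arms)]         # per-arm design matrices
        self.B = [np.zeros((d, k)) for _ in range(n_arms)]  # cross terms
        self.b = [np.zeros(d) for _ in range(n_arms)]       # per-arm response vectors

    def ucb(self, arm, x, z):
        """Upper confidence bound on the expected reward of `arm`
        given arm features x (dim d) and shared features z (dim k)."""
        A0_inv = np.linalg.inv(self.A0)
        A_inv = np.linalg.inv(self.A[arm])
        beta = A0_inv @ self.b0
        theta = A_inv @ (self.b[arm] - self.B[arm] @ beta)
        # Variance term s_{t,a} of the hybrid estimate.
        s = (z @ A0_inv @ z
             - 2 * z @ A0_inv @ self.B[arm].T @ A_inv @ x
             + x @ A_inv @ x
             + x @ A_inv @ self.B[arm] @ A0_inv @ self.B[arm].T @ A_inv @ x)
        return z @ beta + x @ theta + self.alpha * np.sqrt(max(s, 0.0))

    def update(self, arm, x, z, reward):
        """Rank-one updates after observing `reward` for the pulled arm."""
        A_inv = np.linalg.inv(self.A[arm])
        self.A0 += self.B[arm].T @ A_inv @ self.B[arm]
        self.b0 += self.B[arm].T @ A_inv @ self.b[arm]
        self.A[arm] += np.outer(x, x)
        self.B[arm] += np.outer(x, z)
        self.b[arm] += reward * x
        A_inv = np.linalg.inv(self.A[arm])
        self.A0 += np.outer(z, z) - self.B[arm].T @ A_inv @ self.B[arm]
        self.b0 += reward * z - self.B[arm].T @ A_inv @ self.b[arm]

def select_arm(model, contexts, z, budget, rounds_left, rng):
    """Simplified budget-aware rule (stand-in for the paper's ALP): pull the
    UCB-maximizing arm with probability min(1, budget / rounds_left),
    otherwise skip (return None) to save budget for later rounds."""
    if budget <= 0:
        return None
    if rng.random() > min(1.0, budget / rounds_left):
        return None
    scores = [model.ucb(a, x, z) for a, x in enumerate(contexts)]
    return int(np.argmax(scores))

Under this stand-in rule the budget is spent roughly uniformly over the remaining horizon; adaptive linear programming refines that allocation by re-solving for the pulling probabilities as the remaining budget and time change.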
