Abstract

Recommendation systems have benefited significantly from contextual bandits. Despite the many successful applications and recent advances of contextual bandits in online personalized recommendation, existing approaches ignore a basic fact of real-world recommendation systems: each exploration incurs a cost, and the total cost is limited by a finite budget. In this paper, we propose a linear UCB (with hybrid estimator) based adaptive linear programming algorithm, LinUCB-Hybrid-ALP. The algorithm combines adaptive linear programming (ALP) with LinUCB to approximate the oracle of the corresponding constrained contextual bandit problem. LinUCB-Hybrid-ALP consists of two main parts: first, it uses LinUCB with a hybrid model to estimate the expected reward of each arm; then, it pulls an arm according to a probability distribution determined by ALP under the limited budget. Finally, we conduct extensive experiments to demonstrate the effectiveness of LinUCB-Hybrid-ALP on both synthetic and real-world recommendation datasets. Results show that the proposed LinUCB-Hybrid-ALP significantly outperforms state-of-the-art bandit algorithms.
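
As a rough illustration of the two-part procedure above, the sketch below pairs a hybrid LinUCB estimator (shared coefficients plus per-arm coefficients, in the style of Li et al.'s hybrid model) with a deliberately simplified budget-aware pulling rule. The class and function names, the skip action, and the pulling probability min(1, budget / rounds_left) are illustrative assumptions; the paper's ALP component solves a linear program to set this distribution, which the sketch does not reproduce.

# Minimal sketch (not the paper's exact algorithm): hybrid LinUCB estimation
# plus a simplified budget-aware pulling rule. The probability rule
# min(1, budget / rounds_left) is an illustrative stand-in for ALP.
import numpy as np

class HybridLinUCB:
    """Hybrid LinUCB in the style of Li et al. (2010), Algorithm 2:
    reward ~ z^T beta (shared coefficients) + x_a^T theta_a (per-arm)."""

    def __init__(self, n_arms, d, k, alpha=1.0):
        self.alpha = alpha                                  # confidence width
        self.A0 = np.eye(k)                                 # shared design matrix
        self.b0 = np.zeros(k)                               # shared response vector
        self.A = [np.eye(d) for _ in range(n_arms)]         # per-arm design matrices
        self.B = [np.zeros((d, k)) for _ in range(n_arms)]  # cross terms
        self.b = [np.zeros(d) for _ in range(n_arms)]       # per-arm response vectors

    def ucb(self, arm, x, z):
        """Upper confidence bound on the expected reward of `arm`
        given arm features x (dim d) and shared features z (dim k)."""
        A0_inv = np.linalg.inv(self.A0)
        A_inv = np.linalg.inv(self.A[arm])
        beta = A0_inv @ self.b0
        theta = A_inv @ (self.b[arm] - self.B[arm] @ beta)
        # Variance term s_{t,a} of the hybrid estimate.
        s = (z @ A0_inv @ z
             - 2 * z @ A0_inv @ self.B[arm].T @ A_inv @ x
             + x @ A_inv @ x
             + x @ A_inv @ self.B[arm] @ A0_inv @ self.B[arm].T @ A_inv @ x)
        return z @ beta + x @ theta + self.alpha * np.sqrt(max(s, 0.0))

    def update(self, arm, x, z, reward):
        """Rank-one updates after observing `reward` for the pulled arm."""
        A_inv = np.linalg.inv(self.A[arm])
        self.A0 += self.B[arm].T @ A_inv @ self.B[arm]
        self.b0 += self.B[arm].T @ A_inv @ self.b[arm]
        self.A[arm] += np.outer(x, x)
        self.B[arm] += np.outer(x, z)
        self.b[arm] += reward * x
        A_inv = np.linalg.inv(self.A[arm])
        self.A0 += np.outer(z, z) - self.B[arm].T @ A_inv @ self.B[arm]
        self.b0 += reward * z - self.B[arm].T @ A_inv @ self.b[arm]

def select_arm(model, contexts, z, budget, rounds_left, rng):
    """Simplified budget-aware rule (stand-in for the paper's ALP): pull the
    UCB-maximizing arm with probability min(1, budget / rounds_left),
    otherwise skip (return None) to save budget for later rounds."""
    if budget <= 0:
        return None
    if rng.random() > min(1.0, budget / rounds_left):
        return None
    scores = [model.ucb(a, x, z) for a, x in enumerate(contexts)]
    return int(np.argmax(scores))

Under this stand-in rule the budget is spent roughly uniformly over the remaining horizon; adaptive linear programming refines that allocation by re-solving for the pulling probabilities as the remaining budget and time change.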
