Abstract

The contextual bandit is a popular sequential decision-making framework for balancing the exploration-exploitation tradeoff in applications such as recommender systems and search engines. Motivated by two important factors in real-world applications, namely that 1) latent contexts (or features) often exist and 2) feedback often involves humans in the loop and is therefore subject to human biases, we formulate a generalized contextual bandit framework with latent contexts. The proposed framework includes an interpretable two-layer probabilistic model of human feedback with latent features. We design the GCL-PS algorithm for this framework, which uses posterior sampling to balance exploration and exploitation. We prove a sublinear regret upper bound for GCL-PS, and prove a lower bound for the proposed bandit framework that sheds light on the optimality of GCL-PS. To further improve the computational efficiency of GCL-PS, we propose a Markov Chain Monte Carlo (MCMC) algorithm to generate approximate posterior samples, resulting in our GCL-PSMC algorithm. We not only prove a sublinear Bayesian regret upper bound for GCL-PSMC, but also reveal the tradeoff between computational efficiency and sequential decision accuracy. Finally, we apply the proposed framework to hotel recommendations and news article recommendations, and demonstrate its superior performance over a variety of baselines through experiments on two public datasets.
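
For readers unfamiliar with posterior sampling, the sketch below illustrates the general idea in the simplest standard setting: a linear contextual bandit with a conjugate Gaussian posterior, so exact posterior samples are available in closed form. This is a hedged illustration, not the paper's GCL-PS; all dimensions, priors, and noise parameters here are illustrative assumptions. GCL-PS instead samples from the posterior of the paper's two-layer latent-feature model of human feedback, and GCL-PSMC replaces exact sampling with MCMC approximations.

```python
import numpy as np

# Minimal posterior-sampling (Thompson sampling) sketch for a linear
# contextual bandit. NOT the paper's GCL-PS: it assumes fully observed
# contexts and Gaussian linear rewards with a conjugate Gaussian prior,
# so exact posterior samples exist in closed form. GCL-PS samples from
# the posterior of a two-layer latent-feature model, and GCL-PSMC uses
# MCMC when exact sampling is intractable.

rng = np.random.default_rng(0)
d, n_arms, horizon = 5, 10, 2000
theta_true = rng.normal(size=d)        # unknown reward parameter
noise_std, prior_var = 0.1, 1.0

# Gaussian posterior over theta, tracked via its natural parameters:
# precision = Sigma^{-1}, b = Sigma^{-1} mu.
precision = np.eye(d) / prior_var
b = np.zeros(d)

for t in range(horizon):
    contexts = rng.normal(size=(n_arms, d))  # one feature vector per arm

    # 1) Sample a plausible parameter from the current posterior.
    cov = np.linalg.inv(precision)
    theta_sample = rng.multivariate_normal(cov @ b, cov)

    # 2) Act greedily with respect to the sampled parameter; randomness
    #    in the sample is what drives exploration.
    arm = int(np.argmax(contexts @ theta_sample))
    x = contexts[arm]

    # 3) Observe a noisy reward and update the posterior (conjugate step).
    reward = x @ theta_true + noise_std * rng.normal()
    precision += np.outer(x, x) / noise_std**2
    b += reward * x / noise_std**2
```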
