Abstract

Contextual multi-armed bandit algorithms are widely used to solve online decision-making problems. However, traditional methods assume linear rewards and low-dimensional contextual information, leading to high regret and low online efficiency in real-world applications. In this paper, we propose a novel framework called interconnected neural-linear UCB (InlUCB) that interleaves two learning processes: an offline representation learning part, which converts the original contextual information to low-dimensional latent features via a non-linear transformation, and an online exploration part, which updates a linear layer using the upper confidence bound (UCB). Together, these two processes yield an effective and efficient strategy for online decision-making problems with non-linear rewards and high-dimensional contexts. We derive a general expression for the finite-time cumulative regret bound of InlUCB, and a tighter regret bound under certain assumptions on the neural network. We test InlUCB against state-of-the-art bandit methods on synthetic and real-world datasets with non-linear rewards and high-dimensional contexts. Results demonstrate that InlUCB significantly improves cumulative regret and online efficiency.

Keywords: Contextual bandits · Upper confidence bound · Neural networks · Regret bound
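The interleaved design described above can be made concrete with a minimal sketch. The code below is an illustrative reconstruction based only on the abstract, not the paper's actual algorithm: the `Encoder`, `train_encoder`, and `LinUCB` names, the network sizes, the retraining schedule (every 100 rounds), and the toy tanh reward are all assumptions introduced for illustration.

```python
# Illustrative sketch of the InlUCB idea: an offline neural encoder maps
# high-dimensional contexts to low-dimensional latent features, and an
# online LinUCB layer explores on top of those features. All names,
# dimensions, and schedules are assumptions, not the paper's method.
import numpy as np
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Non-linear map from raw contexts (dim d) to latent features (dim k)."""
    def __init__(self, d, k, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                                  nn.Linear(hidden, k))
        self.head = nn.Linear(k, 1)  # reward head, used only for offline training

    def forward(self, x):
        z = self.body(x)
        return z, self.head(z)

def train_encoder(enc, X, r, epochs=200, lr=1e-3):
    """Offline phase: refit the encoder on (context, reward) pairs logged so far."""
    opt = torch.optim.Adam(enc.parameters(), lr=lr)
    X_t = torch.as_tensor(X, dtype=torch.float32)
    r_t = torch.as_tensor(r, dtype=torch.float32)
    for _ in range(epochs):
        opt.zero_grad()
        _, pred = enc(X_t)
        nn.functional.mse_loss(pred.squeeze(-1), r_t).backward()
        opt.step()

class LinUCB:
    """Online phase: standard LinUCB acting on the k-dim latent features."""
    def __init__(self, k, alpha=1.0, lam=1.0):
        self.alpha = alpha
        self.A = lam * np.eye(k)  # ridge-regularised Gram matrix
        self.b = np.zeros(k)

    def select(self, Z):
        """Z: (n_arms, k) latent features; return the UCB-maximising arm."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        bonus = np.sqrt(np.einsum('ij,jk,ik->i', Z, A_inv, Z))
        return int(np.argmax(Z @ theta + self.alpha * bonus))

    def update(self, z, reward):
        self.A += np.outer(z, z)
        self.b += reward * z

# Interleaved run on a toy problem with a non-linear reward.
d, k, n_arms, T = 100, 8, 10, 500
rng = np.random.default_rng(0)
w_true = rng.standard_normal(d)
enc, bandit = Encoder(d, k), LinUCB(k)
X_log, r_log = [], []
for t in range(T):
    contexts = rng.normal(size=(n_arms, d))  # one raw context per arm
    with torch.no_grad():
        Z, _ = enc(torch.as_tensor(contexts, dtype=torch.float32))
    a = bandit.select(Z.numpy())
    reward = np.tanh(contexts[a] @ w_true / np.sqrt(d)) + 0.1 * rng.standard_normal()
    bandit.update(Z.numpy()[a], reward)
    X_log.append(contexts[a]); r_log.append(reward)
    if (t + 1) % 100 == 0:  # periodically re-enter the offline phase
        train_encoder(enc, np.array(X_log), np.array(r_log))
```

One caveat on this sketch: after each offline retraining the latent feature space changes, yet the LinUCB statistics above are carried over unchanged for brevity. A faithful implementation would need to reconcile the updated representation with the online layer's statistics; the abstract does not specify how InlUCB handles this.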
