Abstract

We propose a novel variant of the UCB algorithm (referred to as Efficient-UCB-Variance (EUCBV)) for minimizing cumulative regret in the stochastic multi-armed bandit (MAB) setting. EUCBV incorporates the arm elimination strategy proposed in UCB-Improved, while taking into account the variance estimates to compute the arms' confidence bounds, similar to UCBV. Through a theoretical analysis, we establish that EUCBV incurs a gap-dependent regret bound which is an improvement over that of existing state-of-the-art UCB algorithms (such as UCB1, UCB-Improved, UCBV, and MOSS). Further, EUCBV incurs a gap-independent regret bound which is an improvement over that of UCB1, UCBV, and UCB-Improved, while being comparable with that of MOSS and OCUCB. Through an extensive numerical study, we show that EUCBV significantly outperforms the popular UCB variants (such as MOSS and OCUCB) as well as the Thompson Sampling and Bayes-UCB algorithms.

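To make the two ingredients named above concrete, the sketch below combines round-based arm elimination (in the spirit of UCB-Improved) with variance-dependent, Bernstein-style confidence radii (in the spirit of UCBV). It is a minimal illustration only: the function name, exploration constant, and elimination schedule are placeholder assumptions and do not reproduce the tuned quantities or guarantees of EUCBV.

```python
import math
import random

def variance_aware_elimination_bandit(pull, n_arms, horizon, exploration_const=1.0):
    """Illustrative sketch (not the exact EUCBV algorithm): round-based arm
    elimination with variance-aware confidence bounds.
    `pull(arm)` returns a reward in [0, 1]; assumes horizon >= n_arms so that
    every arm is sampled at least once before any bound is computed."""
    counts = [0] * n_arms           # number of pulls per arm
    sums = [0.0] * n_arms           # sum of rewards per arm
    sq_sums = [0.0] * n_arms        # sum of squared rewards per arm
    active = set(range(n_arms))     # arms not yet eliminated
    t = 0
    while t < horizon:
        # Pull every surviving arm once per round.
        for arm in list(active):
            if t >= horizon:
                break
            r = pull(arm)
            counts[arm] += 1
            sums[arm] += r
            sq_sums[arm] += r * r
            t += 1
        # Compute empirical means, variances, and confidence radii.
        ucb, lcb = {}, {}
        for arm in active:
            n = counts[arm]
            mean = sums[arm] / n
            var = max(sq_sums[arm] / n - mean * mean, 0.0)
            # Variance-dependent (Bernstein-style) radius, as in UCBV.
            radius = (math.sqrt(exploration_const * var * math.log(horizon) / n)
                      + exploration_const * math.log(horizon) / n)
            ucb[arm] = mean + radius
            lcb[arm] = mean - radius
        # Eliminate any arm whose upper bound falls below the best lower bound.
        best_lcb = max(lcb.values())
        active = {arm for arm in active if ucb[arm] >= best_lcb}
    # Return the surviving arm with the highest empirical mean.
    return max(active, key=lambda a: sums[a] / counts[a])

if __name__ == "__main__":
    # Toy usage: three Bernoulli arms with (unknown to the learner) means.
    means = [0.3, 0.5, 0.7]
    best = variance_aware_elimination_bandit(
        pull=lambda a: 1.0 if random.random() < means[a] else 0.0,
        n_arms=3, horizon=5000)
    print("surviving arm:", best)
```

The intuition carried by this sketch is the one stated in the abstract: arms with small empirical variance receive tighter confidence intervals and are eliminated (or retained) more quickly, which is what drives the improved gap-dependent regret bound relative to variance-agnostic rules such as UCB1.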