CEMAB: A Cross-Entropy-based Method for Large-Scale Multi-Armed Bandits

Erli Wang, Hanna Kurniawati, Dirk P Kroese

doi:10.1007/978-3-319-51691-2_30

Abstract

The multi-armed bandit (MAB) problem is an important model for studying the exploration-exploitation tradeoff in sequential decision making. In this problem, a gambler has to repeatedly choose between a number of slot machine arms to maximize the total payout, where the total number of plays is fixed. Although many methods have been proposed to solve the MAB problem, most have been designed for problems with a small number of arms. To ensure convergence to the optimal arm, many of these methods, including state-of-the-art methods such as UCB [2], require sweeping over the entire set of arms. As a result, such methods perform poorly in problems with a large number of arms. This paper proposes a new method for solving such large-scale MAB problems. The method, called Cross-Entropy-based Multi Armed Bandit (CEMAB), uses the Cross-Entropy method as a noisy optimizer to find the optimal arm with as little cost as possible. Experimental results indicate that CEMAB outperforms state-of-the-art methods for solving MABs with a large number of arms.
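The abstract does not spell out how CEMAB works, so the following is only a minimal sketch of the general idea it names: using the Cross-Entropy method as a noisy optimizer over arms. It is not the authors' algorithm. The function name `ce_bandit`, the parameters `batch`, `elite_frac`, and `alpha`, and the `pull(arm, rng)` callback are all hypothetical choices made for illustration. The sketch keeps a categorical sampling distribution over arms, pulls a batch of sampled arms, and shifts probability mass toward the arms that produced the highest observed rewards, so it never needs to sweep the full arm set.

```python
import random


def ce_bandit(pull, n_arms, budget, batch=50, elite_frac=0.2, alpha=0.7, seed=0):
    """Generic Cross-Entropy-style arm search (illustrative, not the paper's CEMAB).

    pull(arm, rng) -> float is an assumed callback returning a (possibly noisy) reward.
    Returns (index of the arm with the highest sampling probability, total reward).
    """
    rng = random.Random(seed)
    probs = [1.0 / n_arms] * n_arms  # categorical sampling distribution over arms
    total = 0.0
    pulls = 0
    while pulls < budget:
        n = min(batch, budget - pulls)  # never exceed the fixed number of plays
        arms = rng.choices(range(n_arms), weights=probs, k=n)
        rewards = [pull(a, rng) for a in arms]
        pulls += n
        total += sum(rewards)

        # Elite set: the sampled pulls with the highest observed rewards.
        n_elite = max(1, int(elite_frac * n))
        ranked = sorted(zip(rewards, arms), key=lambda t: t[0], reverse=True)
        counts = [0] * n_arms
        for _, a in ranked[:n_elite]:
            counts[a] += 1

        # Smoothed update: move the distribution toward the elite frequencies.
        probs = [(1 - alpha) * p + alpha * c / n_elite
                 for p, c in zip(probs, counts)]

    best = max(range(n_arms), key=probs.__getitem__)
    return best, total
```

A caller would supply its own reward model, for example `ce_bandit(lambda a, rng: rng.gauss(means[a], 1.0), len(means), budget=20000)` for a hypothetical list `means` of arm means. The smoothing weight `alpha` trades off how quickly the distribution concentrates against the risk of locking onto a suboptimal arm after a few noisy batches.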
