Abstract

The multi-armed bandit problem, a cornerstone of Reinforcement Learning (RL), poses the classic sequential decision-making dilemma of balancing exploration against exploitation. Well-known bandit algorithms such as Explore-Then-Commit, Epsilon-Greedy, SoftMax, Upper Confidence Bound (UCB), and Thompson Sampling have proven effective at addressing this problem, yet each has distinct strengths and weaknesses, motivating a detailed comparative evaluation. This paper implements several established bandit algorithms and their variants in order to assess their stability and effectiveness. The study conducts an empirical analysis on a real dataset, producing charts and statistics for a thorough examination of the advantages and drawbacks of each algorithm. A significant part of the research focuses on the parameter sensitivity of these algorithms and the impact of parameter tuning on their performance. The findings show that the SoftMax algorithm's effectiveness is strongly influenced by the initial estimated mean reward assigned to each arm, whereas Epsilon-Greedy and UCB achieve better performance under well-chosen parameter settings. Finally, the study discusses the limitations of classic bandit algorithms and introduces newer models and methodologies for the multi-armed bandit problem, along with their applications.
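For illustration only (this is not the paper's actual implementation), the following minimal Python sketch shows the Epsilon-Greedy and SoftMax selection rules and the parameters the abstract refers to: the exploration rate epsilon, the SoftMax temperature, and the initial estimated mean reward per arm, which the study finds to be decisive for SoftMax. The arm count, reward distributions, and parameter values below are assumptions chosen purely for demonstration.

import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_estimates, epsilon):
    # With probability epsilon explore a random arm, otherwise exploit the best estimate.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_estimates)))
    return int(np.argmax(q_estimates))

def softmax_select(q_estimates, temperature):
    # Boltzmann exploration: arms are sampled in proportion to exp(Q / temperature).
    prefs = np.asarray(q_estimates) / temperature
    prefs -= prefs.max()                      # subtract max for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_estimates), p=probs))

# Hypothetical 3-armed Bernoulli bandit used only for this demonstration.
true_means = np.array([0.2, 0.5, 0.7])
n_arms, horizon = len(true_means), 1000

# Initial estimated mean reward per arm -- the parameter the study identifies
# as strongly affecting SoftMax (e.g. optimistic vs. zero initialization).
q = np.full(n_arms, 1.0)
counts = np.zeros(n_arms)

for t in range(horizon):
    arm = softmax_select(q, temperature=0.1)  # or epsilon_greedy(q, epsilon=0.1)
    reward = float(rng.random() < true_means[arm])
    counts[arm] += 1
    q[arm] += (reward - q[arm]) / counts[arm] # incremental sample-mean update

print("pulls per arm:", counts, "estimated means:", np.round(q, 3))

Swapping the initial value in np.full or the temperature and epsilon values gives a quick sense of the parameter sensitivity the paper evaluates more systematically.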
