Abstract
The Upper Confidence Bound (UCB) algorithm is a widely used approach to the Multi-Armed Bandit (MAB) problem, in which the goal is to maximize cumulative reward over time by repeatedly selecting the best action among several options. UCB balances exploration and exploitation by attaching confidence bounds to each arm's estimated reward, and these bounds guide its decision-making. In recent years, researchers have identified significant challenges in applying UCB to dynamic and contaminated environments: the underlying reward distributions may change over time, making it difficult for standard UCB to adapt, or the observed data may be polluted by noise and outliers, leading to incorrect estimates of the reward distributions. Several variants of the UCB algorithm have been developed to address these challenges; they are designed to handle the complexities of changing environments and data contamination and to deliver more robust and reliable performance in these difficult settings. This paper provides a comprehensive review of Robust-UCB (cr-UCB), Sliding Window UCB (SW-UCB), and bandit-over-bandit UCB (BOB-UCB), focusing on their theoretical foundations, practical applications, and empirical performance. By examining how these algorithms have been adapted to dynamic and contaminated environments, we find that they significantly improve adaptability in non-stationary settings and effectively reduce decision-making errors caused by data pollution, thus providing a more reliable solution to the multi-armed bandit problem.
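For readers unfamiliar with the index rule the abstract describes, the classic UCB1 strategy can be sketched as follows. This is a minimal illustration of standard UCB on an assumed toy Bernoulli bandit, not an implementation of the cr-UCB, SW-UCB, or BOB-UCB variants reviewed in the paper; the function names, arm probabilities, and exploration constant are illustrative choices.

```python
import math
import random

def ucb1(pull, n_arms, horizon, c=2.0):
    """Classic UCB1: pull each arm once, then always play the arm with the
    highest index mean_i + sqrt(c * ln(t) / n_i).
    `pull(i)` is assumed to return a reward in [0, 1] for arm i."""
    counts = [0] * n_arms     # number of pulls per arm
    sums = [0.0] * n_arms     # cumulative reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1       # initialization: try every arm once
        else:
            # Upper confidence index: empirical mean + exploration bonus
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(c * math.log(t) / counts[i]),
            )
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total_reward += r
    return total_reward, counts

# Toy stationary Bernoulli bandit: arm 1 is best (success probability 0.9)
random.seed(0)
probs = [0.2, 0.9, 0.5]
reward, counts = ucb1(lambda i: 1.0 if random.random() < probs[i] else 0.0,
                      n_arms=3, horizon=2000)
```

Over 2000 rounds UCB1 concentrates most pulls on the best arm while still occasionally sampling the others; the variants surveyed here modify the mean estimate (robust estimators) or the statistics window (sliding window, bandit-over-bandit) while keeping this index structure.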