A survey of the application and technical improvement of the multi-armed bandit

Ruoyi Tong

doi:10.54254/2755-2721/77/20240631

Abstract

In recent years, the multi-armed bandit (MAB) model has been widely used and has shown excellent performance. This article surveys the applications and technical improvements of the multi-armed bandit problem. First, the multi-armed bandit problem is introduced, including a general modeling approach and several common algorithms: ε-greedy, explore-then-commit (ETC), upper confidence bound (UCB), and Thompson sampling. Next, real-life applications of the multi-armed bandit model are explored, covering recommender systems, healthcare, and finance. Improved algorithms and models that address the problems encountered in these application domains are then summarized, including multi-objective multi-armed bandits, mortal multi-armed bandits, contextual multi-armed bandits that use side information, and combinatorial multi-armed bandits. Finally, the characteristics of the different algorithms, the trends in how they have evolved, and the scenarios to which each applies are summarized and discussed.

Full Text