Abstract

Since the amount of content online is growing exponentially while people's time remains limited, there is a pressing need for high-performance algorithms that can make effective recommendations. This paper introduces a recommendation-system model based on sequential decision-making, called the Multi-Armed Bandit. The main idea of the Multi-Armed Bandit model is that, at the start of the algorithm, all candidate items are assigned equal weight; in the subsequent recommendation process, the model explores the reward distribution of each item while updating each item's weight according to its average reward, and recommends higher-weighted items more often. This paper presents three cutting-edge Multi-Armed Bandit algorithms, their underlying ideas, and their respective characteristics. The idea of the Explore-Then-Commit (ETC) algorithm is to explore each item a fixed number of times and then commit to the best-performing item for all subsequent recommendations. The idea of the Upper Confidence Bound (UCB) algorithm is to represent the "exploration" and "exploitation" value of each item numerically, combine them into a UCB value, and select the item with the largest UCB value at each round. The idea of Thompson Sampling (TS) is to first assume a prior distribution for each item and then update the parameters of each item's distribution based on the observed rewards. Finally, this paper describes several scenarios where Multi-Armed Bandit algorithms can be applied, to give the reader a sense of how to use them.
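The three algorithms summarized above can be sketched in code. The following is a minimal illustration, not the paper's implementation: it assumes each item ("arm") pays a Bernoulli reward with an unknown success probability, uses the standard UCB1 bonus term and Beta(1,1) priors for TS, and all function and variable names are hypothetical.

```python
import math
import random

def pull(p):
    """Simulate one pull of a Bernoulli arm with success probability p."""
    return 1 if random.random() < p else 0

def etc(probs, m, horizon):
    """Explore-Then-Commit: pull each arm m times, then commit to the
    empirically best arm for the remaining rounds. Returns total reward."""
    k = len(probs)
    counts, sums, total, t = [0] * k, [0.0] * k, 0, 0
    # Exploration phase: m pulls per arm.
    for arm in range(k):
        for _ in range(m):
            r = pull(probs[arm])
            counts[arm] += 1
            sums[arm] += r
            total += r
            t += 1
    # Commit phase: exploit the arm with the highest empirical mean.
    best = max(range(k), key=lambda a: sums[a] / counts[a])
    for _ in range(horizon - t):
        total += pull(probs[best])
    return total

def ucb(probs, horizon):
    """UCB1: play the arm maximizing empirical mean + sqrt(2 ln t / n).
    The mean term is "exploitation"; the bonus term is "exploration"."""
    k = len(probs)
    counts, sums, total = [0] * k, [0.0] * k, 0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play each arm once so every count is positive
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(probs[arm])
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

def thompson(probs, horizon):
    """Thompson Sampling with Beta(1,1) priors on Bernoulli arms:
    sample a mean from each posterior and play the argmax, then update
    the chosen arm's posterior with the observed reward."""
    k = len(probs)
    alpha, beta, total = [1] * k, [1] * k, 0
    for _ in range(horizon):
        arm = max(range(k), key=lambda a: random.betavariate(alpha[a], beta[a]))
        r = pull(probs[arm])
        alpha[arm] += r       # count successes
        beta[arm] += 1 - r    # count failures
        total += r
    return total
```

In this sketch the "weight" of an item corresponds to its empirical mean (ETC), its UCB value (UCB), or its sampled posterior mean (TS), and all three strategies gradually concentrate their pulls on the arm with the highest true reward probability.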
