Empirical performances comparison for ETC algorithm

Tianfeng Chen

doi:10.54254/2755-2721/13/20230705

Abstract

The explore-then-commit (ETC) algorithm is widely used in bandit problems, in which the goal is to identify the optimal choice among a set of choices that yield random outcomes. ETC is adapted from A/B testing, a popular procedure in decision-making. This paper examines the multi-armed bandit problem and several algorithms for tackling it. In particular, it focuses on ETC, a simple algorithm that first explores by pulling each arm a fixed number of times and then commits to the arm that performed best during exploration. To evaluate the performance of ETC, the experiments vary several settings, such as the number of arms and the input parameter m, i.e., the number of times each arm is pulled in the exploration phase. The results show that the average cumulative regret increases as the number of arms grows. As m increases, the cumulative regret first decreases, reaches a minimum, and then starts to increase. The purpose of this paper is to empirically evaluate the performance of the ETC algorithm and to investigate the relationship between the parameter settings and the overall performance of the algorithm.
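The two phases of ETC described in the abstract can be illustrated with a minimal sketch. It is not the paper's experimental code. The Gaussian rewards with unit variance, the horizon, the use of pseudo-regret (computed from the true arm means), and the function names `etc_regret` and `average_regret` are all assumptions made for illustration.

```python
import random


def etc_regret(means, m, horizon, rng):
    """Run one ETC episode and return its cumulative pseudo-regret.

    Assumption: arm rewards are Gaussian with the given means and unit variance.
    """
    k = len(means)
    if m < 1 or m * k > horizon:
        raise ValueError("need 1 <= m and m * k <= horizon")
    best = max(means)
    totals = [0.0] * k
    regret = 0.0
    # Exploration phase: pull every arm exactly m times.
    for _ in range(m):
        for arm in range(k):
            totals[arm] += rng.gauss(means[arm], 1.0)
            regret += best - means[arm]
    # Commit phase: play the empirically best arm for all remaining rounds.
    # Every arm has m samples, so comparing totals equals comparing averages.
    chosen = max(range(k), key=lambda a: totals[a])
    regret += (horizon - m * k) * (best - means[chosen])
    return regret


def average_regret(means, m, horizon, runs, seed=0):
    """Average the cumulative pseudo-regret of ETC over independent runs."""
    rng = random.Random(seed)
    return sum(etc_regret(means, m, horizon, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    # Hypothetical two-armed instance; sweep m to see how regret depends on it.
    for m in (1, 5, 20, 100):
        print(m, average_regret([0.5, 0.0], m, horizon=1000, runs=200))
```

Calling `average_regret` for a range of m values, or for mean vectors of different lengths, is one way to reproduce the kind of comparison the abstract describes: a small m risks committing to a suboptimal arm, while a large m spends many rounds on exploration.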
