Abstract

Communication robots that interact with users must select an appropriate action quickly from a large number of possible courses of action. In practice, however, user requests often change while a robot is still determining the most appropriate action, which makes it difficult for the robot to settle on a suitable course of action. This issue can be formalized as the “multi-armed bandit (MAB) problem.” The MAB problem models an environment with multiple levers (arms), where pulling an arm yields a reward with a certain probability; the task is to decide which arms to pull so as to maximize the cumulative reward. To address this problem, we propose a new MAB algorithm based on self-organizing maps that adapts to both stationary and non-stationary environments. We conducted extensive experiments on a stochastic MAB problem in both stationary and non-stationary environments. The results show that, compared with the existing UCB1, UCB1-Tuned, and Thompson Sampling algorithms, the proposed algorithm achieves equivalent or better performance in stationary environments with many arms, and consistently strong performance in a non-stationary environment.
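To make the problem setting concrete, the sketch below implements a stochastic Bernoulli bandit together with the UCB1 baseline mentioned above. This is an illustrative sketch only, not the self-organizing-map algorithm proposed in the paper; the arm probabilities and horizon are hypothetical values chosen for demonstration.

```python
# Minimal sketch of a stochastic Bernoulli MAB and the UCB1 baseline.
# NOT the paper's SOM-based method; arm_probs and horizon are hypothetical.
import math
import random


def ucb1(arm_probs, horizon):
    """Run UCB1 on a Bernoulli bandit; return the total reward collected."""
    n_arms = len(arm_probs)
    counts = [0] * n_arms    # number of times each arm has been pulled
    sums = [0.0] * n_arms    # cumulative reward observed per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull each arm once to initialize its estimate
        else:
            # UCB1 index: empirical mean + exploration bonus sqrt(2 ln t / n_i)
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Pulling an arm yields reward 1 with that arm's probability, else 0.
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total


if __name__ == "__main__":
    random.seed(0)
    print(ucb1(arm_probs=[0.2, 0.5, 0.8], horizon=10_000))
```

In a non-stationary environment the arm probabilities would drift over time, which is the regime where the paper reports its proposed algorithm remaining consistently effective.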
