Multi-armed bandit problem with online clustering as side information

Andrii Dzhoha, Iryna Rozora

doi:10.1016/j.cam.2023.115132

Abstract

We consider the sequential resource allocation problem under the multi-armed bandit model in a non-stationary stochastic environment. Motivated by many real applications in which information can naturally be grouped, we consider a variation of the contextual multi-armed bandit in which online clustering provides the side information. We assume a stochastic environment in which the reward of each action, conditioned on a cluster, follows a Bernoulli distribution with unknown parameters. Additionally, we assume that the nature of the problem changes over time and the clusters drift incrementally, making the reward process non-stationary. In this setting, we propose a new algorithm based on a two-stage approach. The first stage is a sequential modification of the traditional k-means clustering algorithm, in which the algorithm deals with a continuous data stream and acts on a subset of the data rather than a single batch. In the second stage, we incorporate the current information about clusters into the Thompson Sampling policy with a discounting mechanism to track changes in the underlying reward and to account for potential cluster misclassification.
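The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the class name `ClusteredDiscountedTS`, the fixed centroid step size `step`, the discount factor `gamma`, the uniform Beta(1, 1) prior, and the choice to discount every arm of the observed cluster are all assumptions made for illustration. Stage one moves the nearest centroid toward each incoming context (a simple sequential k-means update); stage two runs Thompson Sampling with Beta posteriors per cluster, decaying old evidence so the policy can follow drifting Bernoulli rewards.

```python
import random


class ClusteredDiscountedTS:
    """Illustrative sketch: online k-means plus discounted Thompson Sampling."""

    def __init__(self, centroids, n_arms, gamma=0.95, step=0.05, seed=0):
        # Initial centroids are assumed to be given (e.g. from a warm-up batch).
        self.centroids = [[float(v) for v in c] for c in centroids]
        self.gamma = gamma  # discount factor in (0, 1]; assumed hyperparameter
        self.step = step    # centroid learning rate; assumed hyperparameter
        self.rng = random.Random(seed)
        k = len(self.centroids)
        # Discounted success/failure counts per (cluster, arm).
        self.successes = [[0.0] * n_arms for _ in range(k)]
        self.failures = [[0.0] * n_arms for _ in range(k)]

    def assign(self, x):
        """Stage 1: pick the nearest centroid and nudge it toward x."""
        dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                 for c in self.centroids]
        c = dists.index(min(dists))
        self.centroids[c] = [ci + self.step * (xi - ci)
                             for xi, ci in zip(x, self.centroids[c])]
        return c

    def select(self, c):
        """Stage 2: sample each arm's Beta posterior within cluster c."""
        samples = [self.rng.betavariate(s + 1.0, f + 1.0)
                   for s, f in zip(self.successes[c], self.failures[c])]
        return samples.index(max(samples))

    def update(self, c, arm, reward):
        """Decay old evidence in cluster c, then record the new reward (0 or 1)."""
        self.successes[c] = [self.gamma * s for s in self.successes[c]]
        self.failures[c] = [self.gamma * f for f in self.failures[c]]
        self.successes[c][arm] += reward
        self.failures[c][arm] += 1 - reward
```

A typical round would call `c = agent.assign(x)`, then `arm = agent.select(c)`, observe a Bernoulli reward, and finish with `agent.update(c, arm, reward)`. With `gamma` below 1 the effective sample size of each posterior stays bounded, which keeps the sampled values spread out enough to keep exploring after the rewards shift.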


Journal: Journal of Computational and Applied Mathematics
Publication Date: Feb 16, 2023
Citations: 2
License type: CC BY

Similar Papers

A survey of the application and technical improvement of the multi-armed bandit
Ruoyi Tong
Applied and Computational Engineering | VOL. 77
16 Jul 2024

A Multi-armed Bandit Algorithm Available in Stationary or Non-stationary Environments Using Self-organizing Maps
Nobuhito Manome ... Shuji Shinohara
01 Jan 2019

Sequential resource allocation in a stochastic environment: general description, analysis, and numerical experiments
A S Dzhoha
Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics
01 Jan 2020

Achieving complete learning in Multi-Armed Bandit problems
Sattar Vakili ... Qing Zhao
01 Nov 2013
