Abstract

The importance of multi-armed bandit (MAB) problems is on the rise due to their recent application in a wide variety of areas such as online advertising, news article selection, wireless networks, and clinical trials, to name a few. The most common assumption made when solving such MAB problems is that the unknown reward probability θ_k of each bandit arm k is fixed. However, this assumption rarely holds in practice, simply because real-life problems often involve underlying processes that are dynamically evolving. In this paper, we model problems where the reward probabilities θ_k are drifting, and introduce a new method called Dynamic Thompson Sampling (DTS) that facilitates Order Statistics based Thompson Sampling for these dynamically evolving MABs. The DTS algorithm adapts its success probability estimates, θ̂_k, faster than traditional Thompson Sampling schemes and thus leads to improved performance in terms of lower regret. Extensive experiments demonstrate that DTS outperforms current state-of-the-art approaches, namely pure Thompson Sampling, UCB-Normal, and UCB_f, in the case of dynamic reward probabilities. Furthermore, this performance advantage increases persistently with the number of bandit arms.
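
The abstract does not spell out the DTS update rule, so the following is only a minimal sketch of how a Thompson Sampling scheme with adaptive (discounted) Beta posteriors could look in Python; the cap parameter C, the capping rule, and all function names are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def dts_step(alpha, beta, reward_fn, C=100.0):
    """One round of a DTS-style Thompson Sampling play (illustrative sketch).

    alpha, beta : per-arm Beta posterior parameters (NumPy arrays).
    reward_fn   : callable arm_index -> Bernoulli reward in {0, 1}.
    C           : assumed cap on alpha + beta, controlling how quickly
                  old evidence is forgotten (not taken from the paper).
    """
    # Thompson Sampling: draw one sample per arm, play the best-looking arm.
    samples = np.random.beta(alpha, beta)
    k = int(np.argmax(samples))
    r = reward_fn(k)

    if alpha[k] + beta[k] < C:
        # Standard conjugate Beta update while evidence is still scarce.
        alpha[k] += r
        beta[k] += 1 - r
    else:
        # Discounted update: keeps alpha + beta roughly at C, so the
        # estimate theta_hat_k = alpha_k / (alpha_k + beta_k) can keep
        # tracking a drifting theta_k instead of freezing.
        alpha[k] = (alpha[k] + r) * C / (C + 1.0)
        beta[k] = (beta[k] + 1 - r) * C / (C + 1.0)
    return k, r
```

In a simulation, each θ_k would drift over time (e.g. a bounded random walk on [0, 1]); the point of the capped update above is that the posterior mean θ̂_k stays responsive to such drift instead of converging to a stale estimate, which is the behaviour the abstract attributes to DTS.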
