Abstract
Sequential decision-making in dynamic and interconnected environments is a cornerstone of numerous applications, ranging from communication networks and finance to distributed blockchain systems and IoT frameworks. The multi-armed bandit (MAB) problem is a fundamental model in this domain, but it traditionally assumes independent and identically distributed (iid) rewards, which limits its effectiveness in capturing the dependencies and state dynamics present in some real-world scenarios. In this paper, we develop a theoretical framework for a modified MAB model in which each arm's reward is generated by a hidden Markov process. In our model, each arm undergoes Markov state transitions independent of play, yielding time-varying reward distributions and heightened uncertainty in reward observations. Each arm can have up to three hidden states. A key challenge arises from the fact that the underlying states governing each arm's rewards remain hidden at the time of selection. To address this, we adapt traditional index-based policies and develop a modified index approach tailored to accommodate Markovian transitions and enhance selection efficiency for our model. Our proposed Markovian Upper Confidence Bound (MC-UCB) policy achieves logarithmic regret. Comparative analysis with the classical UCB algorithm reveals that MC-UCB consistently achieves approximately a 15% reduction in cumulative regret. This work provides significant theoretical insights and lays a robust foundation for future research aimed at optimizing decision-making processes in complex, networked systems with hidden state dependencies.
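To make the setting concrete, the following is a minimal simulation sketch of a UCB-style index applied to arms whose rewards are modulated by a small hidden Markov chain, as described above. The abstract does not give the exact MC-UCB index, so the class and function names (`MarkovModulatedArm`, `ucb_style_index`) and the inflated exploration bonus controlled by `bonus_scale` are illustrative assumptions, not the authors' formula.

```python
import numpy as np

class MarkovModulatedArm:
    """Arm whose mean reward depends on the hidden state of a small Markov chain."""

    def __init__(self, transition, state_means, rng):
        self.transition = np.asarray(transition)    # row-stochastic matrix (<= 3 states)
        self.state_means = np.asarray(state_means)  # mean reward in each hidden state
        self.state = 0
        self.rng = rng

    def step_and_pull(self):
        # The hidden state evolves regardless of whether the arm is played.
        self.state = self.rng.choice(len(self.state_means), p=self.transition[self.state])
        return self.rng.binomial(1, self.state_means[self.state])


def ucb_style_index(mean, pulls, t, bonus_scale=1.0):
    """Classic UCB1 index with a tunable bonus; bonus_scale > 1 is a stand-in for
    the extra uncertainty introduced by the hidden Markovian states (assumption)."""
    return mean + bonus_scale * np.sqrt(2.0 * np.log(t) / pulls)


def run_bandit(arms, horizon, bonus_scale, rng):
    k = len(arms)
    pulls = np.zeros(k)
    means = np.zeros(k)
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # play each arm once to initialize its estimate
        else:
            a = int(np.argmax([ucb_style_index(means[i], pulls[i], t, bonus_scale)
                               for i in range(k)]))
        r = arms[a].step_and_pull()
        pulls[a] += 1
        means[a] += (r - means[a]) / pulls[a]  # incremental mean update
        total_reward += r
    return total_reward


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two illustrative arms, each with up to three hidden states.
    arms = [
        MarkovModulatedArm([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],
                           [0.2, 0.5, 0.8], rng),
        MarkovModulatedArm([[0.95, 0.05], [0.05, 0.95]],
                           [0.3, 0.6], rng),
    ]
    print("total reward over 5000 rounds:", run_bandit(arms, 5000, bonus_scale=1.2, rng=rng))
```

In this sketch, increasing `bonus_scale` widens the confidence interval to compensate for reward drift caused by unobserved state transitions; the paper's actual MC-UCB index and its 15% regret improvement are established analytically and empirically in the full text.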