Abstract
We study a family of adversarial (a.k.a. nonstochastic) multi-armed bandit (MAB) problems in which the player not only cannot observe the reward of the played arm (a self-unaware player) but also incurs a switching cost when shifting to another arm. We study two cases: in Case 1, at each round, the player can either play or observe the chosen arm, but not both; in Case 2, the player can choose one arm to play and, in the same round, another arm to observe. In both cases, the player incurs a cost for consecutive arm switches, whether caused by playing or by observing. We propose two novel online learning-based algorithms, each addressing one of these MAB problems. We theoretically prove that the proposed algorithms for Case 1 and Case 2 achieve sublinear regret of O((KT^3 ln K)^{1/4}) and O(((K-1)T^2 ln K)^{1/3}), respectively, where the latter regret bound is order-optimal in time, K is the number of arms, and T is the total number of rounds. In Case 2, we extend the player's capability to m multiple observations per round and show that more observations do not necessarily improve the regret bound, since each observation can incur a switching cost. However, we derive an upper bound on the switching cost, c ≤ 1/m^{2/3}, under which the regret bound improves as the number of observations increases. Finally, through this study, we find that a generalized version of our approach gives an interesting sublinear regret upper bound of [Formula: see text] for any self-unaware bandit player facing s binary decision dilemmas before taking an action. To further validate and complement the theoretical findings, we conduct extensive performance evaluations on synthetic data constructed by nonstochastic MAB environment simulations and on wireless spectrum measurement data collected in a real-world experiment.
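To make the Case 1 setting concrete, the sketch below shows one plausible way a self-unaware player can trade off playing, observing, and switching: rounds are grouped into blocks, the player commits to a single arm per block (so switching costs grow only with the number of blocks), and exactly one round per block is spent observing an arm sampled from an Exp3-style distribution instead of playing. This is an illustrative sketch under our own assumptions, not the authors' algorithm; the function name `blocked_exp3`, the block length, and the learning-rate choices are all hypothetical.

```python
import math
import random

def blocked_exp3(T, K, reward_fn, block_len, gamma=0.1, eta=0.05, seed=0):
    """Illustrative Exp3-style sketch for a self-unaware player (Case 1 flavor).

    Within each block the player commits to one arm; one round of the block
    is spent observing (no reward collected), and the remaining rounds are
    played blindly.  Switching can only happen at block boundaries, so the
    number of switches is at most T / block_len.  Hypothetical sketch, not
    the paper's exact algorithm.  reward_fn(t, arm) must return a value
    in [0, 1].
    """
    rng = random.Random(seed)
    w = [1.0] * K                      # Exp3 weights
    total_reward = 0.0
    switches = 0
    prev_arm = None
    t = 0
    while t < T:
        s = sum(w)
        # mixed distribution: exploitation plus gamma-uniform exploration
        p = [(1 - gamma) * wi / s + gamma / K for wi in w]
        play_arm = rng.choices(range(K), weights=p)[0]
        if prev_arm is not None and play_arm != prev_arm:
            switches += 1              # switching cost incurred here
        prev_arm = play_arm
        block = min(block_len, T - t)
        # spend one round of the block observing a sampled arm
        obs_arm = rng.choices(range(K), weights=p)[0]
        x = reward_fn(t, obs_arm)
        # importance-weighted update using only the observed arm's reward
        w[obs_arm] *= math.exp(eta * x / p[obs_arm])
        # play the committed arm blindly for the rest of the block
        for u in range(1, block):
            total_reward += reward_fn(t + u, play_arm)
        t += block
    return total_reward, switches
```

Longer blocks reduce switching cost but slow down learning (fewer observations per unit time), which mirrors the tension behind the sublinear-in-T regret bounds stated above.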
Published in: IEEE Transactions on Neural Networks and Learning Systems