Abstract
We study a family of adversarial (a.k.a. nonstochastic) multi-armed bandit (MAB) problems in which the player not only cannot observe the reward of the played arm (a self-unaware player) but also incurs a switching cost when shifting to another arm. We study two cases: in Case 1, at each round, the player can either play or observe the chosen arm, but not both; in Case 2, the player can choose one arm to play and, in the same round, another arm to observe. In both cases, the player incurs a cost for consecutive arm switches, whether caused by playing or by observing. We propose two novel online learning-based algorithms, each addressing one of these MAB problems. We theoretically prove that the proposed algorithms for Case 1 and Case 2 achieve sublinear regret of O((KT^3 ln K)^{1/4}) and O(((K-1)T^2 ln K)^{1/3}), respectively, where the latter regret bound is order-optimal in time, K is the number of arms, and T is the total number of rounds. In Case 2, we extend the player's capability to m multiple observations per round and show that more observations do not necessarily improve the regret bound, since each observation can incur a switching cost. However, we derive an upper bound on the switching cost, c ≤ 1/m^{2/3}, under which the regret bound improves as the number of observations increases. Finally, through this study, we find that a generalized version of our approach gives an interesting sublinear regret upper bound of [Formula: see text] for any self-unaware bandit player facing s binary decision dilemmas before taking an action. To further validate and complement the theoretical findings, we conduct extensive performance evaluations on synthetic data constructed by nonstochastic MAB environment simulations and on wireless spectrum measurement data collected in a real-world experiment.
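To make the Case 1 setting concrete, the sketch below shows one plausible way a self-unaware player can trade off playing, observing, and switching: rounds are grouped into blocks, the player commits to a single arm per block (so switching costs grow only with the number of blocks), and exactly one round per block is spent observing an arm sampled from an Exp3-style distribution instead of playing. This is an illustrative sketch under our own assumptions, not the authors' algorithm; the function name `blocked_exp3`, the block length, and the learning-rate choices are all hypothetical.

```python
import math
import random

def blocked_exp3(T, K, reward_fn, block_len, gamma=0.1, eta=0.05, seed=0):
    """Illustrative Exp3-style sketch for a self-unaware player (Case 1 flavor).

    Within each block the player commits to one arm; one round of the block
    is spent observing (no reward collected), and the remaining rounds are
    played blindly.  Switching can only happen at block boundaries, so the
    number of switches is at most T / block_len.  Hypothetical sketch, not
    the paper's exact algorithm.  reward_fn(t, arm) must return a value
    in [0, 1].
    """
    rng = random.Random(seed)
    w = [1.0] * K                      # Exp3 weights
    total_reward = 0.0
    switches = 0
    prev_arm = None
    t = 0
    while t < T:
        s = sum(w)
        # mixed distribution: exploitation plus gamma-uniform exploration
        p = [(1 - gamma) * wi / s + gamma / K for wi in w]
        play_arm = rng.choices(range(K), weights=p)[0]
        if prev_arm is not None and play_arm != prev_arm:
            switches += 1              # switching cost incurred here
        prev_arm = play_arm
        block = min(block_len, T - t)
        # spend one round of the block observing a sampled arm
        obs_arm = rng.choices(range(K), weights=p)[0]
        x = reward_fn(t, obs_arm)
        # importance-weighted update using only the observed arm's reward
        w[obs_arm] *= math.exp(eta * x / p[obs_arm])
        # play the committed arm blindly for the rest of the block
        for u in range(1, block):
            total_reward += reward_fn(t + u, play_arm)
        t += block
    return total_reward, switches
```

Longer blocks reduce switching cost but slow down learning (fewer observations per unit time), which mirrors the tension behind the sublinear-in-T regret bounds stated above.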
Published in: IEEE Transactions on Neural Networks and Learning Systems