Multi-Player Multi-Armed Bandits With Collision-Dependent Reward Distributions

Chengshuai Shi, Cong Shen

doi:10.1109/tsp.2021.3093261

Open Access

https://doi.org/10.1109/tsp.2021.3093261

Journal: IEEE Transactions on Signal Processing
Publication Date: Jan 1, 2021
Citations: 6
License type: publisher-specific-oa

Affiliation: University of Virginia

Abstract

We study a new stochastic multi-player multi-armed bandits (MP-MAB) problem, where the reward distribution changes if a collision occurs on the arm. Existing literature always assumes a zero reward for the involved players if a collision happens, but for applications such as cognitive radio, the more realistic scenario is that a collision reduces the mean reward but not necessarily to zero. We focus on the more practical no-sensing setting, where players do not perceive collisions directly, and propose the Error-Correction Collision Communication (EC3) algorithm, which models implicit communication as a problem of reliable communication over a noisy channel. The random coding error exponent is used to establish the optimal regret that no communication protocol can beat. Finally, optimizing the tradeoff between code length and decoding error rate leads to a regret that approaches the centralized MP-MAB regret, which represents a natural lower bound. Experiments with practical error-correction codes on both synthetic and real-world datasets demonstrate the superiority of EC3. In particular, the results show that the choice of coding scheme has a profound impact on the regret performance.
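The tradeoff between code length and decoding error rate mentioned in the abstract can be illustrated with a toy simulation. The sketch below is not the EC3 algorithm. It assumes Bernoulli rewards, a collision mean `mu_coll` lower than the collision-free mean `mu_free`, both means known to the receiver, and a plain repetition code with a midpoint threshold decoder. All function names and parameters are hypothetical and chosen only for illustration.

```python
import random


def send_bit(bit, mu_free, mu_coll, n, rng):
    """Return the receiver's n Bernoulli rewards on its arm.

    Assumption: the sender collides on every one of the n rounds to signal
    bit 1 and stays away to signal bit 0 (a length-n repetition code).
    """
    mean = mu_coll if bit == 1 else mu_free
    return [1 if rng.random() < mean else 0 for _ in range(n)]


def decode_bit(rewards, mu_free, mu_coll):
    """Threshold decoder: output 1 (collision) if the empirical mean
    falls below the midpoint of the two assumed-known means."""
    threshold = (mu_free + mu_coll) / 2.0
    return 1 if sum(rewards) / len(rewards) < threshold else 0


def error_rate(mu_free, mu_coll, n, trials, seed=0):
    """Empirical decoding error rate over random uniformly chosen bits."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        bit = rng.randrange(2)
        rewards = send_bit(bit, mu_free, mu_coll, n, rng)
        if decode_bit(rewards, mu_free, mu_coll) != bit:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    # Longer codewords cost more rounds but are decoded more reliably.
    for n in (1, 5, 20, 50):
        print(n, error_rate(mu_free=0.8, mu_coll=0.3, n=n, trials=2000))
```

In this toy model, each extra repetition spends one more round on communication instead of exploitation, while the decoding error shrinks as the codeword grows. This is the kind of tradeoff the abstract describes optimizing, here reduced to a single bit and a deliberately simple code.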

Similar Papers

Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits
Guojun Xiong ... Jian Li
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 37
26 Jun 2023

A Practical Multiplayer Multi-armed bandit Algorithm for Smart City Communication System
Shubhjeet Kumar Tiwari ... Sudhanshu Soni
19 Mar 2021

Linear Multihop Amplify-and-Forward Relay Channels: Error Exponent and Optimal Number of Hops
Hien Quoc Ngo ... Erik G Larsson
IEEE Transactions on Wireless Communications | VOL. 10
01 Jan 2010

Non-stationary Stochastic Multi-armed Bandit Problems with External Information on Stationarity
Hiroyuki Namba
Transactions of the Japanese Society for Artificial Intelligence | VOL. 36
01 May 2021
