Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs

Tianyuan Jin,Hao-Lun Hsu,William Chang,Pan Xu

doi:10.1609/aaai.v38i11.29193

Abstract

We study the multi-agent multi-armed bandit (MAMAB) problem, where agents are factored into overlapping groups. Each group represents a hyperedge, forming a hypergraph over the agents. At each round of interaction, the learner pulls a joint arm (composed of individual arms for each agent) and receives a reward according to the hypergraph structure. Specifically, we assume there is a local reward for each hyperedge, and the reward of the joint arm is the sum of these local rewards. Previous work introduced the multi-agent Thompson sampling (MATS) algorithm and derived a Bayesian regret bound. However, it remains an open problem how to derive a frequentist regret bound for Thompson sampling in this multi-agent setting. To address these issues, we propose an efficient variant of MATS, the epsilon-exploring Multi-Agent Thompson Sampling (eps-MATS) algorithm, which performs MATS exploration with probability epsilon while adopts a greedy policy otherwise. We prove that eps-MATS achieves a worst-case frequentist regret bound that is sublinear in both the time horizon and the local arm size. We also derive a lower bound for this setting, which implies our frequentist regret upper bound is optimal up to constant and logarithm terms, when the hypergraph is sufficiently sparse. Thorough experiments on standard MAMAB problems demonstrate the superior performance and the improved computational efficiency of eps-MATS compared with existing algorithms in the same setting.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs

Abstract

Talk to us

Similar Papers

More From: Proceedings of the AAAI Conference on Artificial Intelligence

Lead the way for us

Journal: Proceedings of the AAAI Conference on Artificial Intelligence	Publication Date: Mar 24, 2024
Citations: 1

Similar Papers

Performance Comparison of UCB, TS, and -Greedy TS Algorithms through Simulation of Multi-Armed Bandit Machine
Zhuoran Liu
Applied and Computational Engineering | VOL. 83
Zhuoran LiuZhuoran Liu
31 Oct 2024
Applied and Computational Engineering | VOL. 83

Bayesian Algorithms for Decentralized Stochastic Bandits
Anusha Lalitha ... Andrea Goldsmith
IEEE Journal on Selected Areas in Information Theory | VOL. 2
Anusha Lalitha, et. al.Anusha Lalitha ... Andrea Goldsmith
01 Jun 2021
IEEE Journal on Selected Areas in Information Theory | VOL. 2

Adaptive OFDM Based on Thompson Sampling Algorithm without Channel Knowledge
Haipeng Luo ... Ruixuan He
-
Haipeng Luo, et. al.Haipeng Luo ... Ruixuan He
23 Sep 2022
23 Sep 2022

Program Placement Optimization for Storage-constrained Mobile Edge Computing Systems: A Multi-armed Bandit Approach
Mingjie Feng ... Marwan Krunz
-
Mingjie Feng, et. al.Mingjie Feng ... Marwan Krunz
01 Jun 2021
01 Jun 2021

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs

Abstract

Talk to us

Similar Papers

More From: Proceedings of the AAAI Conference on Artificial Intelligence