State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits

Shuang Wu, Jingyu Zhao, Guangjian Tian, Jun Wang

doi:10.24963/ijcai.2021/64

Abstract

The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable because the state and action spaces grow exponentially with the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty capturing either temporal or spatial factors, such as the impact of other arms. We propose to account for both factors using the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver exploits the decoupling structure of RMABs to obtain solutions with significantly reduced computational overhead. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of our proposed method with numerical experiments.
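To make the scaling argument concrete, the sketch below contrasts the size of the joint problem with the cost of solving arms one at a time. It is not the method of the paper: the attention-based approximator is not reproduced here, and the per-arm solver is a generic subsidy-based value iteration of the kind used in Whittle-style relaxations. The function names, the fixed activation budget, the two-action (passive/active) arm model, and the discount factor are all assumptions made for illustration.

```python
from math import comb


def joint_space_sizes(n_arms, n_states, budget):
    """Sizes of the joint state and action spaces of an RMAB.

    Assumes (hypothetically) that every arm has `n_states` states and
    that exactly `budget` arms are activated at each step.
    """
    return n_states ** n_arms, comb(n_arms, budget)


def single_arm_values(P, R, subsidy, gamma=0.9, iters=500):
    """Value iteration for ONE arm with a subsidy for staying passive.

    P[a][s][t] is the probability of moving from state s to state t
    under action a (0 = passive, 1 = active); R[a][s] is the reward.
    """
    n = len(R[0])
    V = [0.0] * n
    for _ in range(iters):
        new_V = []
        for s in range(n):
            q_values = []
            for a in (0, 1):
                expected = sum(P[a][s][t] * V[t] for t in range(n))
                bonus = subsidy if a == 0 else 0.0
                q_values.append(R[a][s] + bonus + gamma * expected)
            new_V.append(max(q_values))
        V = new_V
    return V


def decoupled_values(arms, subsidy, gamma=0.9):
    """Solve each arm independently: total work is linear in len(arms)."""
    return [single_arm_values(P, R, subsidy, gamma) for P, R in arms]


print(joint_space_sizes(10, 2, 3))  # (1024, 120)
```

With 10 two-state arms and a budget of 3, the joint problem already has 1024 states and 120 actions, whereas `decoupled_values` runs ten independent two-state value iterations. The price of decoupling is that arms no longer see one another, which is the gap the abstract says the attention-based coordination module is meant to close.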


