Tabular and Deep Learning for the Whittle Index

Francisco Robledo Relaño, Urtzi Ayesta, Vivek Borkar, Konstantin Avrachenkov

doi:10.1145/3670686

Abstract

The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as Restless Multi-Armed Bandit Problems (RMABPs). In this article, we present QWI and QWINN, two reinforcement learning algorithms, tabular and deep respectively, that learn the Whittle index for the total discounted criterion. The key feature is the use of two time-scales: a faster one to update the state-action Q-values, and a relatively slower one to update the Whittle indices. In our main theoretical result, we show that QWI, the tabular implementation, converges to the true Whittle indices. We then present QWINN, an adaptation of the QWI algorithm that uses neural networks to compute the Q-values on the faster time-scale; it is able to extrapolate information from one state to another and scales naturally to environments with large state spaces. For QWINN, we show that all local minima of the Bellman error are locally stable equilibria, which is the first result of its kind for DQN-based schemes. Numerical computations show that QWI and QWINN converge faster than the standard Q-learning algorithm, neural-network-based approximate Q-learning, and other state-of-the-art algorithms.
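
The two-time-scale idea in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' QWI specification. It assumes a single arm with hypothetical inputs `P` (transition probabilities) and `r` (rewards), constant step sizes `alpha` (fast, Q-values) and `beta` (slow, index), synchronous sweeps over all state-action pairs, and the usual convention that the Whittle index of a state x is the passive subsidy `lam[x]` at which the active and passive Q-values at x coincide. All names are illustrative.

```python
import random


def learn_whittle_indices(P, r, gamma=0.9, alpha=0.5, beta=0.05,
                          n_iters=3000, seed=0):
    """Two-time-scale tabular sketch (illustrative, not the paper's exact QWI).

    P[a][s] -- list of next-state probabilities from state s under action a
               (a = 0 is passive, a = 1 is active); assumed input.
    r[a][s] -- reward for taking action a in state s; assumed input.
    Returns a list lam with one estimated index per state.
    """
    rng = random.Random(seed)
    n = len(r[0])
    # One Q-table per reference state x, because the subsidy lam[x]
    # changes the passive reward and therefore the Q-values.
    Q = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]  # Q[x][s][a]
    lam = [0.0] * n

    for _ in range(n_iters):
        for x in range(n):
            # Fast time-scale: Q-learning step for every (s, a) pair,
            # with the passive action subsidized by lam[x].
            for s in range(n):
                for a in (0, 1):
                    s_next = rng.choices(range(n), weights=P[a][s])[0]
                    reward = r[a][s] + (lam[x] if a == 0 else 0.0)
                    target = reward + gamma * max(Q[x][s_next])
                    Q[x][s][a] += alpha * (target - Q[x][s][a])
            # Slow time-scale: move lam[x] toward the subsidy that makes
            # the active and passive actions equally attractive in x.
            lam[x] += beta * (Q[x][x][1] - Q[x][x][0])
    return lam
```

As a sanity check under these assumptions, when the transition probabilities do not depend on the action, the subsidy that equalizes the two Q-values at state x is `r[1][x] - r[0][x]`, so the returned `lam` should approach that difference. Keeping `beta` well below `alpha` is what lets the Q-values track the current subsidy before the index moves.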

Published in: ACM Transactions on Modeling and Performance Evaluation of Computing Systems

Similar Papers

Towards Q-learning the Whittle Index for Restless Bandits
Jing Fu ... Sarat Moka
01 Nov 2019

Index-based sampling policies for tracking dynamic networks under sampling constraints
Ting He ... Dakshi Agrawal
01 Apr 2011

Large scale charging of electric vehicles: A multi-armed bandit approach
Zhe Yu ... Yunjian Xu
01 Sep 2015

Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health
Kai Wang ... Shresth Verma
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 37
26 Jun 2023