Abstract

The Whittle index policy is a heuristic that exhibits remarkably good performance (with guaranteed asymptotic optimality) on the class of problems known as multi-armed restless bandits. In this paper we develop QWI, a Q-learning-based algorithm for learning the Whittle indices. Its key feature is the use of two timescales: a relatively faster one to update the state-action Q-functions, and a relatively slower one to update the Whittle indices. In our main result, we show that the algorithm converges to the Whittle indices of the problem. Numerical experiments show that our algorithm converges much faster than both the standard Q-learning algorithm and neural-network-based approximate Q-learning.
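The two-timescale idea can be illustrated with a minimal sketch on a randomly generated toy restless arm. The toy dynamics, the discounted formulation, and all names below are illustrative assumptions, not the paper's exact setting: a faster step size drives the Q-table for each candidate subsidy, while a slower step size nudges each index toward the subsidy that makes the arm indifferent between acting and staying passive at its reference state.

```python
import numpy as np

# Toy single restless arm (illustrative assumption, not from the paper):
# n_states states, two actions (0 = passive, 1 = active), random dynamics.
rng = np.random.default_rng(0)
n_states = 3
P = rng.dirichlet(np.ones(n_states), size=(n_states, 2))  # P[s, a]: next-state distribution
r = rng.random((n_states, 2))                              # r[s, a]: one-step reward
gamma = 0.9

# Q[x] is the arm's Q-table under subsidy lam[x] for passivity;
# lam[x] is the running estimate of the Whittle index of reference state x.
Q = np.zeros((n_states, n_states, 2))
lam = np.zeros(n_states)

for k in range(30_000):
    alpha = 1.0 / (1 + k) ** 0.5   # faster timescale: Q-function updates
    beta = 1.0 / (1 + k) ** 0.7    # slower timescale: index updates
    for x in range(n_states):
        V = Q[x].max(axis=1)       # V[s] = max_a Q[x, s, a]
        # Bellman-style target; the passive action (column 0) earns the subsidy.
        target = r + np.array([lam[x], 0.0]) + gamma * np.einsum('sap,p->sa', P, V)
        Q[x] += alpha * (target - Q[x])
        # Drive the subsidy toward indifference at the reference state x.
        lam[x] += beta * (Q[x, x, 1] - Q[x, x, 0])
```

At convergence, `Q[x, x, 1] - Q[x, x, 0]` vanishes for every `x`, so `lam[x]` is the subsidy making the passive and active actions equally attractive at state `x`, which is exactly the Whittle index of that state in this toy model.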
