A Fast-Pivoting Algorithm for Whittle’s Restless Bandit Index

José Niño-Mora

doi:10.3390/math8122226

Abstract

The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision epoch, and which can change state while passive. It further provides a practical heuristic priority-index policy for the computationally intractable multi-armed restless bandit problem, which has been widely applied over the last three decades in multifarious settings, yet mostly restricted to project models with a one-dimensional state. This is due in part to the difficulty of establishing indexability (existence of the index) and of computing the index for projects with large state spaces. This paper draws on the author’s prior results on sufficient indexability conditions and an adaptive-greedy algorithmic scheme for restless bandits to obtain a new fast-pivoting algorithm that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n³ + O(n²) arithmetic operations. This algorithm also draws on the parametric simplex method and is based on elucidating the pattern of parametric simplex tableaux, which makes it possible to exploit special structure to substantially simplify and reduce the complexity of simplex pivoting steps. A numerical study demonstrates substantial runtime speed-ups versus alternative algorithms.
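The adaptive-greedy scheme that the algorithm builds on can be illustrated with a deliberately naive sketch for the simplest special case: a discrete-time discounted project with passive/active transition matrices p0/p1, reward vectors r0/r1 and discount factor beta (all hypothetical names). We assume the marginal productivity (MP) rate form from the author's earlier work on PCL-indexability, namely the marginal reward divided by the marginal work of taking the active rather than the passive action at state j and then following the current active set S. The sketch recomputes both metrics from scratch by solving linear systems at every step, which costs O(n³) per step and hence O(n⁴) overall; it is not the fast-pivoting method of this paper, whose point is precisely to avoid these re-solves. Its output equals the Whittle index only if the project satisfies PCL-indexability (in particular, positive marginal work). Indices are expressed as a charge per unit of active work, which in this discounted discrete-time setting gives the same value as a subsidy for passivity.

```python
# Naive adaptive-greedy sketch for a discrete-time, discounted restless bandit.
# Assumptions (not taken from the paper): p0/p1 transition matrices, r0/r1
# reward vectors, beta discount factor; all names are hypothetical.


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (dense, O(n^3))."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def evaluate(active, p0, p1, r0, r1, beta):
    """Return F^S (discounted reward) and G^S (discounted active time) of the
    policy that is active exactly on the set `active`."""
    n = len(r0)
    a = [[(1.0 if i == k else 0.0) - beta * (p1[i] if i in active else p0[i])[k]
          for k in range(n)] for i in range(n)]
    rewards = [r1[i] if i in active else r0[i] for i in range(n)]
    work = [1.0 if i in active else 0.0 for i in range(n)]
    return solve(a, rewards), solve(a, work)


def adaptive_greedy(p0, p1, r0, r1, beta):
    """Assign one index per state, greedily picking the largest MP rate."""
    n = len(r0)
    active, index = set(), [0.0] * n
    for _ in range(n):
        f_s, g_s = evaluate(active, p0, p1, r0, r1, beta)
        best_j, best_rate = None, None
        for j in range(n):
            if j in active:
                continue
            diff = [p1[j][k] - p0[j][k] for k in range(n)]
            # Marginal reward and marginal work of acting (vs resting) at j
            # first and then following S; g_j > 0 is assumed (PCL-indexability).
            f_j = r1[j] - r0[j] + beta * sum(d * v for d, v in zip(diff, f_s))
            g_j = 1.0 + beta * sum(d * v for d, v in zip(diff, g_s))
            rate = f_j / g_j
            if best_rate is None or rate > best_rate:
                best_j, best_rate = j, rate
        index[best_j] = best_rate
        active.add(best_j)
    return index
```

For instance, when p0 and p1 coincide the actions do not affect the dynamics, every marginal work equals 1, and the sketch returns the myopic values r1[j] - r0[j]: `adaptive_greedy(p, p, [0.0, 0.0, 0.0], [3.0, 1.0, 2.0], 0.9)` returns `[3.0, 1.0, 2.0]` for any 3×3 stochastic matrix `p`.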

Highlights

We consider a general two-action (1: engaged/active; 0: rested/passive) semi-Markov decision process (SMDP) restless bandit model (see, for example, ([1], Ch. 11) and [2]) of a dynamic and stochastic project, whose state X(t) moves over continuous time t ∈ [0, ∞) across the state space N, which is assumed to be finite, consisting of n ≜ |N| states.
In more complex models in which the Whittle index cannot be evaluated in closed form, the most widespread approach, which has its roots in the calibration method for the Gittins index in [49], is to apply an iterative procedure that approximately computes the index.
This paper has presented a new algorithm for computing the Whittle index of a general finite-state semi-Markov restless bandit, based on an efficient implementation of the adaptive-greedy algorithmic scheme introduced in [3,24] for restless bandits, in which it was not specified how to evaluate certain metrics arising in the algorithm description.
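The calibration approach mentioned in the second highlight can be sketched as follows for a discrete-time discounted project: for a trial subsidy λ earned in every passive period, solve the λ-subsidy problem (here by plain value iteration), check which action state j prefers, and bisect on λ. This is a minimal sketch under stated assumptions, not the method of this paper: it assumes the project is indexable (so the preference at j switches only once), that the index lies in the hypothetical bracket [lo, hi], and it returns only an approximation whose accuracy depends on the tolerances. All names (p0, p1, r0, r1, beta, optimal_values, whittle_by_bisection) are hypothetical.

```python
# Calibration sketch: bisection on a passive subsidy lam (assumes indexability).
# Hypothetical names: p0/p1 transition matrices, r0/r1 rewards, beta discount.


def optimal_values(lam, p0, p1, r0, r1, beta, tol=1e-10):
    """Value iteration for the lam-subsidy problem (passive earns r0 + lam)."""
    n = len(r0)
    v = [0.0] * n
    while True:
        new = [max(r1[i] + beta * sum(p * x for p, x in zip(p1[i], v)),
                   r0[i] + lam + beta * sum(p * x for p, x in zip(p0[i], v)))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new


def whittle_by_bisection(j, p0, p1, r0, r1, beta, lo=-100.0, hi=100.0, tol=1e-9):
    """Approximate the subsidy at which active and passive tie in state j.

    Assumes the project is indexable and that the index lies in [lo, hi].
    """
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        v = optimal_values(lam, p0, p1, r0, r1, beta)
        q_active = r1[j] + beta * sum(p * x for p, x in zip(p1[j], v))
        q_passive = r0[j] + lam + beta * sum(p * x for p, x in zip(p0[j], v))
        if q_active > q_passive:
            lo = lam  # active still strictly preferred: index lies above lam
        else:
            hi = lam  # passive (weakly) preferred: index lies at or below lam
    return 0.5 * (lo + hi)
```

Each bisection step re-solves a complete MDP, which illustrates why exact finite algorithms that return all n index values directly are attractive for projects with large state spaces.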

Summary

Introduction

We consider a general two-action (1: engaged/active; 0: rested/passive) semi-Markov decision process (SMDP) restless bandit model (see, for example, ([1], Ch. 11) and [2]) of a dynamic and stochastic project, whose state X(t) moves over continuous time t ∈ [0, ∞) across the state space N, which is assumed to be finite, consisting of n ≜ |N| states. The author has introduced and developed in [3,24] a methodology to establish indexability and compute the Whittle index for general finite-state restless bandits, extended to the semi-Markov denumerable-state case in [4] and to the continuous-state case in [25]. The effectiveness of this approach, which is based on verifying so-called PCL-indexability conditions (so named because they rest on project performance metrics satisfying partial conservation laws, or PCLs), has been demonstrated in diverse models.
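For reference, the indexability property that this methodology sets out to establish can be stated in terms of a subsidy λ earned while the project is passive. The following is the standard (Whittle) definition written in our own notation; the set 𝒫(λ) is introduced here only for illustration and is not taken from the paper.

```latex
\begin{aligned}
\mathcal{P}(\lambda) &\triangleq \{\, i \in N : \text{the passive action is optimal in state } i \text{ under passive subsidy } \lambda \,\},\\
\text{project indexable} &\iff \lambda \mapsto \mathcal{P}(\lambda) \text{ is nondecreasing (set inclusion), } \mathcal{P}(\lambda) = \emptyset \text{ for } \lambda \ll 0,\ \mathcal{P}(\lambda) = N \text{ for } \lambda \gg 0,\\
\lambda_i^{*} &= \inf\{\, \lambda \in \mathbb{R} : i \in \mathcal{P}(\lambda) \,\}, \qquad i \in N \quad \text{(Whittle index of state } i\text{)}.
\end{aligned}
```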

Review of Related Literature

SMDP Restless Bandits and Their Discrete-Stage Reformulation

Indexability

PCL-Indexability and Adaptive-Greedy Algorithm

Optimality Equations and Parametric LP Formulation

Computing the Initial Tableau

Extension to the Average Criterion

Numerical Experiments

Discussion

Conclusions


Journal: Mathematics
Publication Date: Dec 15, 2020
Citations: 5
License type: CC BY 4.0


Similar Papers

A Verification Theorem for Threshold-Indexability of Real-State Discounted Restless Bandits
José Niño-Mora
Mathematics of Operations Research, Vol. 45, 01 Nov 2019

Multi-Armed Bandit Allocation Indices
Zhiliang Ying ... John C Gittins
Technometrics, Vol. 33, 01 Nov 1991

Approximation algorithms for restless bandit problems
Sudipto Guha ... Kamesh Munagala
Journal of the ACM, Vol. 58, 01 Dec 2010

Stochastic and fluid index policies for resource allocation problems
M Larrañaga ... I.M Verloop
01 Apr 2015
