Abstract

The Whittle index for restless bandits (two-action semi-Markov decision processes) provides an intuitively appealing optimal policy for controlling a single generic project that can be active (engaged) or passive (rested) at each decision epoch, and which can change state while passive. It further provides a practical heuristic priority-index policy for the computationally intractable multi-armed restless bandit problem. This policy has been widely applied over the last three decades in a variety of settings, yet mostly restricted to project models with a one-dimensional state. This is due in part to the difficulty of establishing indexability (existence of the index) and of computing the index for projects with large state spaces. This paper draws on the author’s prior results on sufficient indexability conditions and an adaptive-greedy algorithmic scheme for restless bandits to obtain a new fast-pivoting algorithm that computes the n Whittle index values of an n-state restless bandit by performing, after an initialization stage, n steps that entail (2/3)n^3 + O(n^2) arithmetic operations. The algorithm also draws on the parametric simplex method: it is based on elucidating the pattern of parametric simplex tableaux, which allows special structure to be exploited to substantially simplify simplex pivoting steps and reduce their complexity. A numerical study demonstrates substantial runtime speed-ups versus alternative algorithms.

Highlights

  • We consider a general two-action (1: engaged/active; 0: rested/passive) semi-Markov decision process (SMDP) restless bandit model (see, for example, ([1], Ch. 11) and [2]) of a dynamic and stochastic project, whose state X(t) moves over continuous time t ∈ [0, ∞) across the state space N, which is assumed finite with n = |N| states

  • In more complex models in which the Whittle index cannot be evaluated in closed form, the most widespread approach, which has its roots in the calibration method for the Gittins index in [49], is to apply an iterative procedure for approximately computing the index

  • This paper has presented a new algorithm for computing the Whittle index of a general finite-state semi-Markov restless bandit, based on an efficient implementation of the adaptive-greedy algorithmic scheme introduced in [3,24] for restless bandits, in which it was not specified how to evaluate certain metrics arising in the algorithm description


Summary

Introduction

We consider a general two-action (1: engaged/active; 0: rested/passive) semi-Markov decision process (SMDP) restless bandit model (see, for example, ([1], Ch. 11) and [2]) of a dynamic and stochastic project, whose state X(t) moves over continuous time t ∈ [0, ∞) across the state space N, which is assumed finite with n = |N| states. The author introduced and developed in [3,24] a methodology to establish indexability and compute the Whittle index for general finite-state restless bandits, which was extended to the semi-Markov denumerable-state case in [4] and to the continuous-state case in [25]. The effectiveness of this approach has been demonstrated in diverse models. It is based on verifying so-called PCL-indexability conditions, so named because they rest on project performance metrics satisfying partial conservation laws (PCLs).
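To make the adaptive-greedy idea concrete, the following is a minimal sketch under stated assumptions, not the paper's fast-pivoting algorithm. It assumes a discrete-time discounted model with discount factor `beta`, transition matrices `P0` (passive) and `P1` (active), one-stage rewards `R0`, `R1` and one-stage work (resource consumption) `W0`, `W1`. All function and variable names are hypothetical. At each step it evaluates reward and work metrics under the policy that is active exactly on the current set `S`, forms marginal reward and marginal work, and picks the state outside `S` with the largest ratio. The output coincides with the Whittle index only when the PCL-indexability conditions hold, which this sketch does not fully verify. It also re-solves two linear systems per step, so it is far slower than the (2/3)n^3 + O(n^2) count quoted in the abstract.

```python
import numpy as np

def evaluate(P0, P1, c0, c1, S, beta):
    """Discounted value, per initial state, of the stream paying c1 in active
    states and c0 in passive states under the policy active exactly on S."""
    n = len(c0)
    active = np.array([i in S for i in range(n)])
    P = np.where(active[:, None], P1, P0)   # row i follows P1 if i is in S
    c = np.where(active, c1, c0)
    return np.linalg.solve(np.eye(n) - beta * P, c)

def marginal_metrics(P0, P1, R0, R1, W0, W1, S, beta):
    """Marginal reward and marginal work of being active instead of passive
    for one period, then following the S-active policy."""
    f = evaluate(P0, P1, R0, R1, S, beta)   # reward metric
    g = evaluate(P0, P1, W0, W1, S, beta)   # work metric
    r = (R1 + beta * P1 @ f) - (R0 + beta * P0 @ f)
    w = (W1 + beta * P1 @ g) - (W0 + beta * P0 @ g)
    return r, w

def adaptive_greedy(P0, P1, R0, R1, W0, W1, beta):
    """Naive top-down adaptive-greedy sketch (assumed setting, see lead-in).
    Returns candidate index values and the order in which states were picked."""
    n = len(R0)
    S, order, index = set(), [], np.empty(n)
    for _ in range(n):
        r, w = marginal_metrics(P0, P1, R0, R1, W0, W1, S, beta)
        if np.any(w <= 0):
            # Partial sanity check only: positive marginal work is one of
            # the PCL-indexability requirements.
            raise ValueError("non-positive marginal work encountered")
        nu = r / w                            # marginal productivity rates
        i_k = max((i for i in range(n) if i not in S), key=lambda i: nu[i])
        index[i_k] = nu[i_k]
        order.append(i_k)
        S.add(i_k)
    return index, order

# Toy check: a project whose state never changes, with unit active work.
I3 = np.eye(3)
toy_index, toy_order = adaptive_greedy(
    I3, I3,
    np.zeros(3), np.array([3.0, 1.0, 2.0]),   # passive / active rewards
    np.zeros(3), np.ones(3),                  # passive / active work
    beta=0.9,
)
```

In the toy case the state is frozen, so each state's index value reduces to its active reward and the states are picked in decreasing order of reward. For a real model, the computed values should be nonincreasing along `order`; if they are not, the sketch's output should not be read as a Whittle index.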

Review of Related Literature
SMDP Restless Bandits and Their Discrete-Stage Reformulation
Indexability
PCL-Indexability and Adaptive-Greedy Algorithm
Optimality Equations and Parametric LP Formulation
Computing the Initial Tableau
Extension to the Average Criterion
Numerical Experiments
Discussion
Conclusions