Abstract

We propose an asymptotically optimal heuristic, which we term the Randomized Assignment Control (RAC), for restless multi-armed bandit problems with discrete time and finite state spaces. It is based on a linear programming relaxation of the original stochastic control formulation. In contrast to most of the existing literature, we consider a finite horizon with multiple actions and a time-dependent (i.e., non-stationary) upper bound on the total number of bandits that can be activated in each time period. The asymptotic setting is obtained by letting the number of bandits and other related parameters grow to infinity. Our main contribution is that the asymptotic optimality of RAC in this general setting requires neither indexability properties nor the usual stability conditions on the underlying Markov chain (e.g., unichain) or on the fluid approximation (e.g., a globally stable attractor). Moreover, our multi-action setting is not restricted to the usual dominant-action concept. Numerical simulations confirm that our proposed policy indeed performs well in the asymptotic setting. Perhaps more surprisingly, these simulations show that RAC performs well in the non-asymptotic setting as well. Finally, we show that RAC is asymptotically optimal for a dynamic population, where bandits can randomly arrive and depart the system, and we discuss how our framework extends to more general costs and constraints.
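
To make the linear programming relaxation concrete, the following is a minimal sketch of the kind of finite-horizon occupancy-measure LP that underlies index-free heuristics such as RAC. Everything here is an illustrative assumption on our part (the toy sizes, the random data, the solver choice scipy.optimize.linprog, and the convention that action 0 is passive); the paper's exact formulation may differ.

# Hypothetical sketch: finite-horizon LP relaxation of a restless bandit.
# x[t, s, a] = expected fraction of bandits in state s taking action a at
# time t. The activation budget alpha[t] varies with t, matching the
# non-stationary upper bound described in the abstract.

import numpy as np
from scipy.optimize import linprog

T, S, A = 4, 3, 2                            # horizon, states, actions (toy)
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition kernel
r = rng.uniform(size=(S, A))                 # r[s, a] one-period reward
mu0 = np.full(S, 1.0 / S)                    # initial state distribution
alpha = np.array([0.4, 0.3, 0.3, 0.2])       # time-dependent budgets

nvar = T * S * A
idx = lambda t, s, a: (t * S + s) * A + a    # flatten (t, s, a) -> column

# Objective: maximize total expected reward (linprog minimizes, so negate).
c = np.zeros(nvar)
for t in range(T):
    for s in range(S):
        for a in range(A):
            c[idx(t, s, a)] = -r[s, a]

# Equalities: initial occupancy, then flow conservation between periods.
A_eq, b_eq = [], []
for s in range(S):                           # sum_a x[0, s, a] = mu0[s]
    row = np.zeros(nvar)
    for a in range(A):
        row[idx(0, s, a)] = 1.0
    A_eq.append(row); b_eq.append(mu0[s])
for t in range(T - 1):                       # sum_a x[t+1, s', a] =
    for s2 in range(S):                      #   sum_{s,a} x[t, s, a] P[s, a, s']
        row = np.zeros(nvar)
        for a in range(A):
            row[idx(t + 1, s2, a)] = 1.0
        for s in range(S):
            for a in range(A):
                row[idx(t, s, a)] -= P[s, a, s2]
        A_eq.append(row); b_eq.append(0.0)

# Inequalities: at most alpha[t] of the population activated per period.
A_ub, b_ub = [], []
for t in range(T):
    row = np.zeros(nvar)
    for s in range(S):
        for a in range(1, A):                # active actions only
            row[idx(t, s, a)] = 1.0
    A_ub.append(row); b_ub.append(alpha[t])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
x = res.x.reshape(T, S, A)
print("LP upper bound on per-bandit expected reward:", -res.fun)
# A randomized-assignment policy in the spirit of RAC would then activate a
# bandit observed in state s at time t with probability proportional to
# x[t, s, 1] / x[t, s].sum(), subject to the period-t budget.

The LP optimum gives an upper bound on the achievable per-bandit reward; the asymptotic-optimality argument is, roughly, that a randomized policy built from the LP solution attains this bound as the number of bandits grows.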
