Abstract

We propose an asymptotically optimal heuristic, which we term Randomized Assignment Control (RAC), for restless multi-armed bandit problems with discrete time and finite state spaces. It is based on a linear programming relaxation of the original stochastic control formulation. In contrast to most of the existing literature, we consider a finite horizon with multiple actions and a time-dependent (i.e., non-stationary) upper bound on the total number of bandits that can be activated in each time period. The asymptotic setting is obtained by letting the number of bandits and other related parameters grow to infinity. Our main contribution is that the asymptotic optimality of RAC in this general setting requires neither indexability properties nor the usual stability conditions on the underlying Markov chain (e.g., unichain) or fluid approximation (e.g., a globally stable attractor). Moreover, our multi-action setting is not restricted to the usual dominant-action concept. Numerical simulations confirm that our proposed policy indeed performs well in the asymptotic setting. Perhaps more surprisingly, these simulations show that RAC performs well in the non-asymptotic setting as well. Finally, we show that RAC is asymptotically optimal for a dynamic population, in which bandits can randomly arrive at and depart from the system, and we discuss how our framework extends to more general costs and constraints.
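
To make the linear programming relaxation concrete, the sketch below solves a generic finite-horizon occupation-measure LP for a small restless bandit instance with a passive action, multiple active actions, and a time-dependent activation budget. All symbols (states S, actions A, horizon T, rewards r, kernels P, budgets alpha) are illustrative assumptions for a standard formulation of this kind, not the paper's exact notation; the final comment indicates how a RAC-style randomized assignment could round the LP solution.

```python
# Minimal sketch of a finite-horizon LP relaxation for a restless bandit,
# under assumed (not the paper's) notation. x[t, s, a] is the expected
# fraction of bandits in state s taking action a at time t.
import numpy as np
from scipy.optimize import linprog

S, A, T = 3, 2, 4             # states, actions (action 0 = passive), horizon
rng = np.random.default_rng(0)
r = rng.random((S, A))        # per-period reward r(s, a)
P = rng.random((A, S, S))     # transition kernels P[a][s][s']
P /= P.sum(axis=2, keepdims=True)
mu0 = np.full(S, 1.0 / S)     # initial state distribution of a bandit
alpha = np.array([0.4, 0.5, 0.3, 0.4])   # time-dependent activation budget

idx = lambda t, s, a: (t * S + s) * A + a  # flatten (t, s, a) -> column
n = T * S * A
c = np.zeros(n)
for t in range(T):
    for s in range(S):
        for a in range(A):
            c[idx(t, s, a)] = -r[s, a]     # linprog minimizes, so negate

# Equality constraints: initial distribution, then flow conservation
# sum_a x[t+1, s', a] = sum_{s, a} P[a][s][s'] x[t, s, a].
A_eq, b_eq = [], []
for s in range(S):
    row = np.zeros(n)
    for a in range(A):
        row[idx(0, s, a)] = 1.0
    A_eq.append(row); b_eq.append(mu0[s])
for t in range(T - 1):
    for s2 in range(S):
        row = np.zeros(n)
        for a in range(A):
            row[idx(t + 1, s2, a)] = 1.0
        for s in range(S):
            for a in range(A):
                row[idx(t, s, a)] -= P[a, s, s2]
        A_eq.append(row); b_eq.append(0.0)

# Inequality constraints: expected fraction activated at t is at most alpha[t].
A_ub, b_ub = [], []
for t in range(T):
    row = np.zeros(n)
    for s in range(S):
        for a in range(1, A):              # all non-passive actions
            row[idx(t, s, a)] = 1.0
    A_ub.append(row); b_ub.append(alpha[t])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(T, S, A)
print("LP upper bound on per-bandit reward:", -res.fun)
# A RAC-style policy would then, at each time t, assign an action to each
# bandit in state s at random with probabilities proportional to x[t, s, :].
```

Because the LP optimizes over per-bandit occupation measures rather than the exponentially large joint state, its value upper-bounds the achievable average reward, which is what makes randomized rounding of its solution a natural candidate for asymptotic optimality as the population grows.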
