Abstract

We consider restless bandits with a general finite state space under partial observability, with two observation models: in the first, the state of each bandit is not observable at all; in the second, the state of each bandit is observable when it is selected. Under the assumption that the models satisfy a restart property, we prove that both models are indexable. For the first model, we derive a closed-form expression for the Whittle index. For the second model, we propose an efficient algorithm to compute the Whittle index by exploiting the qualitative properties of the optimal policy. We present detailed numerical experiments for multiple instances of a machine maintenance problem. The results indicate that the Whittle index policy outperforms the myopic policy and can be close to optimal in different setups.
