Abstract

Motivated by situations arising in surveillance, search, and monitoring, in this paper we study the dynamic allocation of assets that tend to fail, requiring replenishment before once again being available for operation on one of the available tasks. We cast the problem as a closed-system continuous-time Markov decision process with impulsive controls, maximising the long-term time-average sum of per-task reward rates. We then formulate an open-system continuous-time approximate model, whose Lagrangian relaxation yields a decomposition (innovatively extending the restless-bandits approach), from which we derive the corresponding Whittle index. We propose two ways of adapting the Whittle index derived from the open-system model to the original closed-system model: a naïve one and a cleverly modified one. We carry out an extensive numerical performance evaluation on the original closed-system model, which indicates that the cleverly modified Whittle index rule is nearly optimal, being within 1.6% (0.4%, 0.0%) of the optimal reward rate 75% (50%, 25%) of the time, and significantly superior to uniformly random allocation, which is only within 22.0% (16.2%, 10.7%) of the optimal reward rate. Our numerical results also suggest that the Whittle index must indeed be modified when adapting it from the open-system model, as the naïve Whittle index rule is not superior to a myopic greedy policy.
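The abstract does not give the closed-form index, so the following is only a minimal illustrative sketch of how a Whittle index priority rule of the kind described is typically applied at each decision epoch: compute an index value for every task and assign the currently operational assets to the tasks with the highest indices. The `whittle_index` function and the task state field are hypothetical placeholders, not the paper's actual derivation from the Lagrangian decomposition.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    state: int  # placeholder for the paper's per-task state variable

def whittle_index(task: Task) -> float:
    """Hypothetical index function; the paper derives its actual Whittle
    index from the Lagrangian relaxation of the open-system model."""
    return 1.0 / (1 + task.state)  # illustrative: index decreasing in state

def allocate(tasks: list[Task], available_assets: int) -> list[str]:
    """Index rule: rank tasks by their Whittle index and assign one
    operational asset to each of the top-ranked tasks until assets run out."""
    ranked = sorted(tasks, key=whittle_index, reverse=True)
    return [task.name for task in ranked[:available_assets]]

# Usage: three tasks, two assets currently operational (not failed).
tasks = [Task("surveil-A", 0), Task("search-B", 2), Task("monitor-C", 1)]
print(allocate(tasks, available_assets=2))  # ['surveil-A', 'monitor-C']
```

In a closed-system simulation this rule would be re-evaluated whenever an asset fails or returns from replenishment; the naïve versus modified adaptations discussed in the paper would differ in how the index values are computed, not in this greedy assignment step.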
