Abstract

Traditional approaches to stochastic resource allocation problems (including the classical multi-armed bandit problems) have usually relied on dynamic programming (DP) methodology, perhaps buttressed by further ad hoc arguments. While such approaches seem ‘natural’, they have usually proved technically very difficult. Bertsimas and Niño-Mora have recently given a radically new account of many important results in this area which relate to Gittins indices. The key to their approach is the characterisation of the region of achievable performance. The optimisation problems of interest are then solved as linear programs over this region. Here we exploit elements of the Bertsimas and Niño-Mora framework (in particular, its capacity to give closed-form formulae for the total return of a given policy) to obtain (i) a simple dynamic programming proof of the optimality of Gittins index policies and (ii) a range of index-based suboptimality bounds for general policies for a variety of stochastic models for resource allocation.
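
For readers unfamiliar with Gittins indices, the following minimal sketch computes them numerically for a single finite-state Markov arm. It is an illustration only and is not the method of this paper: it uses the restart-in-state formulation of Katehakis and Veinott, solved by plain value iteration, rather than the achievable-region or closed-form approach described above. The function name `gittins_indices`, the arguments `P`, `r` and `beta`, and the two-state example are all assumptions introduced here.

```python
def gittins_indices(P, r, beta, tol=1e-10, max_iter=100_000):
    """Approximate Gittins indices of one finite-state Markov arm.

    P    : transition matrix as a list of rows (hypothetical input)
    r    : reward received when the arm is played in each state
    beta : discount factor, 0 < beta < 1

    Assumed technique: for each state i, solve the "restart in i" problem
        V(j) = max( r[j] + beta * sum_k P[j][k] V(k),
                    r[i] + beta * sum_k P[i][k] V(k) )
    by value iteration; the index of state i is then (1 - beta) * V(i).
    """
    n = len(r)
    indices = []
    for i in range(n):
        V = [0.0] * n
        for _ in range(max_iter):
            # Value of continuing from each state j under the current V.
            cont = [r[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                    for j in range(n)]
            # In every state, choose between continuing and restarting in i.
            new_V = [max(cont[j], cont[i]) for j in range(n)]
            diff = max(abs(a - b) for a, b in zip(new_V, V))
            V = new_V
            if diff < tol:
                break
        indices.append((1.0 - beta) * V[i])
    return indices


if __name__ == "__main__":
    # Toy arm: state 0 pays 1 forever; state 1 pays 0 and then moves to state 0.
    P = [[1.0, 0.0],
         [1.0, 0.0]]
    r = [1.0, 0.0]
    print(gittins_indices(P, r, beta=0.9))
```

In this toy arm, state 0 has index 1 (it pays 1 at every play), and state 1 has index equal to the discount factor, here 0.9, because the best stopping rule never stops and earns a discounted reward of β/(1−β) over a discounted duration of 1/(1−β). A Gittins index policy for several such arms would, at each decision epoch, play an arm whose current state has the largest index.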
