Abstract

This paper introduces the multiarmed bandit problem with deadlines, which concerns the dynamic selection of a live project to engage out of a portfolio of Markovian bandit projects expiring after given deadlines, to maximize the expected total discounted or undiscounted reward earned. Although the problem is computationally intractable, a natural heuristic policy is obtained by attaching to each project the finite-horizon counterpart of its Gittins index, and then engaging at each time a live project of highest index. Remarkably, while such a finite-horizon index was introduced in [R. N. Bradt, S. M. Johnson, and S. Karlin (1956). On sequential designs to maximize the sum of n observations. Ann. Math. Statist. 27 1060--1074], an exact polynomial-time algorithm using arithmetic operations does not seem to have been proposed until [J. Nino-Mora (2005). A marginal productivity index policy for the finite-horizon multiarmed bandit problem. In Proceedings of CDC-ECC '05, pp. 1718--1722, IEEE]. Yet, such an adaptive-greedy index algorithm, which draws on methods introduced by the author for restless bandit indexation, has a complexity of $O(T^3 n^3)$ operations for a $T$-horizon $n$-state project, rendering it impractical for all but small instances. This paper significantly improves on the complexity of such an algorithm, decoupling it into a recursive $T$-stage method that performs $O(T^2 n^3)$ arithmetic operations. Moreover, in an insightful special model the complexity is further reduced to $O(T^2)$ operations, and closed-form index formulae are given. Computational experiments are reported demonstrating the algorithm's runtime performance, and showing that the proposed index policy is near optimal and can substantially outperform the benchmark greedy and Gittins index policies.
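To make the index policy described above concrete, the following is a minimal sketch, not the recursive adaptive-greedy algorithm of the paper. It approximates a finite-horizon index numerically by the standard calibration characterization (bisection on a per-period charge in a finite-horizon optimal-stopping problem) and then engages a live project of highest index. The function names, the dictionary layout of a project, and the convention that a project with deadline `d` can be engaged at times `now < d` are all assumptions made for illustration.

```python
def finite_horizon_index(R, P, beta, state, horizon, tol=1e-10):
    """Approximate the finite-horizon index of `state` with `horizon` >= 1
    periods remaining, for rewards R, transition matrix P, discount beta.
    Calibration sketch: find the charge at which engaging now, with the
    option to stop at any later period, has zero net value."""
    n = len(R)

    def net_value(charge):
        # V[j]: optimal-stopping value from state j with k periods left,
        # when each engaged period earns R[j] - charge (stopping earns 0).
        V = [0.0] * n
        for _ in range(horizon - 1):
            V = [max(0.0, R[j] - charge
                     + beta * sum(P[j][l] * V[l] for l in range(n)))
                 for j in range(n)]
        # The first period is engaged for sure; later ones are optional.
        return R[state] - charge + beta * sum(P[state][l] * V[l] for l in range(n))

    lo, hi = min(R), max(R)  # the index always lies between these bounds
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_value(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def select_project(projects, now, beta):
    """Return a live project of highest index, or None if all have expired.
    Each project is a dict with keys 'R', 'P', 'state', 'deadline'
    (a hypothetical layout chosen for this sketch)."""
    live = [p for p in projects if p["deadline"] > now]
    if not live:
        return None
    return max(live, key=lambda p: finite_horizon_index(
        p["R"], p["P"], beta, p["state"], p["deadline"] - now))


# Project A pays nothing now but moves to an absorbing state paying 1.
A = {"R": [0.0, 1.0], "P": [[0.0, 1.0], [0.0, 1.0]], "state": 0, "deadline": 3}
# Project B pays a steady 0.5 per period.
B = {"R": [0.5], "P": [[1.0]], "state": 0, "deadline": 5}
first_choice = select_project([A, B], now=0, beta=0.9)
late_choice = select_project([A, B], now=2, beta=0.9)
```

In this toy instance the index of project A grows with its remaining horizon, so A is preferred while it still has three periods left, whereas with a single period left its index falls to its immediate reward and B is preferred. Each bisection step in this sketch reruns a dynamic program, so it is an approximation scheme rather than an exact method using arithmetic operations only.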

