Computing an Index Policy for Bandits with Switching Penalties

José Niño-Mora

doi:10.4108/smctools.2007.1994

Abstract

We address the multiarmed bandit problem with switching penalties, including both costs and delays. Asawa and Teneketzis (1996) introduced an index for bandits with switching penalties that partially characterizes optimal policies, attaching to each project state a "continuation index" (its Gittins index) and a "switching index," yet they proposed an index algorithm only for the case of switching costs. We present a fast decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster, in at most (5/2)n^2 + O(n) arithmetic operations for an n-state project. This extends earlier work in which we introduced a two-stage index algorithm for the case of switching costs only. We exploit the fact that the Asawa and Teneketzis index is the marginal productivity index of a classic bandit with switching penalties in its semi-Markov restless reformulation, by deploying methods introduced by the author. A computational study demonstrates the dramatic runtime savings achieved by the new algorithm, the near-optimality of the index policy, and its substantial gains against the benchmark Gittins index policy across a wide range of two- and three-project instances.
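To make the "continuation index" concrete, here is a minimal sketch of how a Gittins index can be computed for a small finite-state project. It is not the fast two-stage method of the paper: it uses the classical largest-index-first recursion (commonly attributed to Varaiya, Walrand and Buyukkoc), costs roughly O(n^4) operations as written, and ignores switching penalties entirely. The function names, the transition matrix P, the reward vector r, and the discount factor beta are all illustrative assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def gittins_indices(P, r, beta):
    """Gittins indices of a finite-state project (assumes 0 <= beta < 1).

    States are ranked from highest to lowest index. At each step the project
    is engaged once and then kept active only while it stays inside the set
    `cont` of states already ranked; the next index is the best ratio of
    expected discounted reward to expected discounted time.
    """
    n = len(r)
    index = [None] * n
    cont = set()  # states whose index is already known
    while len(cont) < n:
        # A = I - beta * Q, where Q keeps only transitions into `cont`
        A = [[(1.0 if i == j else 0.0) - (beta * P[i][j] if j in cont else 0.0)
              for j in range(n)] for i in range(n)]
        R = solve(A, list(r))      # discounted reward until leaving `cont`
        W = solve(A, [1.0] * n)    # discounted time until leaving `cont`
        best = max((j for j in range(n) if j not in cont),
                   key=lambda j: R[j] / W[j])
        index[best] = R[best] / W[best]
        cont.add(best)
    return index


# Hypothetical two-state project: state 0 pays 0 and moves to state 1,
# state 1 pays 1 and is absorbing.
example = gittins_indices([[0.0, 1.0], [0.0, 1.0]], [0.0, 1.0], 0.5)
```

For this toy project the absorbing state gets index 1, and state 0 gets index 0.5: engaging it earns 0 now and 1 per period afterwards, for a discounted reward of 1 over a discounted time of 2.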

Similar Papers

A Faster Index Algorithm and a Computational Study for Bandits with Switching Costs
José Niño-Mora
INFORMS Journal on Computing | VOL. 20
01 May 2008

Dynamic scheduling for production systems operating in a random environment
01 Jan 2003

Computing an Index Policy for Multiarmed Bandits with Deadlines
José Niño-Mora
01 Jan 2008

Computing an index policy for bandits with switching penalties
22 Oct 2007
