Levy Bandits: Multi-Armed Bandits Driven by Levy Processes

Haya Kaspi, Avi Mandelbaum

doi:10.1214/aoap/1177004777

Abstract

Levy bandits are multi-armed bandits driven by Levy processes. As anticipated from existing research, Levy bandits are optimally controlled by an index strategy: one can associate with each arm an index function of its state, and optimal strategies are those that allocate time to arms whose states have the largest index. Furthermore, the index function of an arm is calculated independently of the other arms, and the optimal reward can be expressed in terms of the indices. Somewhat less anticipated, however, is the fact that the index function of an arm driven by a Levy process has a representation in terms of the decreasing ladder sets and the exit system of its Levy driver. Moreover, the Wiener-Hopf factorization of the Levy exponents of an arm can be used to obtain the characteristic function of some excursion law, through which the index of the arm is defined. We use this factorization to explicitly calculate the index functions and optimal rewards of some interesting Levy bandits, rediscovering along the way that local time naturally quantifies switching in continuous time.
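
The allocation rule described in the abstract can be illustrated with a small discrete-time sketch. This is only an illustration under stated assumptions, not the paper's continuous-time construction: the names `pick_arm` and `run_index_strategy` are hypothetical, the driver is Brownian motion with drift (one simple Levy process) advanced by Euler steps, and the index functions are placeholders supplied by the caller rather than the indices derived in the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pick_arm(states: Sequence[float],
             index_fns: Sequence[Callable[[float], float]]) -> int:
    """Return the arm whose current state has the largest index.

    Each index is computed from that arm's own state only.
    Ties go to the lowest arm number.
    """
    indices = [fn(x) for fn, x in zip(index_fns, states)]
    return max(range(len(indices)), key=indices.__getitem__)


def run_index_strategy(states: Sequence[float],
                       index_fns: Sequence[Callable[[float], float]],
                       drifts: Sequence[float],
                       dt: float = 0.01,
                       steps: int = 1000,
                       seed: int = 0) -> Tuple[List[float], List[float]]:
    """Discrete-time caricature of an index strategy.

    At each step only the chosen arm evolves; the other arms stay frozen.
    The driver here is Brownian motion with drift (an assumption made
    for illustration; any Levy increment sampler could be substituted).
    """
    rng = random.Random(seed)
    current = list(states)
    time_on_arm = [0.0] * len(current)
    for _ in range(steps):
        k = pick_arm(current, index_fns)
        # Euler step: drift * dt plus a Gaussian increment of variance dt.
        current[k] += drifts[k] * dt + rng.gauss(0.0, dt ** 0.5)
        time_on_arm[k] += dt
    return current, time_on_arm


if __name__ == "__main__":
    # Placeholder index: the identity, NOT the index from the paper.
    identity = lambda x: x
    final_states, time_on_arm = run_index_strategy(
        states=[0.0, 0.5], index_fns=[identity, identity], drifts=[-0.1, -0.1]
    )
    print(final_states, time_on_arm)
```

In this sketch the total time allocated across arms always equals `steps * dt`, and how often the selected arm changes gives a crude discrete picture of the switching that the abstract says is quantified by local time in continuous time.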
