Lower Bounds on the Sample Complexity of Exploration in the Multi-armed Bandit Problem

Shie Mannor, John N Tsitsiklis

doi:10.1007/978-3-540-45167-9_31

Publication Date: Jan 1, 2003

Citations: 32

Affiliation: Decision Systems (United States), Massachusetts Institute of Technology

#Bandit Problem #Probably Approximately Correct

Abstract

We consider the multi-armed bandit problem under the PAC (“probably approximately correct”) model. It was shown by Even-Dar et al. [5] that given \(n\) arms, it suffices to play the arms a total of \(O\big((n/\epsilon^2)\log(1/\delta)\big)\) times to find an \(\epsilon\)-optimal arm with probability at least \(1-\delta\). Our contribution is a matching lower bound that holds for any sampling policy. We also generalize the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.
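To make the PAC guarantee concrete, the following is a minimal sketch of a naive uniform-sampling baseline. It is neither the algorithm of Even-Dar et al. [5] nor the lower-bound construction of this paper. It assumes Bernoulli arms with rewards in \([0,1]\), and the function names, arm means, and parameter values are hypothetical. Each arm is pulled \(\lceil (2/\epsilon^2)\ln(2n/\delta) \rceil\) times, which by Hoeffding's inequality and a union bound keeps every empirical mean within \(\epsilon/2\) of its true mean with probability at least \(1-\delta\), so the empirically best arm is \(\epsilon\)-optimal. The total is \(O\big((n/\epsilon^2)\log(n/\delta)\big)\) pulls, a \(\log n\) factor above the bound quoted in the abstract.

```python
import math
import random


def naive_sample_count(n_arms, epsilon, delta):
    """Pulls per arm so that, by Hoeffding plus a union bound, all empirical
    means are within epsilon/2 of the true means with probability >= 1 - delta.
    Assumes rewards lie in [0, 1]."""
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * n_arms / delta))


def naive_pac_best_arm(means, epsilon, delta, rng):
    """Pull every (hypothetical) Bernoulli arm the same number of times and
    return the index of the arm with the highest empirical mean, together
    with the total number of pulls used."""
    n_arms = len(means)
    pulls = naive_sample_count(n_arms, epsilon, delta)
    empirical = []
    for p in means:
        successes = sum(1 for _ in range(pulls) if rng.random() < p)
        empirical.append(successes / pulls)
    best = max(range(n_arms), key=lambda i: empirical[i])
    return best, n_arms * pulls


# Illustrative run; the arm means below are made up for the example.
rng = random.Random(0)
means = [0.5, 0.55, 0.6, 0.9]
arm, total_pulls = naive_pac_best_arm(means, epsilon=0.1, delta=0.05, rng=rng)
print(arm, total_pulls)
```

The per-arm budget depends on \(n\) only through the union bound, which is where the extra \(\log n\) factor in this baseline comes from.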