Abstract
We consider the multi-armed bandit problem under the PAC (“probably approximately correct”) model. It was shown by Even-Dar et al. [5] that, given n arms, it suffices to play the arms a total of \(O\big((n/\epsilon^2)\log(1/\delta)\big)\) times to find an ε-optimal arm with probability at least 1-δ. Our contribution is a matching lower bound that holds for any sampling policy. We also generalize the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.
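For concreteness, the guarantee and the two bounds can be written out as below. This is only a sketch of the statement: the symbols \(p_i\) (expected reward of arm i), \(I\) (the arm returned by the policy) and \(T\) (total number of plays) are notation assumed here for illustration and do not appear in the abstract, and the constants hidden in \(O\) and \(\Omega\) are left unspecified.

```latex
% PAC guarantee: the returned arm I is \epsilon-optimal with probability at least 1-\delta
% (p_i, I and T are assumed notation, not taken from the abstract).
\[
  \Pr\Big( p_I \;\ge\; \max_{1 \le i \le n} p_i - \epsilon \Big) \;\ge\; 1 - \delta .
\]
% Upper bound of Even-Dar et al. [5] on the total number of plays T:
\[
  T \;=\; O\Big( \frac{n}{\epsilon^2} \log \frac{1}{\delta} \Big).
\]
% Matching lower bound, for any sampling policy that meets the guarantee above:
\[
  T \;=\; \Omega\Big( \frac{n}{\epsilon^2} \log \frac{1}{\delta} \Big).
\]
```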