Abstract

We propose an analysis of Probably Approximately Correct (PAC) identification of an ϵ-best arm in graph bandit models with Gaussian distributions. We consider finite but potentially very large bandit models where the set of arms is endowed with a graph structure, and we assume that the arms' expectations μ are smooth with respect to this graph. Our goal is to identify an arm whose expectation is at most ϵ below the largest of all means. We focus on the fixed-confidence setting: given a risk parameter δ, we consider sequential strategies that yield an ϵ-optimal arm with probability at least 1-δ. All such strategies use at least T*(μ)log(1/δ) samples, where R is the smoothness parameter. We identify the complexity term T*(μ) as the solution of a min-max problem for which we give a game-theoretic analysis and an approximation procedure. This procedure is the key element required by the asymptotically optimal Track-and-Stop strategy.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call