Abstract

We study the {0,1}-loss version of the adaptive adversarial multi-armed bandit problem with α (≥ 1) lossless arms. For this problem, we show a tight bound of K − α − Θ(1/T) on the minimax expected number of mistakes (1-losses), where K is the number of arms and T is the number of rounds.
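
One way to read the bound is to unpack the Θ(1/T) term into explicit constants. The sketch below is only a restatement of the asymptotic notation, not a result taken from the paper: the symbol M^*(K, α, T) for the minimax expected number of mistakes and the constants c_1, c_2, T_0 are notation assumed here for illustration, and the constants may depend on K and α.

```latex
% Assumed notation: M^*(K,\alpha,T) is the minimax expected number of mistakes.
% c_1, c_2 > 0 and T_0 are hypothetical constants implied by the \Theta(1/T) term.
\[
  K - \alpha - \frac{c_2}{T}
  \;\le\; M^*(K,\alpha,T)
  \;\le\; K - \alpha - \frac{c_1}{T}
  \qquad \text{for all } T \ge T_0 ,
\]
% so the gap to K - \alpha vanishes as the number of rounds grows:
\[
  \lim_{T \to \infty} M^*(K,\alpha,T) = K - \alpha .
\]
```

For instance, with K = 5 arms of which α = 2 are lossless, the minimax expected number of mistakes stays below 3 and approaches 3 as T grows.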

