Abstract
We study the {0,1}-loss version of the adaptive adversarial multi-armed bandit problem with α (≥ 1) lossless arms. For this problem, we show a tight bound of K − α − Θ(1/T) on the minimax expected number of mistakes (1-losses), where K is the number of arms and T is the number of rounds.
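For intuition about where K − α comes from, here is a minimal sketch of a naive deterministic baseline. It is not the method of this work. The baseline keeps pulling a surviving arm and permanently discards any arm that returns a 1-loss. A lossless arm never returns a 1-loss, so it is never discarded, and each mistake eliminates one of the K − α lossy arms. The learner therefore makes at most K − α mistakes, however the losses of the other arms are chosen. Read together with the bound above, this suggests the best achievable expected number of mistakes improves on this baseline only by a Θ(1/T) term. The names `eliminate_on_loss` and `lossless_tail_adversary` and the `loss(t, arm)` interface are hypothetical and chosen for illustration.

```python
import random
from typing import Callable


def eliminate_on_loss(K: int, T: int, loss: Callable[[int, int], int]) -> int:
    """Naive baseline (hypothetical): pull a surviving arm each round and drop
    any arm that ever returns a 1-loss. Returns the number of mistakes over T rounds.

    `loss(t, arm)` is an assumed adversary interface returning 0 or 1; it may
    depend on the round and on which arm the learner pulls.
    """
    alive = list(range(K))
    mistakes = 0
    for t in range(T):
        # With at least one lossless arm, some arm always survives.
        assert alive, "no surviving arm: the adversary violated the alpha >= 1 assumption"
        arm = alive[0]  # deterministic choice among surviving arms
        if loss(t, arm) == 1:
            mistakes += 1
            alive.remove(arm)  # an arm with a 1-loss cannot be lossless
    return mistakes


def lossless_tail_adversary(K: int, alpha: int, p: float = 1.0, seed: int = 0):
    """Hypothetical adversary: arms K-alpha, ..., K-1 are lossless; every other
    arm returns a 1-loss with probability p each time it is pulled."""
    rng = random.Random(seed)
    lossless = set(range(K - alpha, K))

    def loss(t: int, arm: int) -> int:
        if arm in lossless:
            return 0
        return 1 if rng.random() < p else 0

    return loss


if __name__ == "__main__":
    K, alpha, T = 5, 2, 100
    # With p = 1.0, each of the K - alpha lossy arms costs exactly one mistake.
    print(eliminate_on_loss(K, T, lossless_tail_adversary(K, alpha)))  # 3
```

The run above hits the worst case for this baseline, exactly K − α = 3 mistakes. With p < 1 the count can be smaller, but it never exceeds K − α.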