Abstract

We give an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala [1], for the case of an adaptive adversary. In this problem we are given a bounded set S ⊆ ℝ^n of feasible points. At each time step t, the online algorithm must select a point x_t ∈ S while simultaneously an adversary selects a cost vector c_t ∈ ℝ^n. The algorithm then incurs cost c_t · x_t. Kalai and Vempala show that even if S is exponentially large (or infinite), so long as we have an efficient algorithm for the offline problem (given c ∈ ℝ^n, find x ∈ S to minimize c · x) and so long as the cost vectors are bounded, one can efficiently solve the online problem of performing nearly as well as the best fixed x ∈ S in hindsight. The Kalai-Vempala algorithm assumes that the cost vector c_t is given to the algorithm after each time step. In the "bandit" version of the problem, the algorithm only observes its cost, c_t · x_t. Awerbuch and Kleinberg [2] give an algorithm for the bandit version for the case of an oblivious adversary, and an algorithm that works against an adaptive adversary for the special case of the shortest path problem. They leave open the problem of handling an adaptive adversary in the general case. In this paper, we solve this open problem, giving a simple online algorithm for the bandit problem in the general case in the presence of an adaptive adversary. Ignoring a (polynomial) dependence on n, we achieve a regret bound of \(\mathcal{O}(T^{3/4}\sqrt{\ln T})\).

Keywords: Online Algorithm; Short Path Problem; Cost Vector; Online Optimization; Bandit Problem
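To make the setting concrete, the following is a minimal sketch of the full-information protocol that the abstract attributes to Kalai and Vempala, not the bandit algorithm of this paper. It assumes a small finite S so that the offline problem can be solved by brute force, and it uses one common form of follow-the-perturbed-leader (a uniform perturbation on [0, 1/ε]^n added to the cumulative cost). All function names and the parameter `epsilon` are hypothetical and chosen for illustration only.

```python
import random

def dot(c, x):
    """Inner product c . x of two equal-length sequences."""
    return sum(ci * xi for ci, xi in zip(c, x))

def offline_oracle(S, c):
    """Offline problem: return x in S minimizing c . x.
    Assumption: S is a small finite list, so brute force is enough."""
    return min(S, key=lambda x: dot(c, x))

def fpl_full_information(S, costs, epsilon, rng):
    """Follow-the-perturbed-leader sketch for the FULL-INFORMATION setting.
    After each step the whole cost vector c_t is revealed (unlike the bandit
    setting, where only the scalar c_t . x_t would be observed)."""
    n = len(costs[0])
    cumulative = [0.0] * n          # sum of cost vectors seen so far
    total_cost = 0.0
    for c_t in costs:
        # Fresh uniform perturbation in [0, 1/epsilon]^n at every step.
        perturbation = [rng.uniform(0.0, 1.0 / epsilon) for _ in range(n)]
        x_t = offline_oracle(S, [a + p for a, p in zip(cumulative, perturbation)])
        total_cost += dot(c_t, x_t)
        cumulative = [a + c for a, c in zip(cumulative, c_t)]
    return total_cost

def regret(S, costs, algorithm_cost):
    """Algorithm's cost minus the cost of the best fixed x in S in hindsight."""
    n = len(costs[0])
    summed = [sum(c[i] for c in costs) for i in range(n)]
    best = offline_oracle(S, summed)
    return algorithm_cost - dot(summed, best)

if __name__ == "__main__":
    S = [(1, 0), (0, 1)]            # two feasible points in R^2
    costs = [(1, 0)] * 50           # toy, non-adaptive cost sequence
    alg_cost = fpl_full_information(S, costs, epsilon=0.5, rng=random.Random(0))
    print("regret:", regret(S, costs, alg_cost))
```

In the bandit version described above, the update of `cumulative` would not be available, since only the scalar `dot(c_t, x_t)` is observed; the sketch is meant only to illustrate the protocol and the notion of regret against the best fixed point in hindsight.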
