Abstract

We study the problem of predicting individual sequences with linear loss with full and partial (or bandit) feed- back. Our main contribution is the first efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal Õ(√(T)) regret. In addition, for the full-information setting, we give a novel regret minimization algorithm. These results are made possible by the introduction of interior-point methods for convex optimization to online learning.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call