Abstract

We consider adaptive decision-making problems where an agent optimizes a cumulative performance objective by repeatedly choosing among a finite set of options. Compared to the classical prediction-with-expert-advice set-up, we consider situations where losses are constrained and derive algorithms that exploit the additional structure in optimal and computationally efficient ways. Our algorithm and our analysis is instance dependent, that is, suboptimal choices of the environment are exploited and reflected in our regret bounds. The constraints handle general dependencies between losses (even across time), and are flexible enough to also account for a loss budget, which the environment is not allowed to exceed. The performance of the resulting algorithms is highlighted in two numerical examples, which include a nonlinear and online system identification task.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call