Abstract

In this paper, the problem of maximizing a black-box function $f:\mathcal{X}\to \mathbb{R}$ is studied in the Bayesian framework with a Gaussian Process prior. In particular, a new algorithm for this problem is proposed, and high-probability bounds on its simple and cumulative regret are established. The query point selection rule in most existing methods involves an exhaustive search over an increasingly fine sequence of uniform discretizations of $\mathcal{X}$. The proposed algorithm, in contrast, adaptively refines $\mathcal{X}$, which leads to a lower computational complexity, particularly when $\mathcal{X}$ is a subset of a high-dimensional Euclidean space. In addition to the computational gains, sufficient conditions are identified under which the regret bounds of the new algorithm improve upon the known results. Finally, an extension of the algorithm to the case of contextual bandits is proposed, and high-probability bounds on the contextual regret are presented.

Highlights

  • We consider the problem of maximizing a function $f:\mathcal{X}\to \mathbb{R}$ from noisy observations of the form $y_t = f(x_t) + \eta_t$, $t = 1, 2, \ldots, n$, where $\eta_t$ is the observation noise at time $t$

  • We address two issues with existing approaches to the Gaussian Process (GP) bandits problem. The first is that all existing GP bandit algorithms that minimize the cumulative regret must solve an auxiliary optimization problem over the entire search space to select each query point. This may be computationally infeasible, and practical implementations resort to approximation techniques that do not come with theoretical guarantees

  • We extend our algorithm for GP bandits to contextual GP bandits and obtain bounds on the contextual regret
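To make the auxiliary optimization in the second highlight concrete, here is a minimal sketch of an upper-confidence-bound (UCB) query rule evaluated exhaustively over a uniform grid. It illustrates the baseline style of method that the highlights criticize, not the adaptive algorithm proposed in the paper. The test function `f`, the squared-exponential kernel, the length scale, the fixed confidence multiplier `beta`, and the grid resolution are all assumptions chosen for illustration; none of them come from the source.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between rows of a (m, d) and b (k, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(x_obs, y_obs, x_query, noise_std):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise_std**2 * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    mean = k_qo @ np.linalg.solve(k_oo, y_obs)
    # The prior variance is 1 for this kernel; subtract the explained part.
    var = 1.0 - np.einsum("ij,ji->i", k_qo, np.linalg.solve(k_oo, k_qo.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))

def uniform_grid(points_per_axis, dim):
    """All points of a uniform grid on [0, 1]^dim, shape (points_per_axis**dim, dim)."""
    axes = [np.linspace(0.0, 1.0, points_per_axis)] * dim
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, dim)

def f(x):
    """Hypothetical objective (an assumption for this sketch), maximized at 0.3 in each coordinate."""
    return -((x - 0.3) ** 2).sum(axis=-1)

rng = np.random.default_rng(0)
dim, noise_std, beta = 2, 0.1, 2.0   # beta is a fixed, assumed confidence multiplier
grid = uniform_grid(21, dim)         # 21**2 = 441 candidates; grows as 21**dim

# First query is random; observations follow y_t = f(x_t) + eta_t.
x_obs = rng.uniform(size=(1, dim))
y_obs = f(x_obs) + noise_std * rng.normal(size=1)

for t in range(2, 16):
    mean, std = gp_posterior(x_obs, y_obs, grid, noise_std)
    # The auxiliary optimization: exhaustive argmax of the UCB over the whole grid.
    x_next = grid[np.argmax(mean + beta * std)]
    y_next = f(x_next) + noise_std * rng.normal()
    x_obs = np.vstack([x_obs, x_next])
    y_obs = np.append(y_obs, y_next)
```

The cost of each round is dominated by scoring every grid point. At the same resolution the grid has `21**dim` points, so the exhaustive search becomes impractical quickly as the dimension grows, which is the computational issue the highlights describe.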


Summary

Introduction

We further assume that the function $f$ is expensive to evaluate, and that we are allocated a budget of $n$ function evaluations. This problem can be thought of as an extension of the multi-armed bandit (MAB) problem to the case of infinitely many (possibly uncountable) arms indexed by the set $\mathcal{X}$. The goal is to design a strategy for sequentially selecting query points $x_t \in \mathcal{X}$ based on the past observations $\{(x_i, y_i) : 1 \le i \le t-1\}$ and the prior on $f$. As in the case of MAB with finitely many arms, the performance of any query point selection strategy is usually measured by the cumulative regret $R_n$.
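The paper's own formula for the regret does not appear in full in this excerpt. Under the standard definition from the bandit literature (an assumption here), the cumulative regret is $R_n = \sum_{t=1}^{n} \big(f(x^*) - f(x_t)\big)$, where $x^*$ is a maximizer of $f$ over $\mathcal{X}$. A small sketch of how both regret notions mentioned in the abstract could be computed for a sequence of queries follows. The helper names and the sample values are hypothetical, and the simple regret uses one common convention (the gap of the best point queried so far), which may differ from the paper's.

```python
import numpy as np

def cumulative_regret(f_values_at_queries, f_max):
    """Running cumulative regret R_t = sum_{s<=t} (f_max - f(x_s)) for t = 1..n."""
    gaps = f_max - np.asarray(f_values_at_queries, dtype=float)
    return np.cumsum(gaps)

def simple_regret(f_values_at_queries, f_max):
    """Running simple regret: smallest gap f_max - f(x_s) among queries s <= t."""
    gaps = f_max - np.asarray(f_values_at_queries, dtype=float)
    return np.minimum.accumulate(gaps)

# Hypothetical noiseless values f(x_1), ..., f(x_5) with f(x*) = 1.0.
queried = [0.2, 0.7, 0.5, 0.9, 1.0]
f_max = 1.0

R = cumulative_regret(queried, f_max)
S = simple_regret(queried, f_max)
```

Cumulative regret penalizes every suboptimal query, so it never decreases. Simple regret only tracks the best point found so far, so it never increases.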
