Abstract

Bayesian optimization (BO) has shown great promise as a data-efficient strategy for the global optimization of expensive, black-box functions in a plethora of control applications. Traditional BO is derivative-free: it relies solely on observations of a performance function to find its optimum. Recently, so-called first-order BO methods have been proposed that additionally exploit gradient information of the performance function to accelerate convergence. First-order BO methods mostly utilize standard acquisition functions, using gradient information only indirectly in the kernel structure to learn more accurate probabilistic surrogates for the performance function. In this work, we present a gradient-enhanced BO method that directly exploits evaluations of the performance function (zeroth-order) and its gradient (first-order) in the acquisition function. To this end, a novel gradient-based acquisition function is proposed that can identify stationary points of the performance optimization problem. We then leverage ideas from multi-objective optimization to develop an effective strategy for finding query points that optimally trade off between a zeroth-order acquisition function and the proposed gradient-based acquisition function. We show how the proposed acquisition-ensemble gradient-enhanced BO (AEGBO) method accelerates the convergence of policy-based reinforcement learning by combining noisy observations of the reward function and its gradient, which can be estimated directly from closed-loop data. The performance of AEGBO is compared to standard BO and the well-known REINFORCE algorithm on a benchmark LQR problem, for which we consistently observe significantly improved performance over a limited data budget.
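The abstract describes the method only at a high level. As a minimal sketch of the trade-off it refers to, the snippet below combines a standard zeroth-order acquisition (expected improvement) with a gradient-based score that favors candidates whose posterior mean gradient is near zero, i.e., likely stationary points, using a simple weighted scalarization in place of the paper's multi-objective strategy. All function names, the posterior interfaces, and the weight `w` are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): pick the next query point by
# trading off a zeroth-order acquisition (expected improvement) against a
# gradient-based acquisition that rewards candidates whose posterior mean
# gradient is close to zero (candidate stationary points).
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Standard EI for minimization, given GP posterior mean/std at candidates."""
    sigma = np.maximum(sigma, 1e-12)
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def gradient_acquisition(grad_mu):
    """Larger when the posterior mean gradient is closer to zero."""
    return -np.linalg.norm(grad_mu, axis=-1)

def select_query(candidates, posterior_f, posterior_grad, f_best, w=0.5):
    """Return the candidate maximizing a weighted sum of both acquisitions.

    posterior_f(X) -> (mu, sigma): assumed GP posterior of the performance function.
    posterior_grad(X) -> grad_mu:  assumed posterior mean of its gradient.
    """
    mu, sigma = posterior_f(candidates)
    grad_mu = posterior_grad(candidates)

    ei = expected_improvement(mu, sigma, f_best)
    ga = gradient_acquisition(grad_mu)

    # Rescale both scores to [0, 1] so neither objective dominates the other.
    def normalize(a):
        span = a.max() - a.min()
        return (a - a.min()) / span if span > 0 else np.zeros_like(a)

    score = w * normalize(ei) + (1.0 - w) * normalize(ga)
    return candidates[np.argmax(score)]
```

A fixed weighted scalarization is only one common way to resolve such a trade-off; the abstract indicates the authors instead draw on multi-objective optimization to balance the two acquisition functions when selecting query points.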
