Abstract

Bayesian Optimization, combined with Gaussian Processes, has been widely used to solve expensive-to-evaluate black-box optimization problems. Overall, this approach has shown good results, particularly for the parameter tuning of machine learning algorithms. Nonetheless, Bayesian Optimization itself must be configured to achieve the best possible performance, and the selection of the kernel function is a crucial choice. This paper investigates whether adaptively changing the kernel function during the optimization process is preferable to fixing it a priori. Six adaptive kernel selection strategies are introduced and tested on well-known synthetic and real-world optimization problems. To provide a more complete evaluation of the proposed kernel selection variants, two major kernel parameter setting approaches have been tested. According to our results, apart from removing the choice of kernel from the user's configuration burden, adaptive kernel selection criteria show better performance than fixed-kernel approaches.

Highlights

  • In many machine learning algorithms, parameters need to be fine-tuned in order to guarantee good performance

  • While Bayesian Optimization (BO) along with Gaussian Processes (GPs) is acknowledged to be an efficient approach to deal with optimization problems with a limited budget of function evaluations, the question of kernel selection remains one of its main limitations

  • In this paper we propose and analyze different strategies for adaptive kernel selection, showing, through extensive experimentation and a statistical analysis of the results, that automatic kernel selection is a more efficient approach when considering optimization problems with different characteristics; a minimal sketch of the idea follows this list
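
To make the adaptive idea concrete, here is a minimal Python sketch. It is not the paper's method: the six selection strategies are not reproduced, and the candidate kernels, the toy objective, and the Expected Improvement acquisition used below are illustrative assumptions. At each iteration, one GP is fitted per candidate kernel and the kernel whose fitted GP attains the highest log marginal likelihood is kept.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

def objective(x):
    # Toy expensive black-box function (illustrative assumption).
    return np.sin(3.0 * x) + 0.5 * x ** 2

candidate_kernels = [RBF(), Matern(nu=2.5), RationalQuadratic()]
rng = np.random.default_rng(0)

X = rng.uniform(-2.0, 2.0, size=(5, 1))  # initial design
y = objective(X).ravel()

for _ in range(20):
    # Adaptive step: fit one GP per candidate kernel and keep the kernel
    # whose fitted GP has the highest log marginal likelihood.
    fits = [GaussianProcessRegressor(kernel=k, normalize_y=True, alpha=1e-6).fit(X, y)
            for k in candidate_kernels]
    gp = max(fits, key=lambda g: g.log_marginal_likelihood())

    # Expected Improvement (for minimization) over a dense grid.
    grid = np.linspace(-2.0, 2.0, 500).reshape(-1, 1)
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    # Evaluate the objective at the most promising point.
    x_next = grid[np.argmax(ei)].reshape(1, 1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("best value found:", y.min())

Re-fitting every candidate at every iteration is the simplest possible selection rule; it is used here only to illustrate the mechanism, not to reproduce any of the six strategies evaluated in the paper.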


Introduction

In many machine learning algorithms, parameters need to be fine-tuned in order to guarantee good performance. Finding the best parameter set by executing the machine learning algorithm and observing the result can be seen as a nonlinear function optimization problem: each parameter set is a point or solution (x) in the search space, and the error of the learning process is the outcome of the objective function (f(x)). Bayesian Optimization tackles this problem by placing a prior belief over the objective function, typically in the form of a Gaussian Process. Every time the objective function is evaluated, this prior belief is updated with the likelihood of having those observations, generating a posterior distribution over functions. This updating process is based on Bayes' theorem, where the posterior probability of a model given some evidence is proportional to the prior probability of the model multiplied by the likelihood of the evidence given the model.
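
Stated formally (a standard formulation of this rule, not quoted from the paper), with a prior $p(f)$ over functions and observed evaluations $\mathcal{D} = \{(x_i, f(x_i))\}_{i=1}^{n}$, the update reads:

$$
p(f \mid \mathcal{D}) \;=\; \frac{p(\mathcal{D} \mid f)\,p(f)}{p(\mathcal{D})} \;\propto\; p(\mathcal{D} \mid f)\,p(f),
$$

where $p(f)$ is the prior over functions (the Gaussian Process), $p(\mathcal{D} \mid f)$ is the likelihood of the observations, and $p(f \mid \mathcal{D})$ is the posterior distribution that guides the choice of the next point to evaluate.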
