Abstract
Numerical optimization problems are at the core of many real-world applications in which the function to be optimized stems from proprietary and computationally intensive simulation software. It is then preferable to handle the problem as black-box optimization and to approximate the objective function by a surrogate. Among the methods developed for solving such problems, the Efficient Global Optimization (EGO) algorithm is regarded as a state-of-the-art algorithm. The surrogate model used in EGO is a Gaussian Process (GP) conditional on data points where the value of the objective function has already been calculated. The most important control on the efficiency of the EGO algorithm is the Gaussian process covariance function (or kernel), as it customizes the GP that is processed to create the optimization iterates. Traditionally, a parameterized family of covariance functions (e.g., squared exponential, Matérn) is considered, whose parameters are often estimated by maximum likelihood. However, the effect of these parameters on the performance of EGO has not been properly studied and needs further investigation. In this paper, we theoretically and empirically analyze the effect of the covariance parameters, the so-called “characteristic length-scale” and “nugget”, on the design of experiments generated by EGO and the associated optimization performance. More precisely, the behavior of EGO algorithms when the covariance parameters are fixed is compared to the standard setting where they are estimated, with a special focus on the cases of very small and very large characteristic length-scales. The approach allows a deeper understanding of the influence of these parameters on the EGO iterates and addresses, from a mixed practical/theoretical point of view, questions that are relevant to EGO users. For instance, our numerical experiments show that choosing a “small” nugget should be preferred to its estimation by maximum likelihood. We prove that the iterates stay at the best observed point when the length-scale tends to 0. Conversely, when the length-scale tends to infinity, we prove that EGO degenerates into a minimization of the GP mean prediction which, itself, tends to the Lagrange interpolation polynomial if the GP kernel is sufficiently differentiable. Overall, this study contributes to a better understanding of a key optimization algorithm, EGO.
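To make the setting concrete, below is a minimal sketch of an EGO-style loop with the covariance parameters held fixed rather than estimated by maximum likelihood, in the spirit of the comparison described above. It is not the authors' implementation: the toy objective, the choice of scikit-learn's `GaussianProcessRegressor` with an RBF kernel, the fixed `length_scale=0.5`, and the small nugget passed through `alpha` are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy black-box objective (stands in for an expensive simulation).
def objective(x):
    return np.sin(3 * x) + 0.1 * x**2

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(5, 1))   # initial design of experiments
y = objective(X).ravel()

# Fixed length-scale and a small nugget (alpha); optimizer=None disables
# maximum-likelihood estimation of the kernel parameters.
kernel = RBF(length_scale=0.5)
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-6, optimizer=None)

x_grid = np.linspace(-3, 3, 400).reshape(-1, 1)

for _ in range(10):                   # EGO iterations
    gp.fit(X, y)
    mu, sigma = gp.predict(x_grid, return_std=True)
    best = y.min()
    # Expected Improvement acquisition criterion.
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (best - mu) / sigma
        ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
        ei[sigma == 0.0] = 0.0
    x_next = x_grid[np.argmax(ei)].reshape(1, -1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("best point found:", X[np.argmin(y)].item(), "value:", y.min())
```

With a very small fixed length-scale the EI criterion concentrates near the current best observation, while a very large one makes the loop behave like a minimization of the GP mean prediction, which is the degenerate behavior analyzed in the paper.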