Abstract
The hyperparameters in a Gaussian process regression (GPR) model with a specified kernel are often estimated from the data via maximum marginal likelihood. Because the marginal likelihood is non-convex in the hyperparameters, the optimization may not converge to the global maximum. A common approach to tackle this issue is to use multiple starting points randomly sampled from a specific prior distribution. As a result, the choice of prior distribution may play a vital role in the predictability of this approach. However, there exists little research in the literature on the impact of the prior distributions on hyperparameter estimation and the performance of GPR. In this paper, we provide the first empirical study of this problem using simulated and real data experiments. We consider different types of priors for the initial values of the hyperparameters for some commonly used kernels and investigate the influence of the priors on the predictability of GPR models. The results reveal that, once a kernel is chosen, different priors for the initial hyperparameters have no significant impact on the performance of GPR prediction, even though the estimates of the hyperparameters are very different from the true values in some cases.
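The multi-start strategy described above can be sketched in a few lines. This is a minimal illustration only, assuming scikit-learn (not referenced in the paper): initial hyperparameter values are drawn from a Uniform prior, the marginal likelihood is maximized from each starting point, and the fit with the highest log marginal likelihood is kept.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

best_gpr, best_lml = None, -np.inf
for _ in range(10):
    # Draw initial signal-variance and length-scale values from a
    # Uniform prior (bounded away from zero so they stay valid).
    init_scale, init_length = rng.uniform(0.05, 1.0, size=2)
    kernel = ConstantKernel(init_scale) * RBF(init_length)
    # Each fit maximizes the log marginal likelihood from this start.
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-2).fit(X, y)
    if gpr.log_marginal_likelihood_value_ > best_lml:
        best_gpr, best_lml = gpr, gpr.log_marginal_likelihood_value_
```

Because the marginal likelihood is non-convex, different starting points can converge to different local optima; keeping the best of several restarts is the standard remedy the paper studies.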
Highlights
Over the last few decades, Gaussian Process Regression (GPR) has proven to be a powerful and effective method for non-linear regression problems due to many desirable properties, such as the ease of obtaining and expressing uncertainty in predictions, the ability to capture a wide variety of behaviour through a simple parameterization, and a natural Bayesian interpretation [1].
Various empirical studies have shown that GPR can achieve better predictive performance in many areas [5, 6, 7, 8] compared to other models such as the Support Vector Machine (SVM) [9, 10, 11], and a number of further developments of Gaussian process methods have been proposed, including deep Gaussian processes [12] and recurrent Gaussian processes [13].
The choice of kernel has a profound impact on the performance of a GPR model, just as the activation function and learning rate can affect the result of a neural network [14].
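The effect of kernel choice can be seen by fitting the same data under different kernels and comparing the resulting log marginal likelihoods. This is an illustrative sketch, again assuming scikit-learn; the specific kernels and data are not taken from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

rng = np.random.default_rng(1)
X = rng.uniform(0, 5, size=(50, 1))
y = np.sin(3 * X).ravel() + 0.1 * rng.standard_normal(50)

# Fit the same data under three commonly used kernels and record
# the maximized log marginal likelihood for each.
results = {}
for name, kernel in [("RBF", RBF(1.0)),
                     ("Matern-1.5", Matern(1.0, nu=1.5)),
                     ("RationalQuadratic", RationalQuadratic(1.0))]:
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-2).fit(X, y)
    results[name] = gpr.log_marginal_likelihood_value_
```

Kernels encode different smoothness assumptions (e.g. the Matérn family is less smooth than the RBF), so the same data can be explained quite differently depending on this choice.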
Summary
Most practitioners using GPR as a modelling tool tend to choose a simple prior distribution based on their expert opinions and experience, such as the Uniform distribution on the range (0, 1) [4, 17, 20]. It is of importance and of interest to investigate whether the predictability of GPR models would be jeopardised if the prior distribution were not properly chosen, and how the choice of prior distribution may affect the performance of GPR models [19, 20]. We consider different types of priors, including vague and data-dominated, for the initial values of the hyperparameters for some commonly used kernels and investigate the influence of the priors on the predictability of GPR models.