Abstract

Gaussian process regression (GPR) has found increasing use in computational chemistry, including in applications that demand high accuracy from machine learning, such as potential energy surfaces (PESs) or functionals for density functional theory. The quality of a GPR model depends critically on the choice of kernel hyperparameters. When the data are sparse, optimizing the hyperparameters with commonly used criteria such as maximum likelihood estimation (MLE) can lead to overfitting. We show that choosing the hyperparameters (here, the kernel length parameter and the regularization parameter) based on a criterion of completeness of the basis in the corresponding linear regression problem is superior to MLE. We show that this is facilitated by high-dimensional model representation (HDMR), whereby a low-order HDMR expansion can provide reliable reference functions and allow the generation of the large synthetic test data sets needed for basis parameter optimization even when the original data are few. This is expected to be particularly useful for fitting potential energy surfaces, where a sufficiently low-order HDMR can provide a good approximation. An example of a 15-dimensional PES of UF6 is presented.
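The two hyperparameters named in the abstract, the kernel length parameter and the regularization parameter, can be illustrated with a minimal GPR sketch. This is an assumption-laden illustration, not the paper's actual setup: a squared-exponential kernel and a toy 1-D target function are assumed, with the regularization parameter entering as a diagonal term added to the kernel matrix.

```python
import numpy as np

def rbf_kernel(X1, X2, length):
    # Squared-exponential (RBF) kernel; `length` is the kernel length parameter.
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-d2 / (2 * length**2))

def gpr_predict(X_train, y_train, X_test, length, reg):
    # `reg` is the regularization parameter: a diagonal offset added to the
    # kernel matrix before solving for the regression weights.
    K = rbf_kernel(X_train, X_train, length) + reg * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(X_test, X_train, length) @ alpha

# Toy 1-D example (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (20, 1))
y = np.sin(3.0 * X[:, 0])
X_test = np.linspace(-1.0, 1.0, 5)[:, None]
pred = gpr_predict(X, y, X_test, length=0.5, reg=1e-8)
print(pred)
```

Both `length` and `reg` must be chosen well for the model to generalize; the abstract's point is that with sparse data, selecting them by MLE can overfit, whereas a basis-completeness criterion (aided by HDMR-generated synthetic test data) is more robust.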
