Abstract

The generalization accuracy of machine learning models of potential energy surfaces (PES) and force fields (FF) for large polyatomic molecules can be improved either by increasing the number of training points or by improving the models. In order to build accurate models based on expensive ab initio calculations, much of recent work has focused on the latter. In particular, it has been shown that gradient domain machine learning (GDML) models produce accurate results for high-dimensional molecular systems with a small number of ab initio calculations. The present work extends GDML to models with composite kernels built to maximize inference from a small number of molecular geometries. We illustrate that GDML models can be improved by increasing the complexity of underlying kernels through a greedy search algorithm using Bayesian information criterion as the model selection metric. We show that this requires including anisotropy into kernel functions and produces models with significantly smaller generalization errors. The results are presented for ethanol, uracil, malonaldehyde and aspirin. For aspirin, the model with composite kernels trained by forces at 1000 randomly sampled molecular geometries produces a global 57-dimensional PES with the mean absolute accuracy 0.177 kcal mol−1 (61.9 cm−1) and FFs with the mean absolute error 0.457 kcal mol−1 Å−1.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call