Abstract

We study the Wasserstein metric for measuring distances between molecules represented by the atom-index-dependent adjacency ('Coulomb') matrix, used in kernel ridge regression (KRR) based supervised learning. The resulting machine learning models of quantum properties, a.k.a. quantum machine learning (QML) models, exhibit improved training efficiency and yield smoother predictions of energies associated with molecular distortions. We first illustrate this smoothness for the continuous extraction of an atom from an organic molecule. Learning curves, quantifying the decay of the atomization-energy prediction error as a function of training set size, were obtained for tens of thousands of organic molecules drawn from the QM9 data set. Compared to conventionally used metrics (L1 and L2 norms), our numerical results indicate a systematic improvement in learning-curve offset for both random and sorted (by row norm) atom indexing in Coulomb matrices. Our findings suggest that this metric provides a favorable similarity measure, introducing index invariance into any kernel-based model relying on adjacency-matrix representations.
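The ingredients above can be sketched in a few lines. The Coulomb matrix definition (diagonal 0.5 Z_i^2.4, off-diagonal Z_i Z_j / |R_i - R_j|) is standard; treating the sorted matrix entries as one-dimensional distributions and applying `scipy.stats.wasserstein_distance` is an illustrative simplification of mine, not the paper's exact construction, and the kernel width `sigma` is an arbitrary placeholder.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def coulomb_matrix(Z, R):
    """Coulomb matrix: M_ii = 0.5 * Z_i**2.4, M_ij = Z_i*Z_j / |R_i - R_j|."""
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, i] = 0.5 * Z[i] ** 2.4
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return M

def wasserstein_kernel(M_a, M_b, sigma=10.0):
    """Illustrative kernel: 1D Wasserstein distance between sorted CM entries.

    Sorting makes the comparison independent of atom indexing; this is a
    simplified stand-in for the metric studied in the paper.
    """
    a = np.sort(M_a.ravel())
    b = np.sort(M_b.ravel())
    return np.exp(-wasserstein_distance(a, b) / sigma)
```

Because the entries are sorted before comparison, permuting the atom order of either molecule leaves the kernel value unchanged, which is the index-invariance property the abstract refers to.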

Highlights

  • The application of machine learning (ML) to atomistic simulation has been gaining traction over recent years [1–9]

  • We study the Wasserstein metric to measure distances between molecules represented by the atom index dependent adjacency ‘Coulomb’ matrix, used in kernel ridge regression based supervised learning

  • The energies are smooth, while Coulomb matrix (CM) based quantum machine learning (QML) model predictions using the L1 norm are discontinuous



Introduction

The application of machine learning (ML) to atomistic simulation has been gaining traction over recent years [1–9]. While the details of the representation (other than uniqueness) are less crucial for artificial neural networks, the specific definition of how a chemical system is specified is known to dramatically affect the learning efficiency of kernel ridge regression (KRR) based QML models. Encoding the right physics, such as translational or atom-index invariance, in the representation results in a systematic reduction of the quantum data needed to achieve the same pre-defined predictive accuracy [22]. This is of particular interest since QML models are typically trained in scarce-data regimes due to (a) the immense computational (or experimental) cost of generating labels and (b) the tremendous scale of chemical compound space (CCS) [23, 24].
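The KRR models discussed above amount to solving one regularized linear system in the kernel matrix. A minimal sketch, assuming a Laplacian kernel over precomputed pairwise molecular distances (the kernel choice, `sigma`, and regularizer `lam` are illustrative defaults, not values from the paper):

```python
import numpy as np

def laplacian_kernel(D, sigma):
    """Laplacian kernel from a matrix of pairwise distances D."""
    return np.exp(-D / sigma)

def krr_train(K, y, lam=1e-8):
    """Solve (K + lam*I) alpha = y for the regression weights alpha."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krr_predict(K_test, alpha):
    """Predict labels from the test-vs-training kernel matrix."""
    return K_test @ alpha
```

The role of the representation and metric is confined to the distance matrix `D`: a smoother, invariance-respecting metric produces a better-behaved kernel and hence lower learning-curve offsets, without changing the regression machinery itself.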


