Abstract

Models that combine quantum mechanics (QM) with machine learning (ML) have seen a strong resurgence of interest in recent years. This is reflected in dedicated research programs, workshops, and publications such as this special issue of the International Journal of Quantum Chemistry on ML and QM. In the following, I briefly outline the idea and history of these models, focusing on contributions in this issue. Systematic computational design and study of molecules and materials requires rigorous, unbiased, and accurate treatment at the atomic scale. While numerical approximations to the many-electron problem have become available, their prohibitive computational cost severely limits their applicability. Based on the reasoning that electronic structure calculations of similar systems contain redundant information, ML models have been developed that interpolate between a computationally feasible number of QM reference calculations to predict properties of new, similar systems. Essentially, the problem of solving the electronic Schrödinger equation is mapped onto a nonlinear statistical regression problem. Example applications include structural relaxation, molecular dynamics, and high-throughput calculation of quantum chemical properties. This Ansatz has been demonstrated to enable computational savings of up to several orders of magnitude, with accuracy on par with the reference method, in applications involving large systems, long time scales, or large numbers of systems. Interpolating QM results poses challenges distinctly different from those in cheminformatics (e.g., quantitative structure-property/activity relationships), where experimental results are interpolated: in particular, there is no noise, as the modeled properties are outcomes of deterministic procedures, and representations have to resolve small changes in geometry to enable modeling of properties with high accuracy.
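The mapping onto a regression problem can be sketched minimally with kernel ridge regression on a toy one-dimensional function standing in for a QM property (the data, kernel width, and regularization strength here are illustrative assumptions, not taken from any of the cited studies):

```python
import numpy as np

# Toy "reference calculations": a property y evaluated at a few inputs x.
X_train = np.array([[0.0], [0.5], [1.0], [1.5], [2.0]])
y_train = np.sin(X_train).ravel()  # stand-in for a deterministic QM property

sigma, lam = 0.5, 1e-10  # kernel width and regularization (illustrative)

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

# Fit: solve (K + lambda * I) alpha = y for the regression weights.
K = gaussian_kernel(X_train, X_train, sigma)
alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train)

# Predict a new, similar system without a new reference calculation.
X_new = np.array([[0.75]])
y_pred = gaussian_kernel(X_new, X_train, sigma) @ alpha
```

Because the modeled property is the outcome of a deterministic calculation, the regularization can be kept very small; it serves numerical stability rather than noise suppression.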
Although interpolation techniques have been used in early QM calculations, the systematic application of methods belonging to what is today called ML started perhaps around the 1990s, an example being the fitting of eigenenergies of harmonic oscillators by artificial neural networks (ANNs). In the following decade, ANNs were used for interpolation of potential energy surfaces of single systems and have since developed into powerful tools for large-scale molecular dynamics simulations. A variety of other approaches, including Shepard interpolation, cubic splines, moving least-squares, and symbolic regression, were used as well. In this issue, a tutorial review [1] of ANN potentials is given by Jörg Behler, one of their major proponents; Sergei Manzhos, Richard Dawes, and Tucker Carrington Jr. discuss sum-of-product ANNs related to many-body expansions [2].

Interpolation between QM results for different systems, for example, molecular property estimates, started roughly a decade later, first with ANNs, later joined by kernel-based ML methods such as support vector machines and Gaussian process regression (GPR). The reader can find a brief general introduction to kernel-based ML for QM data in my tutorial [3]. GPR, sometimes known as Kriging, yields the same predictions as kernel ridge regression (KRR), although additional features such as predictive variance differ. Both GPR and KRR have become popular for predictions across chemical compound space and interpolation of potential energy surfaces. In their contribution, Albert P. Bartók and Gábor Csányi take the reader on a tour through their Gaussian approximation potentials approach for potential energy surface interpolation [4], and Paul L. A. Popelier summarizes progress on the development of a GPR potential for peptides and proteins based on Quantum Chemical Topology [5]. An example of thermochemical property predictions across different molecules can be found in the article by Jianming Wu, Yuwei Zhou, and Xin Xu, who use ANNs to statistically correct DFT/B3LYP predictions with respect to experimental values [6].

One of the most important aspects of a QM/ML model is how a system, be it molecular or periodic, is numerically represented for interpolation. A wide variety of representations has been proposed, including symmetry functions, ad hoc descriptors, smooth overlap of atomic positions, and the Coulomb matrix. O. Anatole von Lilienfeld et al. discuss requirements for molecular representations using the example of a descriptor based on Fourier expansions of atomic radial basis functions [7]. Developing representations that allow generalization across different materials has been challenging so far; Felix Faber et al. present new results on generalizations of the Coulomb matrix for periodic systems [8].

On the ML side, Kevin Vu et al. present a detailed analysis [9] of the workhorse model of many studies, KRR with a Gaussian kernel. John Snyder et al., continuing previous work on learning the kinetic energy as a functional of the electron density for orbital-free density functional theory, describe an improved algorithm to constrain density optimization to the training data manifold [10].

Scaling up ML-based large-scale molecular dynamics simulations requires dedicated methodological effort. Venkatesh Botu and Rampi Ramprasad describe dynamic model retraining ("learning on the fly") for their models using fingerprint representations of materials [11]. Marco Caccin et al. report on a framework for massively parallel QM/ML/molecular mechanics simulations with QM zones of over 1000 atoms [12]. Both studies rely on learning forces on atoms directly (as opposed to using the derivative of an ML model of the energy).
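The Coulomb matrix mentioned above has a simple closed form: the diagonal holds 0.5 Z_i^2.4 and the off-diagonal entries Z_i Z_j / |R_i − R_j|. A brief sketch in NumPy (atomic units assumed; the row-norm sorting used to obtain a permutation-invariant, fixed-size vector is one common variant and shown here as an assumption):

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix of a molecule from nuclear charges Z (shape (n,))
    and Cartesian coordinates R (shape (n, 3)) in atomic units.
    Diagonal: 0.5 * Z_i**2.4; off-diagonal: Z_i * Z_j / |R_i - R_j|."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    D = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):      # diagonal of D is zero
        M = np.outer(Z, Z) / D
    np.fill_diagonal(M, 0.5 * Z**2.4)       # overwrite the infinities
    return M

def sorted_cm_vector(Z, R):
    """One common way to get a permutation-invariant descriptor:
    sort rows/columns by row norm, then flatten."""
    M = coulomb_matrix(Z, R)
    order = np.argsort(-np.linalg.norm(M, axis=1))
    return M[np.ix_(order, order)].ravel()
```

Another documented variant uses the eigenvalue spectrum of M instead of sorted rows, trading some information for built-in invariance.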
This special issue offers a cross section of current research on ML for QM. Some lines of research can be traced back to the Institute for Pure and Applied Mathematics (IPAM) long research program on "Navigating Chemical Compound Space for Materials and Bio Design" (March 14–June 17, 2011, Los Angeles, CA), which had a strong positive impact on the field. Related events included the CECAM workshop on "Machine Learning in Atomistic Simulations" (September 10–12, 2012, Lugano, Switzerland), this year's IPAM workshop "Machine Learning for Many-Particle Systems" (February 23–27, 2015, Los Angeles, CA), and the CECAM/Ψk workshop "From Many-Body Hamiltonians to Machine Learning and Back" (May 11–13, 2015, Berlin, Germany); a dedicated IPAM long research program on "Understanding Many-Particle Systems with Machine Learning" is scheduled to take place next year in Los Angeles. I hope that this special issue will contribute to these efforts of further developing models combining QM with ML.

I thank the Editor-in-Chief of the International Journal of Quantum Chemistry, Matteo Cavalleri, for providing the opportunity for this special issue and for support in seeing it through.

Matthias Rupp
Fritz Haber Institute of the Max Planck Society, Faradayweg 4–6, 14195 Berlin, Germany
E-mail: mrupp@mrupp.info
