Abstract

The length and time scales of atomistic simulations are limited by the computational cost of the methods used to predict material properties. In recent years there has been great progress in the use of machine-learning algorithms to develop fast and accurate interatomic potential models, but it remains a challenge to develop models that generalize well and are fast enough to be used at extreme time and length scales. To address this challenge, we have developed a machine-learning algorithm based on symbolic regression in the form of genetic programming that is capable of discovering accurate, computationally efficient many-body potential models. The key to our approach is to explore a hypothesis space of models based on fundamental physical principles and to select models within this hypothesis space based on their accuracy, speed, and simplicity. The focus on simplicity reduces the risk of overfitting the training data and increases the chances of discovering a model that generalizes well. Our algorithm was validated by rediscovering an exact Lennard-Jones potential and a Sutton-Chen embedded-atom method potential from training data generated using these models. By using training data generated from density functional theory calculations, we found potential models for elemental copper that are simple, as fast as embedded-atom models, and capable of accurately predicting properties outside of their training set. Our approach requires relatively small sets of training data, making it possible to generate training data using highly accurate methods at a reasonable computational cost. We present our approach, the forms of the discovered models, and assessments of their transferability, accuracy, and speed.
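The full algorithm evolves symbolic expressions with genetic programming; as a minimal illustration of the selection principle described above (scoring candidate models by both accuracy and simplicity), the sketch below evaluates a few hand-written candidate functional forms against pairwise energies from a reference Lennard-Jones potential. The candidate list, the complexity values, and the penalty weight `lam` are illustrative assumptions, not the paper's actual hypothesis space or scoring function.

```python
# Toy model selection: score candidate pair potentials on reference
# Lennard-Jones data, penalizing complexity (an assumed stand-in for
# the paper's accuracy/simplicity trade-off).

def lj(r):
    """Reference Lennard-Jones energy with epsilon = sigma = 1."""
    s6 = (1.0 / r) ** 6
    return 4.0 * (s6 * s6 - s6)

# Synthetic training data: (distance, energy) pairs.
data = [(0.95 + 0.05 * i, lj(0.95 + 0.05 * i)) for i in range(15)]

# Hypothetical hypothesis space: (name, function, assumed complexity).
candidates = [
    ("harmonic", lambda r: (r - 1.12) ** 2 - 1.0, 3),
    ("inverse", lambda r: -1.0 / r, 2),
    ("lennard-jones", lambda r: 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6), 5),
]

def score(f, complexity, lam=1e-3):
    """Mean squared error plus a small complexity penalty."""
    mse = sum((f(r) - e) ** 2 for r, e in data) / len(data)
    return mse + lam * complexity

# The exact form wins: zero error outweighs its higher complexity.
best_name = min(candidates, key=lambda c: score(c[1], c[2]))[0]
```

Here the Lennard-Jones candidate is selected because its error vanishes on the reference data; a larger `lam` would shift selection toward simpler forms, which is the knob that discourages overfitting.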

Highlights

  • The development of an interatomic potential model is treated as a supervised learning problem,[21] in which an optimization algorithm is used to search a hypothesis space of possible functions to find those that best reproduce the energies, forces, and possibly other properties of a set of training data

  • We tested its ability to rediscover the exact form of two interatomic potentials: the Lennard-Jones potential and the Sutton-Chen (SC) embedded-atom method (EAM) potential

  • Having established that our genetic programming algorithm can find the exact form of simple pair and many-body potentials, we evaluated its ability to find potential models from data generated using density functional theory[52] (DFT)


Introduction

There have been great advances in the use of machine learning to develop interatomic potential models.[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] In this approach, the development of an interatomic potential model is treated as a supervised learning problem,[21] in which an optimization algorithm is used to search a hypothesis space of possible functions to find those that best reproduce the energies, forces, and possibly other properties of a set of training data. Potential models developed in this way are often able to achieve accuracy close to that of the method used to generate the training data, with linear scaling in system size and an orders-of-magnitude increase in speed. Alternatively, potential models may be generated by using fundamental physical relationships to derive a simple parameterized function. The parameters of this function are typically fit to a smaller set of training data. Examples of potential models generated using this latter approach include the embedded-atom method (EAM) and bond-order potentials.[22,23,24,25,26,27,28]
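The supervised-learning view described above (fit a parameterized function so that it reproduces reference energies) can be sketched in a few lines. The example below recovers the parameters of a Lennard-Jones pair potential from synthetic energy data by a simple grid search; the grid resolution and distance range are illustrative assumptions, and a real fit would use forces and a proper optimizer.

```python
# Toy supervised fit: recover Lennard-Jones parameters (epsilon, sigma)
# from reference pairwise energies via a coarse grid search.

def lj_energy(r, eps, sigma):
    """Lennard-Jones pair energy: 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

# "Training data" generated from a reference potential with known parameters.
true_eps, true_sigma = 1.0, 1.0
distances = [0.9 + 0.05 * i for i in range(20)]
training = [(r, lj_energy(r, true_eps, true_sigma)) for r in distances]

def loss(eps, sigma):
    """Sum of squared energy errors over the training set."""
    return sum((lj_energy(r, eps, sigma) - e) ** 2 for r, e in training)

# Search a small parameter grid (0.5..1.5 in steps of 0.1) for the best fit.
best = min(
    ((e / 10, s / 10) for e in range(5, 16) for s in range(5, 16)),
    key=lambda p: loss(*p),
)
```

Because the reference parameters lie on the grid, the search recovers them exactly; with noisy or DFT-derived data the minimum would instead be the best compromise over the training set.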
