Abstract
DScribe is a software package for machine learning that provides popular feature transformations (“descriptors”) for atomistic materials simulations. DScribe accelerates the application of machine learning to atomistic property prediction by providing user-friendly, off-the-shelf descriptor implementations. The package currently contains implementations of the Coulomb matrix, Ewald sum matrix, sine matrix, Many-body Tensor Representation (MBTR), Atom-centered Symmetry Functions (ACSF) and Smooth Overlap of Atomic Positions (SOAP). Usage of the package is illustrated for two different applications: formation energy prediction for solids and ionic charge prediction for atoms in organic molecules. The package is freely available under the open-source Apache License 2.0.
Program summary
Program Title: DScribe
Program Files doi: http://dx.doi.org/10.17632/vzrs8n8pk6.1
Licensing provisions: Apache-2.0
Programming language: Python/C/C++
Supplementary material: Supplementary Information as PDF
Nature of problem: The application of machine learning to materials science is hindered by the lack of consistent software implementations of feature transformations. These feature transformations, also called descriptors, are a key step in building machine learning models for property prediction in materials science.
Solution method: We have developed a library for creating common descriptors used in machine learning applied to materials science. We provide implementations of the following descriptors: Coulomb matrix, Ewald sum matrix, sine matrix, Many-body Tensor Representation (MBTR), Atom-centered Symmetry Functions (ACSF) and Smooth Overlap of Atomic Positions (SOAP). The library has a Python interface, with computationally intensive routines written in C or C++. The source code, tutorials and documentation are provided online. A continuous integration mechanism is set up to automatically run a series of regression tests and check code coverage whenever the codebase is updated.
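To make the idea of a descriptor concrete, the following is a minimal NumPy sketch of the Coulomb matrix in its commonly cited form: diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|. It is not DScribe's implementation and does not use the DScribe API. The function name coulomb_matrix, the choice of units and the H2 geometry are assumptions made for illustration only.

```python
import numpy as np


def coulomb_matrix(atomic_numbers, positions):
    """Illustrative Coulomb matrix (hypothetical helper, not the DScribe API).

    atomic_numbers: sequence of N nuclear charges Z_i.
    positions: N x 3 array of Cartesian coordinates (units assumed consistent).
    """
    z = np.asarray(atomic_numbers, dtype=float)
    r = np.asarray(positions, dtype=float)

    # Pairwise interatomic distances |R_i - R_j|.
    diff = r[:, None, :] - r[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Off-diagonal entries: Z_i * Z_j / |R_i - R_j|.
    # The diagonal divides by zero here and is overwritten below.
    with np.errstate(divide="ignore", invalid="ignore"):
        m = np.outer(z, z) / dist

    # Diagonal entries: 0.5 * Z_i ** 2.4.
    np.fill_diagonal(m, 0.5 * z ** 2.4)
    return m


# Assumed toy geometry: two hydrogen atoms 1.4 length units apart.
h2 = coulomb_matrix([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]])
print(h2)
```

The result is a symmetric N x N matrix that does not change when the molecule is translated or rotated, but it does depend on the order of the atoms. In practice the matrix is therefore often sorted or otherwise post-processed before being passed to a learning algorithm.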
Highlights
1.1 Materials informatics: Materials science is a multi-disciplinary research area concerned with materials properties and applications.
I give an overview of four supervised learning algorithms (kernel ridge regression, Gaussian process regression, neural networks, and decision trees), all of which have been widely used in applying machine learning to materials science.
The research topics are divided into two categories: the materials informatics software developed in Publications V and II, and the applications of machine learning and data mining for materials discovery presented in Publications III and I.
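As a rough illustration of kernel ridge regression, one of the supervised learning algorithms named in the highlights, the sketch below fits a Gaussian-kernel model with plain NumPy. The helper names (gaussian_kernel, krr_fit, krr_predict), the hyperparameter values and the one-dimensional toy data are all assumptions chosen for illustration. In a materials setting the inputs would be descriptor vectors rather than scalars.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def krr_fit(x_train, y_train, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression weights alpha."""
    k = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)


def krr_predict(x_train, alpha, x_new, sigma):
    """Predict targets as a kernel-weighted sum over the training points."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha


# Assumed toy data: ten samples of one period of a sine curve.
x = np.linspace(0.0, 1.0, 10)[:, None]
y = np.sin(2.0 * np.pi * x[:, 0])

alpha = krr_fit(x, y, sigma=0.1, lam=1e-6)
pred = krr_predict(x, alpha, x, sigma=0.1)
print(np.max(np.abs(pred - y)))
```

The regularization strength lam trades training accuracy against smoothness, and the kernel width sigma sets how far the influence of each training point extends. Both would normally be chosen by cross-validation rather than fixed by hand as they are here.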
Summary
Materials science is a multi-disciplinary research area concerned with materials properties and applications. By the 1940s and 1950s, computers were beginning to have enough power to run materials-related simulations, such as Monte Carlo and molecular dynamics (MD) [9]. This unleashed the power of model-based science by reformulating models as numerical problems that could be solved in silico. These advances quickly led to an explosion of materials data in the form of recorded experiments, simulations and scientific articles. The entire life cycle of materials data has changed significantly in a few decades: data is now created in a high-throughput fashion, stored long-term in organized materials databases, and analyzed and exploited with advanced data analytics such as machine learning. This shift towards data-driven materials science provides the backdrop for the research included in this thesis.