Abstract

DScribe is a software package for machine learning that provides popular feature transformations (“descriptors”) for atomistic materials simulations. DScribe accelerates the application of machine learning for atomistic property prediction by providing user-friendly, off-the-shelf descriptor implementations. The package currently contains implementations for Coulomb matrix, Ewald sum matrix, sine matrix, Many-body Tensor Representation (MBTR), Atom-centered Symmetry Functions (ACSF) and Smooth Overlap of Atomic Positions (SOAP). Usage of the package is illustrated for two different applications: formation energy prediction for solids and ionic charge prediction for atoms in organic molecules. The package is freely available under the open-source Apache License 2.0.

Program summary

Program Title: DScribe
Program Files doi: http://dx.doi.org/10.17632/vzrs8n8pk6.1
Licensing provisions: Apache-2.0
Programming language: Python/C/C++
Supplementary material: Supplementary Information as PDF
Nature of problem: The application of machine learning for materials science is hindered by the lack of consistent software implementations for feature transformations. These feature transformations, also called descriptors, are a key step in building machine learning models for property prediction in materials science.
Solution method: We have developed a library for creating common descriptors used in machine learning applied to materials science. We provide implementations of the following descriptors: Coulomb matrix, Ewald sum matrix, sine matrix, Many-body Tensor Representation (MBTR), Atom-centered Symmetry Functions (ACSF) and Smooth Overlap of Atomic Positions (SOAP). The library has a Python interface with computationally intensive routines written in C or C++. The source code, tutorials and documentation are provided online. A continuous integration mechanism is set up to automatically run a series of regression tests and check code coverage when the codebase is updated.
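
To make the notion of a descriptor concrete, the following is a minimal sketch of the Coulomb matrix, the first descriptor listed above, written in plain Python. It is not DScribe's implementation or API: the function name `coulomb_matrix` and its arguments are hypothetical and chosen only for illustration. It assumes the commonly used definition in which the diagonal elements are 0.5 Z_i^2.4 and the off-diagonal elements are Z_i Z_j / |R_i - R_j|.

```python
import math


def coulomb_matrix(atomic_numbers, positions):
    """Return the Coulomb matrix of a molecule as a list of lists.

    Hypothetical helper for illustration only (not part of DScribe).
    atomic_numbers: sequence of nuclear charges Z_i.
    positions: sequence of (x, y, z) coordinates, one per atom; the
        distance unit is whatever the caller supplies.
    """
    n = len(atomic_numbers)
    if len(positions) != n:
        raise ValueError("need exactly one position per atom")

    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal: fitted estimate of the energy of the free atom.
        matrix[i][i] = 0.5 * atomic_numbers[i] ** 2.4
        for j in range(i + 1, n):
            # Off-diagonal: Coulomb repulsion between nuclei i and j.
            distance = math.dist(positions[i], positions[j])
            value = atomic_numbers[i] * atomic_numbers[j] / distance
            matrix[i][j] = value
            matrix[j][i] = value  # the matrix is symmetric
    return matrix


# Two hydrogen nuclei placed one distance unit apart (made-up geometry).
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
print(h2)  # [[0.5, 1.0], [1.0, 0.5]]
```

The resulting matrix depends on the order in which the atoms are listed, so in practice it is usually sorted or otherwise made permutation-invariant before being passed to a learning algorithm; that step is omitted here for brevity.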

Highlights

  • 1.1 Materials informatics: Materials science is a multi-disciplinary research area for materials properties and applications

  • I give an overview of four supervised learning algorithms (kernel ridge regression, Gaussian process regression, neural networks, and decision trees), which have all been widely used in the context of applying machine learning for materials science

  • The research topics are divided into two categories: the materials informatics software developed in Publications V and II and the applications of machine learning and data mining for materials discovery presented in Publications III and I

Summary

Materials informatics

Materials science is a multi-disciplinary research area for materials properties and applications. By the 1940s and 1950s, computers were beginning to have enough power to run materials-related simulations, such as Monte Carlo and molecular dynamics (MD) [9]. This unleashed the power of model-based science by reformulating its models as numerical problems that are solved in silico. These advances quickly led to an explosion of materials data in the form of recorded experiments, simulations and scientific articles. The entire life-cycle of materials data has changed significantly in a few decades: data is now created in a high-throughput fashion, stored long-term in organized materials databases, and analyzed and exploited with advanced data analytics such as machine learning. This shift towards data-driven materials science provides the backdrop for the research included in this thesis.

Research objectives and structure of the thesis
Data-driven design for materials research
Materials ontologies
Data creation and veracity
Experimental data
Computational data
Data veracity
Data storage and distribution
Machine learning in materials science
Introduction to machine learning
Supervised learning
Unsupervised learning
Reinforcement learning
Motivation for using machine learning
Understanding materials phenomena
Materials discovery
Advancing materials modelling
Using machine learning in materials science
Principles of learning
The amount of training data
Supervised learning algorithms
Materials science-guided learning design
Feature engineering
Feature learning
Challenges and best practices
Materials informatics development and application
MatID: Automated structural classification and analysis
DScribe
Summary
Applications
Machine learning based screening of catalytic activity in nanoclusters
Findings
Outlook