Abstract

While the new paradigm of data-driven materials science has proven efficient at accelerating materials discovery, one open challenge is whether data-driven methods can deliver interpretable models that provide scientific insight in addition to accuracy. In this work, using data-driven design of high-strength steels as an example, we compare the recent Sure Independence Screening and Sparsifying Operator (SISSO) with several conventional machine learning methods: Support Vector Regression (SVR), Decision Tree (DTe), and Gradient Boost Decision Tree (GBDT). The results show that SISSO gives simple, interpretable descriptors with accuracy comparable to that of the relatively “black-box” models from SVR, GBDT, and DTe. The best SISSO descriptor is scientifically consistent with descriptors reported in previous studies. In addition, we show that, when combined with particle swarm optimization, the simple and explicit expression of the descriptor is also advantageous for reverse materials design, which offers a general way for machine learning not only to predict properties but also to suggest the next action to take.

Highlights

  • Data-driven materials science has become a new paradigm for efficient materials research and design [1–5]. By learning from existing experimental and/or simulated materials data, predictive models can be built and used to quickly predict promising new materials, which are then validated by experimental synthesis and characterization.

  • Support Vector Regression (SVR) [26,27], Decision Tree (DT) [28], Gradient Boost Decision Tree (GBDT) [29,30], and the recent SISSO approach [24,25] were used for model building, and their performance was compared.

  • Decision tree models are generally considered easier to understand because of their treelike decision structure, yet the tree obtained in this work remains hard to interpret, as the partial tree in Fig. 3 shows.


Summary

Introduction

Data-driven materials science has become a new paradigm for efficient materials research and design. By learning from existing experimental and/or simulated materials data, predictive models can be built and used to quickly predict promising new materials, which are then validated by experimental synthesis and characterization. Machine learning is applied to materials design in two ways: forward prediction and reverse design. In forward prediction, a dataset of the target property, generated by experiment or simulation, is first collected. A suitable machine learning algorithm is then selected and optimized to produce a model that accurately predicts the target property for unseen data. In short, forward prediction fits the training data to generate a model, while reverse design uses the current model to recommend candidates for experiment.
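The distinction between forward prediction and reverse design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of the paper: the descriptor formula, the synthetic training data, the variable bounds, and the swarm parameters are all invented for the example, and the fit is a one-feature least-squares line rather than SISSO. The forward step fits a model on an explicit descriptor; the reverse step runs a basic particle swarm search over the input variables to find the candidate with the highest predicted property.

```python
import random

def descriptor(x1, x2):
    # Hypothetical explicit descriptor, for illustration only (not from the paper).
    return x1 * (1.0 - x1) + 0.5 * x2

def fit_linear(ds, ys):
    # Closed-form ordinary least squares for y = a + b * d (single feature d).
    n = len(ds)
    md, my = sum(ds) / n, sum(ys) / n
    b = (sum((d - md) * (y - my) for d, y in zip(ds, ys))
         / sum((d - md) ** 2 for d in ds))
    return my - b * md, b

def pso_maximize(f, bounds, n_particles=20, n_iter=100, seed=0):
    # Basic global-best particle swarm optimization with position clamping.
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # personal best positions
    best_val = [f(p) for p in pos]          # personal best values
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5               # assumed inertia and acceleration weights
    for _ in range(n_iter):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][k] = (w * vel[i][k]
                             + c1 * r1 * (best_pos[i][k] - pos[i][k])
                             + c2 * r2 * (g_pos[k] - pos[i][k]))
                lo, hi = bounds[k]
                pos[i][k] = min(hi, max(lo, pos[i][k] + vel[i][k]))
            val = f(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

# Forward prediction: fit a model to (synthetic, made-up) training data.
data_rng = random.Random(1)
X = [(data_rng.random(), data_rng.random()) for _ in range(50)]
y = [1000.0 + 800.0 * descriptor(x1, x2) + data_rng.gauss(0.0, 5.0) for x1, x2 in X]
a, b = fit_linear([descriptor(x1, x2) for x1, x2 in X], y)

def predict(x):
    # Predicted target property for a candidate x = [x1, x2].
    return a + b * descriptor(x[0], x[1])

# Reverse design: search the input space for the best predicted candidate.
best_x, best_y = pso_maximize(predict, bounds=[(0.0, 1.0), (0.0, 1.0)])
print("fitted coefficients:", a, b)
print("recommended candidate:", best_x, "predicted property:", best_y)
```

Because the model is an explicit expression, every evaluation inside the swarm loop is cheap, and the recommended candidate can be checked by hand against the formula. With a real model, the bounds would encode the allowed composition and processing ranges.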

