Semi-supervised machine-learning classification of materials synthesis procedures

Haoyan Huo,Tanjin He,Vahe Tshitoyan,Ziqin Rong,Olga Kononova,Gerbrand Ceder,Wenhao Sun,Tiago Botari

doi:10.1038/s41524-019-0204-1

Abstract

Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as “grinding” and “heating”, “dissolving” and “centrifuging”, etc. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning approach offers a scalable way to unlock the large amount of inorganic materials synthesis information in the literature and to process it into a standardized, machine-readable database.
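The Markov chain representation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each procedure has already been reduced to an ordered list of step labels. The function name `transition_probabilities`, the `START`/`END` padding states, and the three example procedures are hypothetical and invented for illustration; they are not taken from the paper's data or code.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities from step sequences."""
    counts = defaultdict(Counter)
    for steps in sequences:
        # Pad with START/END (an assumption of this sketch) so that entry
        # and exit points also appear as nodes in the resulting flowchart.
        padded = ["START", *steps, "END"]
        for current, following in zip(padded, padded[1:]):
            counts[current][following] += 1
    # Normalize each row of counts into probabilities.
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }


# Hypothetical step sequences, invented for illustration only.
procedures = [
    ["grinding", "heating"],
    ["grinding", "heating", "grinding", "heating"],
    ["dissolving", "heating", "centrifuging"],
]

probs = transition_probabilities(procedures)
for state, nexts in probs.items():
    for nxt, p in sorted(nexts.items(), key=lambda kv: -kv[1]):
        print(f"{state} -> {nxt}: {p:.2f}")
```

Each state with its outgoing probabilities corresponds to a node and its weighted edges in a flowchart of possible procedures. In this toy input, "grinding" is always followed by "heating", which is the kind of regularity such a representation makes visible.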

Highlights

Over the last 30 years, advances in computational materials science have led to tremendous successes in materials design, with dozens of computationally designed novel compounds,[1,2] and on-demand availability of ab initio predicted properties.[3] However, the materials discovery pipeline remains bottlenecked by the challenges of experimental synthesis, which can require months of trial-and-error before a novel compound can be made.
Current approaches toward understanding and predicting materials synthesis have involved in situ X-ray diffraction (XRD) investigations,[5,6] ab initio thermodynamic modeling,[7,8,9] classical thermodynamics perspectives,[4] and machine-learning-guided synthesis parameter search.[10,11]
We applied latent Dirichlet allocation (LDA) to identify topics of synthesis from the scientific literature, and we demonstrate that the topical grouping is closely related to conventional experimental classification of synthesis steps.

Summary

Introduction

Over the last 30 years, advances in computational materials science have led to tremendous successes in materials design, with dozens of computationally designed novel compounds,[1,2] and on-demand availability of ab initio predicted properties.[3] However, the materials discovery pipeline remains bottlenecked by the challenges of experimental synthesis, which can require months of trial-and-error before a novel compound can be made. Current approaches toward understanding and predicting materials synthesis have involved in situ X-ray diffraction (XRD) investigations,[5,6] ab initio thermodynamic modeling,[7,8,9] classical thermodynamics perspectives,[4] and machine-learning-guided synthesis parameter search.[10,11] Recently, applications of machine-learning methods to retrosynthesis in organic chemistry have proven impactful,[12,13,14] inspiring the application of similar methods to predict inorganic materials synthesis. These machine-learning investigations of organic chemistry synthesis reactions have been enabled by organic chemistry reaction databases, such as Reaxys, which include >12 million single-step reactions. Even limited databases of materials synthesis reactions can yield valuable insights into the relationships between synthesis parameters and reaction products, as exemplified by Kim et al.[15,16,17] and others.[11,18]

Methods

Results

Conclusion

Journal: npj Computational Materials	Publication Date: Jul 8, 2019
Citations: 99	License type: open-access

Similar Papers

Chapter 25 - Frontier of Inorganic Synthesis and Preparative Chemistry (II)-Designed Synthesis—Inorganic Crystalline Porous Materials
J.-H Yu ... J.-Y Li
Modern Inorganic Synthetic Chemistry | 01 Jan 2017

Text-mined dataset of inorganic materials synthesis recipes
Olga Kononova ... Tanjin He
Scientific Data | VOL. 6 | 15 Oct 2019

Recent progress in the synthesis of inorganic particulate materials using microfluidics
Kyoung-Ku Kang ... Chang-Soo Lee
Journal of the Taiwan Institute of Chemical Engineers | VOL. 98 | 12 Nov 2018

Dataset of solution-based inorganic materials synthesis procedures extracted from the scientific literature
Zheren Wang ... Yuxing Fei
Scientific Data | VOL. 9 | 25 May 2022
