Abstract

Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as “grinding” and “heating”, or “dissolving” and “centrifuging”. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning method offers a scalable way to unlock the large amount of inorganic materials synthesis information in the literature and to process it into a standardized, machine-readable database.
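To make the last step of the abstract concrete, the following is a minimal sketch of a first-order Markov chain built from ordered synthesis steps. It is an illustration only, not the authors' implementation: the function name `transition_probabilities`, the `START` and `END` sentinel states, and the four example procedures are all assumptions introduced here, and the step labels are taken as already assigned by an upstream classifier.

```python
from collections import Counter, defaultdict


def transition_probabilities(procedures):
    """Estimate first-order Markov transition probabilities between steps.

    `procedures` is a list of step-label sequences, one per synthesis
    paragraph. START and END are sentinel states added for illustration.
    """
    counts = defaultdict(Counter)
    for steps in procedures:
        path = ["START", *steps, "END"]
        # Count each observed (current step -> next step) pair.
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    # Normalize each row of counts into probabilities.
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }


# Hypothetical, hand-written step sequences (not data from the paper).
procedures = [
    ["grinding", "heating"],
    ["grinding", "heating", "grinding", "heating"],
    ["dissolving", "heating", "centrifuging"],
    ["dissolving", "centrifuging"],
]

chain = transition_probabilities(procedures)
for state, nexts in chain.items():
    for nxt, p in sorted(nexts.items()):
        print(f"{state} -> {nxt}: {p:.2f}")
```

Each state with its outgoing probabilities corresponds to a node and its weighted edges in a flowchart of possible procedures. Drawing only the edges above a chosen probability threshold would give a simplified flowchart.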

Highlights

  • Over the last 30 years, advances in computational materials science have led to tremendous successes in materials design, with dozens of computationally designed novel compounds,[1,2] and on-demand availability of ab initio predicted properties.[3] However, the materials discovery pipeline remains bottlenecked by the challenges of experimental synthesis, which can require months of trial-and-error before a novel compound can be made

  • Current approaches toward understanding and predicting materials synthesis have involved in situ X-ray diffraction (XRD) investigations,[5,6] ab initio thermodynamic modeling,[7,8,9] classical thermodynamics perspectives,[4] and machine-learning guided synthesis parameters search.[10,11]

  • We applied latent Dirichlet allocation (LDA) to identify topics of synthesis from the scientific literature, and we demonstrate that the topical grouping is closely related to conventional experimental classification of synthesis steps


Summary

Introduction

Over the last 30 years, advances in computational materials science have led to tremendous successes in materials design, with dozens of computationally designed novel compounds,[1,2] and on-demand availability of ab initio predicted properties.[3] However, the materials discovery pipeline remains bottlenecked by the challenges of experimental synthesis, which can require months of trial-and-error before a novel compound can be made. Current approaches toward understanding and predicting materials synthesis have involved in situ X-ray diffraction (XRD) investigations,[5,6] ab initio thermodynamic modeling,[7,8,9] classical thermodynamics perspectives,[4] and machine-learning-guided searches of synthesis parameters.[10,11] Recently, exciting applications of machine-learning methods to retrosynthesis in organic chemistry are proving to be impactful,[12,13,14] inspiring the application of similar methods to predict inorganic materials synthesis. These machine-learning investigations of organic chemistry synthesis reactions have been enabled by organic chemistry reaction databases, such as Reaxys, which include >12 million single-step reactions. Even limited databases of materials synthesis reactions can yield valuable insights on the relationships between synthesis parameters and reaction products, as exemplified by Kim et al.[15,16,17] and others.[11,18]

