Abstract

Data provide a foundation for machine learning, which has accelerated data-driven materials design. The scientific literature contains a large amount of high-quality, reliable data, but automatically extracting these data from the literature remains a challenge. We propose a natural language processing pipeline that captures both chemical composition and property data, enabling the analysis and prediction of superalloys. Within 3 h, 2531 records containing both composition and property data are extracted from 14,425 articles, covering γ′ solvus temperature, density, solidus temperature, and liquidus temperature. A data-driven model for γ′ solvus temperature is built to predict unexplored Co-based superalloys with high γ′ solvus temperatures, within a relative error of 0.81%. We test the predictions via the synthesis and characterization of three alloys. We also provide a web-based toolkit as an online, open-source platform, which is expected to serve as the basis for a general method of searching for targeted materials using data extracted from the literature.
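As a rough illustration of what capturing chemical composition can involve, the sketch below parses a nominal alloy designation such as `Co-30Ni-10Al-5W` into element amounts and assigns the remainder to the leading (balance) element. It is a hypothetical example, not the authors' implementation: the function name `parse_composition`, the hyphen-separated format, and the balance-element convention are assumptions, and the abstract does not say whether compositions are recorded in at.% or wt.%.

```python
import re

# Hypothetical composition parser (illustrative only, not the published pipeline).
# Assumes designations like "Co-30Ni-10Al-5W": the first token is the balance
# element, and every later token is "<amount><element symbol>".
_TOKEN = re.compile(r"^(\d+(?:\.\d+)?)([A-Z][a-z]?)$")


def parse_composition(designation: str) -> dict[str, float]:
    """Return {element: amount}, filling in the balance element so amounts sum to 100."""
    tokens = [t.strip() for t in designation.split("-") if t.strip()]
    if not tokens or not re.fullmatch(r"[A-Z][a-z]?", tokens[0]):
        raise ValueError(f"expected a leading balance element: {designation!r}")
    balance = tokens[0]
    composition: dict[str, float] = {}
    for token in tokens[1:]:
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognized token {token!r} in {designation!r}")
        amount, element = float(match.group(1)), match.group(2)
        # Accumulate in case the same element is listed twice.
        composition[element] = composition.get(element, 0.0) + amount
    remainder = 100.0 - sum(composition.values())
    if remainder < 0:
        raise ValueError(f"alloying additions exceed 100 in {designation!r}")
    composition[balance] = remainder  # balance element is added last
    return composition


print(parse_composition("Co-30Ni-10Al-5W"))
# {'Ni': 30.0, 'Al': 10.0, 'W': 5.0, 'Co': 55.0}
```

Compositions in real articles also appear as chemical formulas, subscripted text, or tables, so a literature-scale pipeline would need more rules than this single pattern.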

Highlights

  • Artificial intelligence (AI)/machine learning (ML) is transforming materials research by changing the paradigm from “trial-and-error” to a data-driven methodology, thereby accelerating the discovery of new materials[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]

  • Starting with a corpus of scientific articles scraped in extensible markup language (XML), hypertext markup language (HTML), or plain-text format, we preprocess the raw archived corpus to produce complete document records and filter out irrelevant information

  • We extended our automated text-mining pipeline to other physical properties, including density, solidus temperature, and liquidus temperature, by regenerating the synonym dictionary of the property specifier with the pre-trained word-embedding model and adjusting the writing rules for values and units; for these records, we focused on 259 cobalt-based and 73 nickel-based superalloys
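The preprocessing step in the highlights (turning scraped XML into a document record and discarding irrelevant parts) can be sketched for the XML case with Python's standard `xml.etree.ElementTree`. This is a minimal, hypothetical example: the simplified tag names (`article`, `title`, `sec`, `p`), the function name `preprocess_article`, and the rule of dropping sections titled "References" or "Acknowledgements" are assumptions, because publisher XML schemas differ and the source does not list its filtering rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical section titles treated as irrelevant for composition/property extraction.
IRRELEVANT_SECTIONS = {"references", "acknowledgements", "acknowledgments"}


def preprocess_article(xml_text: str) -> dict[str, object]:
    """Turn one article's XML into a flat record: the title plus the kept paragraphs."""
    root = ET.fromstring(xml_text)
    # Assumes the article title is a direct child <title> of the root element.
    title = " ".join((root.findtext("title") or "").split())
    paragraphs: list[str] = []
    for section in root.iter("sec"):
        heading = (section.findtext("title") or "").strip().lower()
        if heading in IRRELEVANT_SECTIONS:
            continue  # skip sections that carry no composition or property data
        for p in section.findall("p"):
            # itertext() joins text across inline tags such as <italic> or <sub>;
            # split/join collapses runs of whitespace left by the markup.
            text = " ".join("".join(p.itertext()).split())
            if text:
                paragraphs.append(text)
    return {"title": title, "paragraphs": paragraphs}


sample = """<article><title>A Co-based superalloy</title>
<sec><title>Results</title><p>The <italic>γ′</italic> solvus temperature is 1100 °C.</p></sec>
<sec><title>References</title><p>[1] Some citation.</p></sec></article>"""
record = preprocess_article(sample)
print(record["paragraphs"])  # ['The γ′ solvus temperature is 1100 °C.']
```

HTML and plain-text inputs would need their own parsers; the point of normalizing everything into one record shape is that the downstream extraction rules only have to handle one format.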
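The extension to density, solidus, and liquidus temperatures rests on two mechanisms: a synonym dictionary for the property specifier, regenerated from a pre-trained word-embedding model, and rules for matching values with units. The sketch below shows one plausible shape for both under loudly stated assumptions. The tiny hand-written vectors in `EMBEDDINGS` stand in for a real pre-trained model (the source does not name one), the cosine-similarity threshold of 0.9 and the functions `expand_synonyms` and `extract_property` are hypothetical, and the regular expression covers only a few simple temperature and density units.

```python
import math
import re

# Toy stand-in for a pre-trained word-embedding model (hypothetical vectors).
EMBEDDINGS = {
    "solidus": [0.9, 0.1, 0.0],
    "melting": [0.8, 0.3, 0.1],
    "liquidus": [0.1, 0.9, 0.0],
    "density": [0.0, 0.1, 0.95],
    "hardness": [0.2, 0.2, 0.3],
}

# Hypothetical value/unit rule: a number followed by a temperature or density unit,
# not immediately followed by a letter or digit (so "K" does not match "Kelvin").
VALUE_UNIT = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>°C|K|g/cm\^?3)(?![A-Za-z0-9])")


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand_synonyms(seed: str, threshold: float = 0.9) -> set[str]:
    """Return the seed plus every vocabulary word whose cosine similarity meets the threshold."""
    base = EMBEDDINGS[seed]
    return {word for word, vec in EMBEDDINGS.items() if cosine(base, vec) >= threshold}


def extract_property(sentence: str, seed: str) -> list[tuple[float, str]]:
    """Return (value, unit) pairs if the sentence mentions the property via any synonym."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if not words & expand_synonyms(seed):
        return []  # no property specifier in this sentence
    results = []
    for match in VALUE_UNIT.finditer(sentence):
        value, unit = float(match.group("value")), match.group("unit")
        if unit == "K":
            value, unit = round(value - 273.15, 2), "°C"  # normalize temperatures to °C
        results.append((value, unit))
    return results


print(sorted(expand_synonyms("solidus")))  # ['melting', 'solidus']
print(extract_property("Incipient melting occurred at 1623 K.", "solidus"))  # [(1349.85, '°C')]
```

The similarity threshold trades recall against precision: lowering it admits more candidate specifiers (and more false matches), which is why embedding-derived synonyms are usually reviewed before being added to a dictionary. A sentence that mentions several properties or several values would also need rules pairing each value with its specifier, which this sketch does not attempt.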


Summary

Introduction

Artificial intelligence (AI)/machine learning (ML) is transforming materials research by changing the paradigm from “trial-and-error” to a data-driven methodology, thereby accelerating the discovery of new materials[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]. The scientific literature contains a vast amount of peer-reviewed, largely high-quality, and reliable data. Manual extraction by domain experts is time-consuming and labor-intensive for the tens of thousands of articles written in free-flowing natural language[17]. With an ever-increasing number of new publications, maintaining and updating a database manually becomes increasingly difficult for an individual researcher. Methods that extract data automatically, rapidly, and accurately have therefore become a necessity.

