Abstract

In this study, we developed a natural language processing model that extracts information solely from the abstracts of literature on superconducting materials, with the aim of enabling predictions in materials science. Using a dataset of tagged documents (annotations) on superconductivity, the DyGIE++ framework was employed for the simultaneous extraction of named entities, relations, and events. Additionally, a model was developed for classifying the subject material of each abstract. After training on 1,000 annotated abstracts, the model automatically extracted information such as the material composition, superconducting transition temperature, doping information, and process information from 48,565 abstracts registered in the Scopus database since 1937. The numbers of extracted entries concerning superconducting materials and transition temperatures were 43,944 and 24,075, respectively, which is comparable to the number of entries in existing databases. Machine learning models were then constructed from the extracted data to predict physical and chemical properties. For example, the superconducting transition temperature was predicted from composition with a mean absolute error of 15 K. In addition, the doping information indicated that the superconducting transition temperature was correlated with the choice of dopant and doping site.
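For readers unfamiliar with the error metric quoted above, the following is a minimal sketch of how a mean absolute error over transition temperatures can be computed. The function name and the sample values are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
from typing import Sequence


def mean_absolute_error(true_tc: Sequence[float], predicted_tc: Sequence[float]) -> float:
    """Average absolute difference between measured and predicted Tc, in kelvin."""
    if len(true_tc) != len(predicted_tc) or len(true_tc) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_tc, predicted_tc))
    return total / len(true_tc)


# Hypothetical values for illustration only; not results from the study.
measured = [92.0, 39.0, 9.2]
predicted = [80.0, 50.0, 20.0]
mae = mean_absolute_error(measured, predicted)
print(f"MAE = {mae:.1f} K")
```

Because the metric averages absolute deviations, a value of 15 K means that predictions differ from the reported transition temperatures by 15 K on average, regardless of sign.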
