Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature.

L Weston, A Jain, O Kononova, V Tshitoyan, J Dagdelen, A Trewartha, G Ceder, K A Persson

doi:10.1021/acs.jcim.9b00470

Abstract

The number of published materials science articles has increased manyfold over the past few decades. A major bottleneck in the materials discovery pipeline now arises in connecting new results with the previously established literature. A potential solution to this problem is to map the unstructured raw text of published articles onto structured database entries that allow for programmatic querying. To this end, we apply text mining with named entity recognition (NER) for large-scale information extraction from the published materials science literature. The NER model is trained to extract summary-level information from materials science documents, including inorganic material mentions, sample descriptors, phase labels, material properties and applications, as well as any synthesis and characterization methods used. Our classifier achieves an F1 score of 87% and is applied to information extraction from 3.27 million materials science abstracts. We extract more than 80 million materials-science-related named entities, and the content of each abstract is represented as a database entry in a structured format. We demonstrate that simple database queries can be used to answer complex "meta-questions" about the published literature that would previously have required laborious, manual literature searches to answer. All of our data and functionality have been made freely available on our GitHub (https://github.com/materialsintelligence/matscholar) and website (http://matscholar.com), and we expect these results to accelerate the pace of future materials science discovery.
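To make the idea of "structured database entries that allow for programmatic querying" concrete, here is a minimal sketch in Python. It is not the authors' implementation and does not use the matscholar code base: the `AbstractEntry` class, the `make_entry` and `query` helpers, the short entity-type keys, the placeholder DOIs, and the tagged entities are all invented for illustration. Only the seven entity categories come from the abstract.

```python
from dataclasses import dataclass, field

# The seven entity categories named in the abstract.
# The short keys themselves are an assumption made for this sketch.
ENTITY_TYPES = (
    "material", "descriptor", "phase", "property",
    "application", "synthesis", "characterization",
)


@dataclass
class AbstractEntry:
    """One abstract represented as a structured record (hypothetical schema)."""
    doi: str
    entities: dict = field(default_factory=dict)


def make_entry(doi, tagged):
    """Build an entry from (text, entity_type) pairs, e.g. the output of an NER model."""
    entry = AbstractEntry(doi=doi, entities={t: set() for t in ENTITY_TYPES})
    for text, etype in tagged:
        if etype not in entry.entities:
            raise ValueError(f"unknown entity type: {etype}")
        entry.entities[etype].add(text)
    return entry


def query(entries, **criteria):
    """Return the DOIs of entries that contain every requested entity."""
    return [
        e.doi
        for e in entries
        if all(value in e.entities[etype] for etype, value in criteria.items())
    ]


# Toy data: placeholder DOIs and hand-written tags, not real extraction results.
entries = [
    make_entry("10.0000/example-1", [
        ("TiO2", "material"), ("thin film", "descriptor"),
        ("band gap", "property"), ("sol-gel", "synthesis"),
    ]),
    make_entry("10.0000/example-2", [
        ("TiO2", "material"), ("photocatalysis", "application"),
        ("X-ray diffraction", "characterization"),
    ]),
    make_entry("10.0000/example-3", [
        ("ZnO", "material"), ("band gap", "property"),
    ]),
]

# A simple "meta-question": which abstracts report a band gap for TiO2?
print(query(entries, material="TiO2", property="band gap"))
```

Once each abstract is stored as typed sets of entities, a question that would otherwise require reading many papers reduces to a filter over records. A production system would more likely use a real database and normalized entity names, which this sketch leaves out.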

Journal: Journal of Chemical Information and Modeling
Publication Date: Jul 31, 2019
Citations: 182
License type: CC BY-NC-ND

Similar Papers

Quantifying the advantage of domain-specific pre-training on named entity recognition tasks in materials science
Amalie Trewartha ... Anubhav Jain
Patterns | VOL. 3
01 Apr 2022

A literature-mining method of integrating text and table extraction for materials science publications
Rui Zhang ... Yuexing Han
Computational Materials Science | VOL. 230
31 Aug 2023

A Disease Identification Algorithm for Medical Crowdfunding Campaigns: Validation Study.
Steven S Doerstling ... Peter A Ubel
Journal of Medical Internet Research | VOL. 24
21 Jun 2022

AUC Maximization for Low-Resource Named Entity Recognition
Ngoc Dang Nguyen ... Wei Tan
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 37
26 Jun 2023
