Framework for automatic information extraction from research papers on nanocrystal devices.

Thaer M Dieb, Masaharu Yoshioka, Shinjiro Hara, Marcus C Newton

doi:10.3762/bjnano.6.190

Open Access

https://doi.org/10.3762/bjnano.6.190

Abstract

To support nanocrystal device development, we have been working on a computational framework to utilize information in research papers on nanocrystal devices. We developed an annotated corpus called “NaDev” (Nanocrystal Device Development) for this purpose. We also proposed an automatic information extraction system called “NaDevEx” (Nanocrystal Device Automatic Information Extraction Framework). NaDevEx aims at extracting information from research papers on nanocrystal devices using the NaDev corpus and machine-learning techniques. However, the characteristics of NaDevEx were not examined in detail. In this paper, we conduct system evaluation experiments for NaDevEx using the NaDev corpus. We discuss three main issues: system performance compared with human annotators; the effect of paper type (synthesis or characterization) on system performance; and the effects of domain knowledge features (e.g., a chemical named entity recognition system and a list of names of physical quantities) on system performance. We found that overall system performance was 89% in precision and 69% in recall. Under loose agreement, in which a term that intersects with a correct term of the same information category counts as a correct identification (in many cases, appropriate head nouns such as temperature or pressure match between the two terms), the overall performance is 95% in precision and 74% in recall. The system performance is almost comparable with the results of human annotators for information categories with rich domain knowledge information (source material). For the other information categories, given the relatively large number of terms that exist only in one paper, recall of individual information categories is not high (39–73%), although precision is better (75–97%). The average performance for synthesis papers is better than that for characterization papers because of the lack of training examples for characterization papers. Based on these results, we discuss future research plans for improving the performance of the system.
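
The difference between exact and loose agreement described in the abstract can be illustrated with a minimal Python sketch. This is not the evaluation code used for NaDevEx; the `Term` type, the character-offset representation, the category labels and the example spans are all assumptions made for illustration. A predicted term counts as correct under exact agreement only if its span and category are identical to a correct term, and under loose agreement if it intersects a correct term of the same category.

```python
from typing import NamedTuple, Sequence, Tuple


class Term(NamedTuple):
    """A hypothetical annotated term: character span plus information category."""
    start: int      # inclusive character offset
    end: int        # exclusive character offset
    category: str   # illustrative label, not an official NaDev category name


def overlaps(a: Term, b: Term) -> bool:
    """True if both terms share a category and their spans intersect."""
    return a.category == b.category and a.start < b.end and b.start < a.end


def precision_recall(predicted: Sequence[Term], gold: Sequence[Term],
                     loose: bool = False) -> Tuple[float, float]:
    """Precision and recall under exact (default) or loose agreement."""
    if loose:
        hit_pred = sum(any(overlaps(p, g) for g in gold) for p in predicted)
        hit_gold = sum(any(overlaps(g, p) for p in predicted) for g in gold)
    else:
        gold_set, pred_set = set(gold), set(predicted)
        hit_pred = sum(p in gold_set for p in predicted)
        hit_gold = sum(g in pred_set for g in pred_set.union(gold_set) if g in gold_set)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    return precision, recall


# Made-up example: the system finds "temperature" where the annotator
# marked the longer term "growth temperature" (same category, shorter span).
gold = [Term(0, 18, "parameter"), Term(25, 29, "source_material"),
        Term(40, 48, "method")]
predicted = [Term(7, 18, "parameter"), Term(25, 29, "source_material")]

strict = precision_recall(predicted, gold)
relaxed = precision_recall(predicted, gold, loose=True)
print(f"exact: {strict[0]:.2f} {strict[1]:.2f}")   # exact: 0.50 0.33
print(f"loose: {relaxed[0]:.2f} {relaxed[1]:.2f}")  # loose: 1.00 0.67
```

In this toy case the partially matching term moves from an error to a hit under loose agreement, which is the same mechanism by which the reported figures rise from 89%/69% to 95%/74%.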

Highlights

Nanoscale research is a rapidly progressing domain and many research papers containing experimental results have been published.
We propose NaDevEx (Nanocrystal Device Automatic Information Extraction Framework), a framework for automatic information extraction from research papers on nanocrystal devices, and evaluate the system using the NaDev corpus.
Domain knowledge-based features: (i) A chemical named entity feature was added using SERB-CNER (Syntactically Enhanced Rule-Based Chemical Named Entity Recognition System), which we developed to annotate chemical entities in nanocrystal device papers. (ii) A parameter identification feature was added based on a list of physical quantities: we compiled a list that contains physical properties of matter, common parameters found in nanocrystal device papers, and several keywords that usually correlate with parameters.
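
As a rough illustration of how a list-based parameter identification feature might be attached to tokens before they are passed to a machine-learning tagger, consider the sketch below. The list contents, the `PARAM`/`O` feature values and the function name are hypothetical; the compiled list used by NaDevEx and the way it encodes the feature are not given here.

```python
from typing import List

# Illustrative stand-in for the compiled list of physical quantities.
# Only "temperature" and "pressure" are mentioned in the abstract;
# the remaining entries are assumptions for the example.
PHYSICAL_QUANTITIES = {"temperature", "pressure", "thickness", "diameter"}


def parameter_features(tokens: List[str]) -> List[str]:
    """Mark each token 'PARAM' if it appears in the list, otherwise 'O'."""
    return ["PARAM" if tok.lower() in PHYSICAL_QUANTITIES else "O"
            for tok in tokens]


# Made-up sentence, tokenized on whitespace for simplicity.
tokens = "The growth temperature was 600 C at low pressure".split()
features = parameter_features(tokens)
for tok, feat in zip(tokens, features):
    print(f"{tok}\t{feat}")
```

A lookup of this kind only supplies a hint to the learner; deciding whether "growth temperature" as a whole is a parameter term is still left to the trained model.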

Summary

Introduction

Nanoscale research is a rapidly progressing domain and many research papers containing experimental results have been published. Because reading through all related papers is very time-consuming, several research efforts have been conducted in the nanoinformatics research domain. These include the construction of databases for sharing experimental results [1,2,3,4,5] and the set-up of portals for sharing useful information [6,7,8,9,10,11,12]. The GENIA corpus [13] was constructed to extract biology-related information (e.g., genome, protein) and the BioCreative IV CHEMDNER corpus [14] was created to extract chemical and drug names. Based on such corpora, several researchers have proposed a variety of methods for the extraction of information from research papers [15,16,17]. Only a few researchers have attempted to automatically extract information from research papers [18–20], and their frameworks are explicitly focused on nanomedicine applications.

Journal: Beilstein Journal of Nanotechnology
Publication Date: Sep 7, 2015
Citations: 14
License type: CC BY 2.0
