Bio-semantic relation extraction with attention-based external knowledge reinforcement

Zhijing Li, Chen Li, Yuchen Lian, Xiangrong Zhang, Xiaoyong Ma

doi:10.1186/s12859-020-3540-8

Open Access

https://doi.org/10.1186/s12859-020-3540-8

Journal: BMC Bioinformatics	Publication Date: May 24, 2020
Citations: 11	License type: open-access

Affiliation: Xi'an Jiaotong University, Xidian University

Abstract

Background: Semantic resources such as knowledge bases contain high-quality structured knowledge and therefore require significant effort from domain experts to curate. Using these resources to reinforce information retrieval from unstructured text may further exploit the potential of both the unstructured text resources and the curated knowledge.

Results: The paper proposes a novel method that uses a deep neural network incorporating prior knowledge to improve the automated extraction of biological semantic relations from the scientific literature. The model is based on a recurrent neural network that combines an attention mechanism with semantic resources, namely UniProt and BioModels. The method is evaluated on the BioNLP and BioCreative corpora, sets of manually annotated biological text. The experiments demonstrate that the method outperforms current state-of-the-art models and that structured semantic information can improve the results of biomedical text mining.

Conclusion: The experimental results show that our approach can effectively make use of external prior knowledge and improve performance on the protein-protein interaction extraction task. Although it is validated on biomedical texts, the method should generalize to other types of data.

Highlights

Semantic resources such as knowledge bases contain high-quality structured knowledge and require significant effort from domain experts
Although we try to search for more relevant knowledge base (KB) information, there are still some entities whose information cannot be found in the two KBs
Entity extraction results increase by 4.06% for the Bidirectional Long Short-Term Memory (BiLSTM) neural network with the UniProtKB and BioModels data, compared with the same model without any external information

Summary

Introduction

Semantic resources such as knowledge bases contain high-quality structured knowledge and require significant effort from domain experts. We propose a novel approach that brings semantic information from specialized knowledge bases (KBs) into the extraction of biological relations from unstructured text. Hua and Quan [4] extracted protein-protein interaction (PPI) relations using a shortest dependency path-based convolutional neural network (CNN) model. Their model uses pre-trained word embeddings for the PPI relation extraction task and can extract crucial features automatically. The BioCreative III Workshop has several tasks that focus on text mining in biology, including two PPI tasks [5]. The goal of BioCreAtIvE (Critical Assessment of Information Extraction in Biology) is to provide tasks that focus on the prediction of protein interactions from biological articles [6].
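
To make the general idea of attention-based knowledge reinforcement concrete, the toy sketch below shows one way a token representation could attend over a small set of KB entry vectors and be extended with the resulting context vector. It is an illustration only and not the model described in the paper: the dot-product scoring, the concatenation step, the function names (`softmax`, `dot`, `attend_to_kb`) and the toy vectors are all assumptions. In a real system the token states might come from a BiLSTM encoder and the KB vectors from embedded UniProt or BioModels entries.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend_to_kb(hidden_states, kb_vectors):
    """Append a KB context vector to every token state.

    Assumption: relevance is scored with a plain dot product between a
    token state and each KB vector; the paper may use a different scorer.
    """
    dim = len(kb_vectors[0])
    reinforced = []
    for h in hidden_states:
        # Attention weights of this token over all KB entries.
        weights = softmax([dot(h, k) for k in kb_vectors])
        # Weighted sum of KB vectors gives the knowledge context.
        context = [
            sum(w * k[i] for w, k in zip(weights, kb_vectors))
            for i in range(dim)
        ]
        # Concatenate the token state with its knowledge context.
        reinforced.append(h + context)
    return reinforced


# Hypothetical toy data: two token states and two KB entry vectors.
hidden_states = [[1.0, 0.0], [0.0, 1.0]]
kb_vectors = [[2.0, 0.0], [0.0, 2.0]]
reinforced = attend_to_kb(hidden_states, kb_vectors)
```

Each reinforced vector here has four components: the original two-dimensional token state followed by a two-dimensional context that leans toward the KB entry most similar to that token. A downstream relation classifier would then consume these extended vectors instead of the raw token states.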

Methods

Results

Discussion

Conclusion