Abstract
Background: Semantic resources such as knowledge bases contain high-quality structured knowledge, and building them requires significant effort from domain experts. Using these resources to reinforce information retrieval from unstructured text may further exploit the potential of both the unstructured text and the curated knowledge.
Results: The paper proposes a novel method that uses a deep neural network model incorporating prior knowledge to improve the automated extraction of biological semantic relations from the scientific literature. The model is based on a recurrent neural network that combines the attention mechanism with semantic resources, i.e., UniProt and BioModels. The method is evaluated on the BioNLP and BioCreative corpora, sets of manually annotated biological text. The experiments demonstrate that the method outperforms the current state-of-the-art models and that structured semantic information can improve the results of biomedical text mining.
Conclusion: The experimental results show that our approach can effectively make use of external prior knowledge and improve performance on the protein-protein interaction extraction task. Although it is validated on biomedical texts, the method should generalize to other types of data.
Highlights
Semantic resources such as knowledge bases contain high-quality structured knowledge, and building them requires significant effort from domain experts
We try to search for more relevant knowledge base (KB) information, but some entities still have no information in the two KBs
The results of entity extraction increase by 4.06% for the Bidirectional Long Short-Term Memory neural network (BiLSTM) with the UniProtKB and BioModels data, compared to the same model without any external information
Summary
Semantic resources such as knowledge bases contain high-quality structured knowledge, and building them requires significant effort from domain experts. We propose a novel approach that brings the semantic information in specialized knowledge bases (KBs) into the extraction of biological relations from unstructured text. Hua and Quan [4] extracted protein-protein interaction (PPI) relations using a convolutional neural network (CNN) model based on the shortest dependency path. Their model uses pre-trained word embeddings for the PPI relation extraction task and can extract crucial features automatically. The BioCreative III Workshop has several tasks that focus on text mining in biology, including two PPI tasks [5]. The goal of BioCreAtIvE (Critical Assessment of Information Extraction in Biology) is to provide tasks that focus on the prediction of protein interactions from biological articles [6].
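As a rough illustration of how attention might be combined with knowledge-base information, the sketch below scores each token's hidden state against a vector representing an entity's KB entry and forms a weighted summary of the sentence. This is a minimal sketch under stated assumptions, not the authors' model: the dot-product scoring, the function name kb_attention, and the toy vectors are all hypothetical, and in practice the hidden states would come from a trained BiLSTM and the KB vector from an embedding of UniProt or BioModels data.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def kb_attention(hidden_states, kb_vector):
    """Attend over token hidden states using a KB entity vector as the query.

    hidden_states: per-token vectors (assumed to be BiLSTM outputs).
    kb_vector: embedding of the entity's KB entry (hypothetical).
    Returns the attention weights and the weighted sum of hidden states.
    """
    # Assumption: plain dot-product scoring; the paper may use another form.
    scores = [dot(h, kb_vector) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, hidden_states))
        for i in range(dim)
    ]
    return weights, context


# Toy values for illustration only; not data from the paper.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kb_vector = [2.0, 0.0]
weights, context = kb_attention(hidden_states, kb_vector)
print(weights)
print(context)
```

In this toy case the first and third tokens align with the KB vector and receive equal, larger weights than the second token, so the summary vector is dominated by the tokens the KB entry considers relevant.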