Abstract

The body of biomedical literature is growing at an unprecedented rate, exceeding the ability of researchers to make effective use of this wealth of information. This growth has created interest in biomedical relation extraction approaches that extract domain-specific knowledge for diverse applications. Despite great progress in these techniques, the retrieved evidence still needs to undergo a time-consuming manual curation process to be truly useful. Most relation extraction systems have been conceived in the context of Shared Tasks, with the goal of maximizing the F1 score on restricted, domain-specific test sets. However, in industrial applications relations typically serve as input to a pipeline of biologically driven analyses; as a result, highly precise extractions are central to cutting down the manual curation effort and thus to translating research evidence into practice smoothly and reliably. In this paper, we present a highly precise relation extraction system designed to reduce human curation efforts. The engine is made up of sophisticated rules that leverage linguistic aspects of the texts rather than relying on application-specific training data. As a result, the system can be applied to diverse needs. Experiments on gold-standard corpora show that the system achieves the highest precision compared with previous rule-based, kernel-based, and neural approaches, while maintaining an F1 score comparable or superior to other methods. To show the usefulness of our approach in industrial scenarios, we finally present a case study on the mTOR pathway, showing how it could be applied on a large scale.

Highlights

  • In the last 30 years we have observed a rapidly growing body of biomedical literature

  • We evaluate our relation extraction method on different benchmark corpora annotated for biomedical relations: LLL

  • LLL is a corpus about the model bacterium Bacillus subtilis, focused on gene transcription and sporulation; HPRD50 is about regulatory relations, direct physical interactions and modifications on documents from the Human Protein Reference Database [38]; and IEPA is a corpus focused on interactions between a restricted set of biochemicals

Summary

Introduction

In the last 30 years we have observed a rapidly growing body of biomedical literature. As a consequence, it is increasingly difficult for researchers to keep pace with the advances in their fields. It has recently been shown that one would have to examine 27 papers per day from 130 previously scanned journals to stay up to date with the literature about a single, specific disease [1]. Such a large volume of written biomedical knowledge is becoming increasingly available in the form of electronic data resources such as digital libraries and biomedical databases. Since researchers struggle to cope with this amount of data, the development of effective biomedical text mining systems has become increasingly important to allow them to dig through undiscovered knowledge.
