Abstract

The need to efficiently find and extract information from the continuously growing biomedical literature has led to the development of various annotation tools aimed at identifying mentions of entities and relations. Many of these tools have been integrated into user-friendly applications, facilitating their use by non-expert text miners and database curators. In this paper we describe the latest version of Neji, a web-services-ready text processing and annotation framework. Its modular and flexible architecture facilitates adaptation to different annotation requirements, while the built-in web services allow its integration into external tools and text mining pipelines. The evaluation of the web annotation server in the "technical interoperability and performance of annotation servers" (TIPS) track of BioCreative V.5 further illustrates the flexibility and applicability of this framework.

Highlights

  • The large amount of information and knowledge continuously produced in the biomedical domain is reflected in the number of published journal articles

  • The annotation service used in the technical interoperability and performance of annotation servers (TIPS) task was configured to run with 23 threads and was deployed in a Docker container with 32 GB of memory, running on a server with 24 processing cores

  • We followed the procedure defined for the TIPS task [8], in which the document text is obtained from the BeCalm abstract and patent servers, and measured the time from when the request was submitted to the Neji annotation service until the annotation results were returned
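
The timing measurement described in the last highlight can be sketched as follows. This is a minimal illustration only, not the authors' evaluation code: `timed_annotation` and `stub_annotator` are hypothetical names, and the stub stands in for a real request to an annotation service (it simply finds a fixed term locally). The point is that the clock starts when the request is submitted and stops when the results are returned, so anything the service does in between, such as fetching the document text, is included in the measured time.

```python
import time
from typing import Callable, Dict, List, Tuple

Annotation = Dict[str, object]


def timed_annotation(
    annotate: Callable[[str], List[Annotation]], text: str
) -> Tuple[List[Annotation], float]:
    """Call annotate(text) and return (annotations, elapsed seconds)."""
    start = time.perf_counter()          # request submitted
    annotations = annotate(text)
    elapsed = time.perf_counter() - start  # results returned
    return annotations, elapsed


def stub_annotator(text: str) -> List[Annotation]:
    """Hypothetical stand-in for an annotation service call.

    Finds every occurrence of one fixed term; a real client would
    send the request over the network instead.
    """
    term = "BRCA1"
    hits: List[Annotation] = []
    pos = text.find(term)
    while pos != -1:
        hits.append({"start": pos, "end": pos + len(term), "text": term})
        pos = text.find(term, pos + 1)
    return hits


if __name__ == "__main__":
    anns, seconds = timed_annotation(stub_annotator, "BRCA1 interacts with BRCA1.")
    print(f"{len(anns)} annotations in {seconds:.6f} s")
```

Passing the annotator in as a callable keeps the timing logic independent of how the service is reached, so the same wrapper could time a local stub or an HTTP client.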


Summary

Introduction

The large amount of information and knowledge continuously produced in the biomedical domain is reflected in the number of published journal articles. In 2017, the PubMed/MEDLINE bibliographic database contained over 26 million references to journal articles in the life sciences, of which more than one million were added in that year [1]. At this rate, staying up to date with current knowledge and identifying the most relevant publications and information on a given subject is a very challenging task for researchers. To accelerate the curation process, automatic information extraction tools have been developed and integrated into the curation pipeline [4]. These tools apply information retrieval and ranking methods to expedite the identification of relevant literature, given particular curation requirements, and information extraction methods that identify textual mentions of entities (e.g. names of genes) or relations (e.g. interactions between a protein and a chemical).

