Abstract
The need to efficiently find and extract information from the continuously growing biomedical literature has led to the development of various annotation tools aimed at identifying mentions of entities and relations. Many of these tools have been integrated into user-friendly applications that facilitate their use by non-expert text miners and database curators. In this paper we describe the latest version of Neji, a web-services-ready text processing and annotation framework. Its modular and flexible architecture facilitates adaptation to different annotation requirements, while the built-in web services allow its integration into external tools and text mining pipelines. The evaluation of the web annotation server in the technical interoperability and performance of annotation servers (TIPS) track of BioCreative V.5 further illustrates the flexibility and applicability of this framework.
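The idea of a modular architecture, where annotation requirements are met by swapping or reordering processing modules, can be illustrated with a small sketch. This is not Neji's actual API: `Document`, `sentence_splitter`, `make_dictionary_tagger` and `run_pipeline` are hypothetical names, and the naive sentence splitter and exact-match dictionary tagger are simple stand-ins chosen only to show how modules compose.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    text: str
    # Character-offset spans (start, end) of detected sentences.
    sentences: List[Tuple[int, int]] = field(default_factory=list)
    # Annotations as (start, end, label) character spans.
    annotations: List[Tuple[int, int, str]] = field(default_factory=list)


# A module receives a Document, enriches it, and returns it.
Module = Callable[[Document], Document]


def sentence_splitter(doc: Document) -> Document:
    # Naive rule: a sentence is a run of text ending at '.', '!' or '?'.
    doc.sentences = [m.span() for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]?", doc.text)]
    return doc


def make_dictionary_tagger(lexicon: Dict[str, str]) -> Module:
    # Build a module that labels case-insensitive, whole-word lexicon matches.
    def tagger(doc: Document) -> Document:
        for term, label in lexicon.items():
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, doc.text, flags=re.IGNORECASE):
                doc.annotations.append((m.start(), m.end(), label))
        doc.annotations.sort()  # keep annotations in text order
        return doc

    return tagger


def run_pipeline(doc: Document, modules: List[Module]) -> Document:
    # Apply each module in order; changing the list changes the behaviour.
    for module in modules:
        doc = module(doc)
    return doc


if __name__ == "__main__":
    pipeline = [
        sentence_splitter,
        make_dictionary_tagger({"BRCA1": "GENE", "tamoxifen": "CHEMICAL"}),
    ]
    doc = run_pipeline(Document("BRCA1 expression changed. Tamoxifen was given."), pipeline)
    print(doc.sentences)
    print(doc.annotations)
```

Adapting the sketch to a new requirement means adding, removing or replacing entries in the module list rather than changing `run_pipeline` itself.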
Highlights
The large amount of information and knowledge continuously produced in the biomedical domain is reflected in the number of published journal articles
The annotation service for participating in the technical interoperability and performance of annotation servers (TIPS) task was configured to run with 23 threads and deployed in a Docker container with 32 GB of memory on a server with 24 processing cores
We followed the procedure defined for the TIPS task [8], in which the document text is obtained from the BeCalm abstract and patent servers, and measured the time from when a request was submitted to the Neji annotation service until the annotation results were returned
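Measuring the response time from submission until results are returned amounts to timing each request end to end. The sketch below is a minimal illustration under stated assumptions: `timed_request` and `summarize` are hypothetical helpers, and the `send` callable stands in for whatever client code submits a request to the annotation service (replaced here by a stub, since the actual BeCalm/TIPS request format is not described in this summary).

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


def timed_request(send: Callable[[], T]) -> Tuple[T, float]:
    """Invoke `send` once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()   # high-resolution monotonic clock
    result = send()               # submit the request and block until results return
    elapsed = time.perf_counter() - start
    return result, elapsed


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Basic summary statistics over per-request response times, in seconds."""
    return {
        "count": len(latencies),
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


if __name__ == "__main__":
    def stub_annotate() -> List[str]:
        # Hypothetical stand-in for a call to the annotation service.
        time.sleep(0.01)
        return ["GENE"]

    times = [timed_request(stub_annotate)[1] for _ in range(5)]
    print(summarize(times))
```

Because the timer wraps the whole call, any time the service spends fetching document text from external servers is included in the measured response time, which matches the end-to-end definition above.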
Summary
The large amount of information and knowledge continuously produced in the biomedical domain is reflected in the number of published journal articles. In 2017, the PubMed/MEDLINE bibliographic database contained over 26 million references to journal articles in the life sciences, of which more than one million were added in that year [1]. At this rate, staying up to date with current knowledge and identifying the most relevant publications and information on a given subject is a very challenging task for researchers. To accelerate the curation process, automatic information extraction tools have been developed and integrated into the curation pipeline [4]. These tools apply information retrieval and ranking methods to expedite the identification of relevant literature, given particular curation requirements, and information extraction methods that identify textual mentions of entities (e.g. names of genes) or relations (e.g. interactions between a protein and a chemical).