MER: a shell script and annotation server for minimal named entity recognition and linking

Francisco M Couto, Andre Lamurias

doi:10.1186/s13321-018-0312-9

Abstract

Named-entity recognition aims at identifying the fragments of text that mention entities of interest, which can afterwards be linked to a knowledge base where those entities are described. This manuscript presents our minimal named-entity recognition and linking tool (MER), designed with flexibility, autonomy and efficiency in mind. To annotate a given text, MER only requires: (1) a lexicon (text file) with the list of terms representing the entities of interest; (2) optionally, a tab-separated values file with a link for each term; and (3) a Unix shell. Alternatively, the user can provide an ontology from which MER will automatically generate the lexicon and links files. The efficiency of MER derives from exploiting the high performance and reliability of the text processing command-line tools grep and awk, and from a novel inverted recognition technique. MER was deployed in a cloud infrastructure using multiple virtual machines to work as an annotation server and participate in the Technical Interoperability and Performance of annotation Servers (TIPS) task of BioCreative V.5. The results show that our solution processed each document (text retrieval and annotation) in less than 3 s on average without using any type of cache. MER was also compared to a state-of-the-art dictionary lookup solution, obtaining competitive results not only in computational performance but also in precision and recall. MER is publicly available in a GitHub repository (https://github.com/lasigeBioTM/MER) and through a RESTful Web service (http://labs.fc.ul.pt/mer/).
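The abstract credits part of MER's efficiency to an inverted recognition technique built on grep and awk, without giving the details. The following Python sketch only illustrates one plausible reading of "inverted": rather than scanning the text once per lexicon term, every word n-gram of the text is looked up in a hash set built from the lexicon. The function names, the `max_words` bound, tokenization by `\w+` and case-insensitive matching are all assumptions; this is not MER's actual shell implementation.

```python
import re

# Hypothetical sketch of an "inverted" dictionary lookup: the text's word n-grams
# are the queries and the lexicon is the searched collection. This is an assumed
# reading of the idea, not MER's actual grep/awk pipeline.

WORD = re.compile(r"\w+")


def normalize(term):
    """Lowercase a term and collapse it to space-separated word tokens."""
    return " ".join(w.lower() for w in WORD.findall(term))


def recognize(text, lexicon, max_words=5):
    """Return (start, end, mention) for every text n-gram found in the lexicon.

    max_words is an assumed upper bound on the number of words in a term.
    Overlapping and nested matches are all reported.
    """
    terms = {normalize(t) for t in lexicon}
    tokens = [(m.group().lower(), m.start(), m.end()) for m in WORD.finditer(text)]
    matches = []
    for i in range(len(tokens)):
        # Try n-grams of 1..max_words words starting at token i.
        for j in range(i + 1, min(i + max_words, len(tokens)) + 1):
            key = " ".join(word for word, _, _ in tokens[i:j])
            if key in terms:
                start, end = tokens[i][1], tokens[j - 1][2]
                matches.append((start, end, text[start:end]))
    return matches


lexicon = ["caffeine", "Parkinson disease", "disease"]
text = "Caffeine may lower the risk of Parkinson disease."
print(recognize(text, lexicon))
# [(0, 8, 'Caffeine'), (31, 48, 'Parkinson disease'), (41, 48, 'disease')]
```

Because each lookup is a constant-time set membership test, the work grows with the length of the text times `max_words` rather than with the size of the lexicon, which is what makes an inverted search attractive when the lexicon is large.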

Highlights

Text has been, and continues to be, the traditional and natural means for humans to represent and share knowledge.
Lexicons: the first step to participate in Technical Interoperability and Performance of annotation Servers (TIPS) was to select the data sources from which we could collect terms related to the following accepted categories: Cell line and cell type: Cellosaurus [39]; Chemical: HMDB [40], ChEBI [32] and ChEMBL [41]; Disease: Human Disease Ontology [35]; miRNA: miRBase [42]; Protein: Protein Ontology [43]; Subcellular structure: cellular component aspect of Gene Ontology [44]; Tissue and organ: tissue and organ subsets of UBERON [45].
Our server handled all 319k requests received during the evaluation period, generating a total of 7.13M annotations, with an average of 22.5 predictions per document (MAD).
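Building one lexicon per TIPS category from several data sources amounts to merging term lists and removing duplicates, while keeping a term-to-URI table for linking. The Python sketch below shows that merge under stated assumptions: the sources have already been reduced to hypothetical `(category, uri, label)` triples, whitespace is the only normalization applied, and the `example.org` URIs are placeholders rather than real identifiers from Cellosaurus, ChEBI or the other resources.

```python
from collections import defaultdict

# Hypothetical sketch: merge (category, URI, label) triples collected from several
# data sources into one lexicon and one links table per category. The triple
# format and the whitespace-only normalization are assumptions.


def build_lexicons(entries):
    """Return {category: (lexicon_lines, links_lines)}, both sorted and deduplicated."""
    terms = defaultdict(set)
    links = defaultdict(set)
    for category, uri, label in entries:
        term = " ".join(label.split())  # collapse internal and surrounding whitespace
        if not term:
            continue
        terms[category].add(term)
        # One tab-separated term/URI pair per line; a term may keep several URIs.
        links[category].add(f"{term}\t{uri}")
    return {c: (sorted(terms[c]), sorted(links[c])) for c in terms}


entries = [
    ("Chemical", "http://example.org/chem/1", "caffeine"),
    ("Chemical", "http://example.org/chem/1", " caffeine "),
    ("Chemical", "http://example.org/chem/2", "water"),
    ("Disease", "http://example.org/dis/1", "Parkinson  disease"),
]
for category, (lexicon, links) in build_lexicons(entries).items():
    print(category, lexicon, links)
```

Writing each category to disk is then a matter of joining the returned lines with `"\n"`, giving a lexicon file with one term per line and a links file with one term/URI pair per line.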

Summary

Introduction

Text has been, and continues to be, the traditional and natural means for humans to represent and share knowledge. MER only requires as input a lexicon in the form of a text file, in which each line contains a term representing a named entity to recognize. If the user wants to perform entity linking, a tab-separated text file containing the terms and their respective Uniform Resource Identifiers (URIs) can also be given as input.
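To make the two input formats concrete, the Python sketch below reads a lexicon (one term per line) and a tab-separated links file (term, then URI) and attaches URIs to already recognized mentions given as `(start, end, text)` tuples. The function names, case-insensitive matching and the `example.org` URIs are assumptions for illustration; MER itself does this with shell tools.

```python
# Hypothetical sketch of the entity-linking inputs: a lexicon with one term per
# line and a links file with one "term<TAB>URI" pair per line. Case-insensitive
# matching is an assumption, not documented MER behaviour.


def load_lexicon(lines):
    """Return the set of non-empty terms, lowercased."""
    return {line.strip().lower() for line in lines if line.strip()}


def load_links(lines):
    """Map each lowercased term to the list of URIs it is linked to."""
    links = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and parts[0].strip():
            links.setdefault(parts[0].strip().lower(), []).append(parts[1].strip())
    return links


def link_mentions(mentions, links):
    """Return (start, end, text, uris); unlinked mentions get an empty URI list."""
    return [(s, e, t, links.get(t.lower(), [])) for s, e, t in mentions]


lexicon = load_lexicon(["caffeine\n", "Parkinson disease\n", "\n"])
links = load_links([
    "caffeine\thttp://example.org/chem/1\n",
    "Parkinson disease\thttp://example.org/dis/1\n",
])
mentions = [(0, 8, "Caffeine"), (31, 48, "Parkinson disease")]
print(sorted(lexicon))
for mention in link_mentions(mentions, links):
    print(mention)
```

Since both loaders accept any iterable of lines, an open file object such as `open(path, encoding="utf-8")` can be passed directly in place of the in-memory lists.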

Journal: Journal of Cheminformatics	Publication Date: Dec 1, 2018
Citations: 27	License type: open-access

Similar Papers

Semi-automatic Semantic Service Annotation for SOAP and REST Web Services
Hyunkyung Yoo ... Yoomi Park
01 Jan 2010

Chapter 2: Playing with Files: Viewing, Manipulating, and Editing Text Files
Chris F A Johnson ... Jayant Varma
01 Jan 2015

Configurable web-services for biomedical document annotation
Sérgio Matos
Journal of cheminformatics | VOL. 10
01 Dec 2018

A Methodology for the Development of RESTful Semantic Web Services for Gene Expression Analysis.
Gabriela D A Guardia ... Avi Ma’Ayan
PloS one | VOL. 10
24 Jul 2015
