Abstract

Thanks to recent efforts by the text mining community, biocurators now have access to plenty of good tools and Web interfaces for identifying and visualizing biomedical entities in literature. Yet, many of these systems start with a PubMed query, which is limited by strong Boolean constraints. Some semantic search engines exploit entities for Information Retrieval, and/or deliver relevance-based ranked results. Yet, they are not designed to support a specific curation workflow, and they allow very limited control over the search process. The Swiss Institute of Bioinformatics Literature Services (SIBiLS) provide personalized Information Retrieval in the biological literature. SIBiLS allow fully customizable search in semantically enriched contents, based on keywords and/or mapped biomedical entities from a growing set of standardized and legacy vocabularies. The services have been used and favourably evaluated to assist the curation of genes and gene products, by delivering customized literature triage engines to different curation teams. SIBiLS (https://candy.hesge.ch/SIBiLS) are freely accessible via REST APIs and are ready to empower any curation workflow. They are built on modern technologies that scale with big data: MongoDB and Elasticsearch. They cover MEDLINE and PubMed Central Open Access, enriched with nearly 2 billion mapped biomedical entities, and are updated daily.

Highlights

  • It has been repeatedly stated in the last decade that biocurators need (semi-)automated support from text mining technologies for managing the growing amount of biomedical knowledge described in the scientific literature [1,2]

  • All occurrences of ‘hepatocellular cancer’ or ‘hepatic neoplasm’ in a text can be identified and normalized with the unique MeSH concept ‘D008113: Liver Neoplasms’. These annotations are highlighted for readers in the dedicated search engine Web interface, and can often complement keywords in the search process to improve recall, since all synonyms are mapped to a single concept

  • We present the Swiss Institute of Bioinformatics Literature Services (SIBiLS), which aim to provide precision Information Retrieval in the biological literature
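
The normalization described in the highlights can be illustrated with a minimal dictionary-matching sketch. It is not the SIBiLS annotation tool: the tiny `SYNONYMS` table, the function names `annotate` and `concept_ids`, and the annotation fields are all assumptions made for illustration. Only the concept 'D008113: Liver Neoplasms' and its two synonyms come from the text.

```python
import re

# Hypothetical mini-vocabulary mapping a synonym to (concept ID, preferred label).
# Only the MeSH concept D008113 and the first two synonyms come from the text;
# a real vocabulary would hold many thousands of entries.
SYNONYMS = {
    "hepatocellular cancer": ("D008113", "Liver Neoplasms"),
    "hepatic neoplasm": ("D008113", "Liver Neoplasms"),
    "liver neoplasms": ("D008113", "Liver Neoplasms"),
}


def annotate(text):
    """Return all synonym mentions in `text`, sorted by start offset."""
    annotations = []
    for synonym, (concept_id, label) in SYNONYMS.items():
        # Whole-word, case-insensitive match of the synonym.
        pattern = re.compile(r"\b" + re.escape(synonym) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append({
                "start": match.start(),
                "end": match.end(),
                "mention": match.group(0),
                "concept_id": concept_id,
                "label": label,
            })
    return sorted(annotations, key=lambda a: a["start"])


def concept_ids(text):
    """Collapse all mentions into the set of distinct concept IDs."""
    return {a["concept_id"] for a in annotate(text)}
```

Because both surface forms resolve to the same identifier, a search on the concept ID retrieves documents that use either wording, which is where the recall gain comes from.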


Summary

Introduction

It has been repeatedly stated in the last decade that biocurators need (semi-)automated support from text mining technologies for managing the growing amount of biomedical knowledge described in the scientific literature [1,2]. Parsed contents and annotations for MEDLINE citations and PMC full texts are stored in a JATS BioC JSON format and are accessible via the fetch APIs. They are indexed in Lucene-based Elasticsearch search engines. Built on modern technologies that scale with big data (MongoDB and Elasticsearch), SIBiLS are ready to empower literature triage and to be integrated efficiently into any curation workflow. The JSON document representations are stored in a MongoDB database, where the automatic annotation tool and the search engine can access them.
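
To make the combination of keywords and mapped entities concrete, here is a minimal sketch of an Elasticsearch query body that requires both free-text keywords and concept identifiers. It shows the general Elasticsearch `bool` query pattern only. The SIBiLS REST APIs and index schema are not described here, so the field names `text` and `annotations.concept_id`, the function name `build_triage_query`, and the example keyword are hypothetical.

```python
import json


def build_triage_query(keywords, concept_ids, size=10):
    """Build an Elasticsearch search body as a plain dict.

    Assumes a hypothetical index in which `text` is an analyzed full-text
    field and `annotations.concept_id` is a keyword field (non-nested)
    holding the identifiers of the mapped entities.
    """
    must = []
    if keywords:
        # Full-text relevance match on the document text.
        must.append({"match": {"text": " ".join(keywords)}})
    for concept_id in concept_ids:
        # Exact match on a normalized concept identifier.
        must.append({"term": {"annotations.concept_id": concept_id}})
    # With no constraints at all, fall back to matching every document.
    query = {"bool": {"must": must}} if must else {"match_all": {}}
    return {"query": query, "size": size}


# Example: documents mentioning a keyword AND annotated with a MeSH concept.
body = build_triage_query(["sorafenib"], ["D008113"])
print(json.dumps(body, indent=2))
```

In a plain Elasticsearch deployment, a body like this would be sent to an index's `_search` endpoint. A curation team could tune such a query (extra filters, boosts, result size) to obtain its own triage engine.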
