Abstract

High-throughput technologies have produced a large amount of experimental and biomedical data, creating an urgent need for comprehensive and automated mining approaches. To meet this need, we developed SMAC (SMart Automatic Classification method): a tool to extract, prioritise, integrate and analyse biomedical and molecular data according to user-defined terms. The robust ranking step performed on Medical Subject Headings (MeSH) ensures that papers are prioritised based on specific user requirements. SMAC then retrieves any related molecular data from the Gene Expression Omnibus and performs a wide range of bioinformatics analyses to extract biological insights. These features make SMAC a robust tool to explore the literature around any biomedical topic. SMAC can easily be customised or expanded and is distributed as a ready-to-use Docker container (https://hub.docker.com/r/hfx320/smac) for Windows, Mac and Linux. SMAC’s functionalities have already been adapted and integrated into the Breast Cancer Now Tissue Bank bioinformatics platform and the Pancreatic Expression Database.

Highlights

  • The NCBI PubMed[1] is a biomedical literature-based search engine that provides data from MEDLINE, life science journals and online books

  • In order to index the large amount of stored data, the National Library of Medicine (NLM) created a controlled vocabulary thesaurus named MeSH (Medical Subject Headings)[2]

  • The aforementioned tools do not provide any kind of linkage or integration with the molecular data generated from the published studies. For these reasons we developed SMAC, a fast and automated method for collecting, prioritising, integrating and analysing biomedical data extracted from PubMed and Gene Expression Omnibus (GEO)[8]


Summary

Introduction

The NCBI PubMed[1] is a biomedical literature-based search engine that provides data from MEDLINE, life science journals and online books. It is the largest and most widely used resource for biomedical and scientific research, with over 27 million biomedical literature citations currently available for querying. While PubMed offers simple and fast search capabilities, wading through the sea of information retrieved is a daunting and time-consuming task[3]. For this reason, fast automatic extraction and integration of biological insights from biomedical literature represents a very attractive prospect. Similar to PolySearch[2], pubmed.mineR[7] combines the advantages of the existing algorithms with the flexibility provided by an R package.
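
As a rough illustration of what prioritising papers on MeSH terms can look like, the sketch below orders a handful of records by how many user-defined terms appear among their MeSH headings. It is not SMAC's actual ranking procedure, which is not described in this excerpt: the `Paper` class, the `rank_by_mesh` function, the overlap-count score and the placeholder PMIDs are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record for one retrieved citation (not a SMAC data structure)."""
    pmid: str
    title: str
    mesh_terms: set[str] = field(default_factory=set)


def rank_by_mesh(papers, user_terms):
    """Order papers by the number of user terms found among their MeSH headings.

    Assumption: a plain case-insensitive overlap count stands in for whatever
    scoring a real tool would use.
    """
    wanted = {term.lower() for term in user_terms}

    def score(paper):
        return len(wanted & {mesh.lower() for mesh in paper.mesh_terms})

    # Highest overlap first; sorted() is stable, so ties keep their input order.
    return sorted(papers, key=score, reverse=True)


# Placeholder records; the PMIDs and titles are invented for illustration.
papers = [
    Paper("0001", "Study A", {"Breast Neoplasms", "Gene Expression Profiling"}),
    Paper("0002", "Study B", {"Pancreatic Neoplasms"}),
    Paper("0003", "Study C", {"Breast Neoplasms"}),
]

ranked = rank_by_mesh(papers, ["breast neoplasms", "gene expression profiling"])
for paper in ranked:
    print(paper.pmid, paper.title)
```

An overlap count is the simplest possible score; a weighted scheme (for example, favouring major MeSH topics) would slot into `score` without changing the rest of the sketch.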

