Abstract

With the unprecedented growth of the biomedical literature, keeping up to date with new developments presents an immense challenge. Publications are often studied in isolation from the established literature, with interpretation being subjective and often introducing human bias. With ontology-driven annotation of biomedical data gaining popularity in recent years and online databases offering metatags with rich textual information, it is now possible to automatically text-mine ontological terms and complement the laborious task of manual management, interpretation, and analysis of the accumulated literature with downstream statistical analysis. In this paper, we formulate an automated workflow through which we identify ontological information, including nutrition-related terms, in PubMed abstracts (from 1991 to 2016) for the two main types of Inflammatory Bowel Disease, Crohn’s Disease and Ulcerative Colitis, and for two other gastrointestinal (GI) diseases, namely Coeliac Disease and Irritable Bowel Syndrome. Our analysis reveals unique clustering patterns as well as spatial and temporal trends inherent to the considered GI diseases in the literature accumulated so far. Although automated interpretation cannot replace human judgement, the developed workflow shows promising results and can be a useful tool in systematic literature reviews. The workflow is available at https://github.com/KociOrges/pytag.

Highlights

  • The volume of biomedical literature in electronic format has grown exponentially over the past few years (Hunter & Cohen, 2006)

  • We propose a workflow to annotate journal abstracts from nutrition-related literature relevant to the two main types of Inflammatory Bowel Disease (IBD), namely Crohn’s Disease (CD) and Ulcerative Colitis (UC), and to two other gastrointestinal (GI) conditions, Coeliac Disease (CCD) and Irritable Bowel Syndrome (IBS), which were assumed a priori to stand out from the former in terms of nutrition-related terms

  • Ontological terms clustered IBD separately from non-IBD conditions, with temporal changes observed in the literature of each disease group. When the composition of the ontological terms for the disease conditions was assessed using non-metric multidimensional scaling (NMDS) plots, the findings showed a clear clustering of IBD-related ontological terms, distinct from non-IBD terms (Fig. 3A).


Summary

Introduction

The volume of biomedical literature in electronic format has grown exponentially over the past few years (Hunter & Cohen, 2006). RISmed (https://github.com/skoval/RISmed), an R package, is suggested for extracting bibliographic content from NCBI databases including PubMed, but it does not provide any ontology-based text-mining. In both cases, there is a lack of emphasis on the downstream statistical analyses. Using the same principle that we applied to sequencing data, in this paper we developed a new workflow that automatically annotates PubMed abstracts with rich ontological terms. The workflow can be applied to any disease condition, and it allows the user to repeat the same search longitudinally to highlight changes in a particular area. Downstream data analysis employing ecological statistics allows the investigator to interrogate patterns in the context of ontological terms and to identify differences between chosen disease groups, as well as secular developments within each of them.
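The idea of treating ontological terms like taxa in an abundance table can be illustrated with a minimal sketch. This is not the pytag implementation: the term list, the function names, and the toy abstracts are hypothetical, and Bray-Curtis dissimilarity is assumed here only as one common ecological measure that is often used as input to an NMDS ordination.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary; a real run would load terms from an ontology.
ONTOLOGY_TERMS = ["gluten", "diet", "fibre", "inflammation"]


def annotate(abstract, terms=ONTOLOGY_TERMS):
    """Count whole-word, case-insensitive occurrences of each term."""
    text = abstract.lower()
    counts = Counter()
    for term in terms:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        hits = re.findall(pattern, text)
        if hits:
            counts[term] = len(hits)
    return counts


def abundance_table(abstracts_by_group, terms=ONTOLOGY_TERMS):
    """Sum term counts per disease group, one row per group."""
    table = {}
    for group, abstracts in abstracts_by_group.items():
        total = Counter()
        for abstract in abstracts:
            total.update(annotate(abstract, terms))
        table[group] = [total[t] for t in terms]
    return table


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors (0 = identical)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


# Toy abstracts (invented for illustration, not real PubMed records).
toy = {
    "CD": [
        "Inflammation and diet in Crohn's disease.",
        "Dietary fibre reduces inflammation.",
    ],
    "CCD": [
        "A gluten-free diet is the main therapy.",
        "Gluten exposure and diet adherence.",
    ],
}

table = abundance_table(toy)
distance = bray_curtis(table["CD"], table["CCD"])
```

In this sketch each disease group becomes one row of term counts, and the pairwise dissimilarities between rows are what an ordination method such as NMDS would then project into two dimensions. Whole-word matching is a deliberate simplification: "Dietary" does not count as "diet", which a real ontology annotator would likely handle through synonyms or stemming.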
