OmixLitMiner: A Bioinformatics Tool for Prioritizing Biological Leads from 'Omics Data Using Literature Retrieval and Data Mining.

Pascal Steffen, Jemma Wu, Hartmut Schlüter, Shubhang Hariharan, Mark P Molloy, Hannah Voss, Vijay Raghunath

doi:10.3390/ijms21041374

Abstract

Proteomics and genomics discovery experiments generate increasingly large result tables, so researchers need more time to convert the biological data into new knowledge. Literature review is an important step in this process and can be tedious for large-scale experiments. Making an informed, strategic decision about which biomolecule targets to pursue in follow-up experiments therefore remains a considerable challenge. We present OmixLitMiner, a package written in the programming language R. It streamlines and formalises literature retrieval and analysis for discovery-based ‘omics data and supports decisions about follow-up experiments. The tool automates the retrieval of literature from PubMed based on UniProt protein identifiers, gene names and their synonyms, combined with a user-defined contextual keyword search (e.g., based on gene ontology terms). The search strategy can be set to either strict or more lenient literature retrieval, and the outputs are assigned to three categories describing how well characterized a regulated gene or protein is. The category helps researchers decide which genes/proteins to take into follow-up experiments that are likely to yield new knowledge, and which already known biomarkers to exclude. We demonstrate the tool’s usefulness in a retrospective study assessing three cancer proteomics publications and one cancer genomics publication. Using the tool, we were able to corroborate most of the decisions made in these papers and to detect additional biomolecule leads that may be valuable for future research.
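The search step described in the abstract can be illustrated with a small sketch. It combines gene names and synonyms with contextual keywords into a PubMed query. OmixLitMiner itself is an R package, and this Python code is not its implementation. The names `ProteinEntry`, `build_pubmed_query` and `esearch_url` are hypothetical. The "strict = titles only ([ti]), lenient = titles and abstracts ([tiab])" rule is an assumption made for illustration, not the tool's documented behaviour. The NCBI E-utilities esearch endpoint and its `db`, `term`, `retmode` and `retmax` parameters are real, but the sketch only builds the URL and performs no network call.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlencode

# Real NCBI E-utilities endpoint; this sketch only builds the URL, it does not call it.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


@dataclass
class ProteinEntry:
    """One regulated protein from an 'omics result table (hypothetical structure)."""
    accession: str                      # UniProt accession, kept for bookkeeping
    gene_name: str                      # primary gene name
    synonyms: list[str] = field(default_factory=list)


def build_pubmed_query(entry: ProteinEntry, keywords: list[str], strict: bool = True) -> str:
    """Combine gene name, synonyms and context keywords into one PubMed query.

    Assumption: 'strict' searches titles only ([ti]); 'lenient' searches
    titles and abstracts ([tiab]). This is not necessarily OmixLitMiner's rule.
    """
    tag = "[ti]" if strict else "[tiab]"
    names = [entry.gene_name, *entry.synonyms]
    name_clause = " OR ".join(f'"{name}"{tag}' for name in names)
    if not keywords:
        return f"({name_clause})"
    keyword_clause = " OR ".join(f'"{kw}"{tag}' for kw in keywords)
    return f"({name_clause}) AND ({keyword_clause})"


def esearch_url(query: str, retmax: int = 0) -> str:
    """URL for an esearch call; retmax=0 requests the hit count without returning IDs."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder accession and names, not real data.
    entry = ProteinEntry("P00000", "GENE_A", ["ALIAS_A"])
    print(build_pubmed_query(entry, ["breast cancer"], strict=False))
    # ("GENE_A"[tiab] OR "ALIAS_A"[tiab]) AND ("breast cancer"[tiab])
    print(esearch_url(build_pubmed_query(entry, ["breast cancer"])))
```

Switching between the two field tags is one simple way to trade recall for precision. A title match usually signals that the protein is central to the paper, whereas an abstract match may only be a passing mention.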

Highlights

Omics analyses, regardless of the underlying acquisition platform, are united by the common feature of describing large numbers of biomolecules relevant to the experimental system under investigation
Yu et al [7] and Lau et al [8] reported literature retrieval tools based on distinct keyword searches to support development of targeted workflows such as multiple-reaction-monitoring (MRM)
The objective of OmixLitMiner is to help researchers reduce the time spent on literature research for ‘omics-generated data by automating the retrieval of relevant literature and categorizing the results for follow-up analysis

Summary

Introduction

Omics analyses, regardless of the underlying acquisition platform, are united by the common feature of describing large numbers of biomolecules relevant to the experimental system under investigation. Various reporting strategies apply statistical frameworks to draw out the biomolecules expected to be most relevant. These biomolecules then become the focus of more intense follow-up investigations. This is an important step for the researcher. Building an integrated understanding of the experimental data through subsequent literature review is arguably the most important part of a research project, and one of the most time-consuming tasks. Yu et al [7] and Lau et al [8] reported literature retrieval tools based on distinct keyword searches (i.e., gene ontology terms) to support the development of targeted workflows such as multiple-reaction-monitoring (MRM). These tools generate target lists without a priori consideration of the project background/objective and without reference to acquired ‘omics discovery data. Our approach is distinctly different: we have developed a ‘reverse’ solution to the strategy described above [7,8], which starts from the acquired ‘omics discovery data and retrieves the literature for those biomolecules.
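The ‘reverse’ direction can be sketched as follows. Start from the regulated biomolecules in an ‘omics result table, attach literature hit counts to each one, and assign it to one of three characterization categories. Then rank the list so that poorly characterized leads come first. The three-level rule in `categorize` is hypothetical and may differ from OmixLitMiner's exact criteria: 1 = title hits, 2 = abstract-only hits, 3 = no hits. The names `Hit`, `categorize` and `prioritize` and the tie-break by absolute log2 fold change are also illustrative assumptions. The hit counts in the example are placeholders, not results.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    """One regulated biomolecule with (hypothetical) literature hit counts."""
    gene_name: str
    log2_fold_change: float   # taken from the 'omics result table
    title_hits: int           # records with name + keyword in the title
    abstract_hits: int        # records with name + keyword in title/abstract


def categorize(hit: Hit) -> int:
    """Hypothetical rule: 1 = well characterized, 2 = some evidence, 3 = uncharacterized."""
    if hit.title_hits > 0:
        return 1
    if hit.abstract_hits > 0:
        return 2
    return 3


def prioritize(hits: list[Hit]) -> list[tuple[str, int]]:
    """Rank leads: least-characterized category first, then largest |log2 fold change|."""
    ranked = sorted(hits, key=lambda h: (-categorize(h), -abs(h.log2_fold_change)))
    return [(h.gene_name, categorize(h)) for h in ranked]


if __name__ == "__main__":
    # Placeholder names and counts for illustration only.
    hits = [
        Hit("PROT_A", 2.1, 12, 40),
        Hit("PROT_B", -3.0, 0, 0),
        Hit("PROT_C", 1.5, 0, 4),
        Hit("PROT_D", 1.2, 0, 0),
    ]
    print(prioritize(hits))
    # [('PROT_B', 3), ('PROT_D', 3), ('PROT_C', 2), ('PROT_A', 1)]
```

Sorting on the category first reflects the goal stated in the abstract: surface leads likely to yield new knowledge, and push already known biomarkers (category 1) to the end of the list.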

Methods

Results

Discussion

Conclusion