Abstract

The diverse and growing omics data in public domains provide researchers with a tremendous opportunity to extract hidden, as-yet-undiscovered knowledge. However, the vast majority of archived data remain unused. Here, we present MetaOmGraph (MOG), a free, open-source, standalone software tool for exploratory analysis of massive datasets. Without coding, researchers can interactively visualize and evaluate data in the context of their metadata, homing in on groups of samples or genes based on attributes such as expression values, statistical associations, metadata terms and ontology annotations. Interactive visualizations such as line charts, box plots, scatter plots, histograms and volcano plots make the data easy to explore. Statistical analyses include co-expression analysis, differential expression analysis and differential correlation analysis, with significance tests. Researchers can send data subsets to R for additional analyses. Multithreading and indexing enable efficient big data analysis. A researcher can create a new MOG project from any numerical data or explore an existing MOG project. MOG projects, together with the history of explorations, can be saved and shared. We illustrate MOG with case studies of large curated datasets: human cancer RNA-Seq data, in which we identify novel putative biomarker genes in different tumors, and microarray and metabolomics data from Arabidopsis thaliana. MOG executable and code: http://metnetweb.gdcb.iastate.edu/ and https://github.com/urmi-21/MetaOmGraph/.

Highlights

  • Public data repositories store petabytes of raw and processed data produced using microarray [1], RNA-seq [2], and mass spectrometry for small molecules [3] and proteins [4]

  • Of the 111 genes we identified as increasing during progression of kidney renal clear cell carcinoma (KIRC) or kidney renal papillary cell carcinoma (KIRP), 56 have been described as unfavourable prognostic markers for renal cancer by The Human Protein Atlas (THPA) (Supplementary Table S30)

  • Of the 79 genes we identified as decreasing with cancer progression in KIRC or KIRP, 39 were labeled as favourable prognostic markers for renal cancer by THPA (Supplementary Table S30)

Introduction

Public data repositories store petabytes of raw and processed data produced using microarray [1], RNA-seq [2], and mass spectrometry for small molecules [3] and proteins [4]. These data represent multiple species, tissues, genotypes, and conditions; some are the results of groundbreaking research. Integrative analysis of data from multiple studies representing diverse biological conditions is key to fully exploiting these vast data resources for scientific discovery [5, 6]. Such analysis allows efficient reuse and recycling of the available data and their metadata [1, 5, 7, 8].

