Abstract

Recent advances in omic technologies allow researchers to search for disease-associated biomarkers at the system level. The integrative analysis of data from a large number of molecules involved at various layers of the biological system offers a great opportunity to rank disease biomarker candidates. In this paper, we propose MOTA, a network-based method that uses data acquired at multiple layers to rank candidate disease biomarkers. The networks constructed by MOTA allow users to investigate the biological significance of the top-ranked biomarker candidates. We evaluated the performance of MOTA in ranking disease-associated molecules from three sets of multi-omic data representing three cohorts of hepatocellular carcinoma (HCC) cases and controls with liver cirrhosis. The results demonstrate that, compared to traditional statistical methods, MOTA identifies more top-ranked metabolite biomarker candidates that are shared by two different cohorts. Moreover, the mRNA candidates top-ranked by MOTA comprise more cancer driver genes than those ranked by traditional differential expression methods.

Highlights

  • Statistical and machine learning methods are commonly used in omic studies to find disease biomarker candidates based on differential expression [1,2,3,4,5,6]

  • The results show that MOTA allows the identification of more overlapping top-ranked metabolite biomarker candidates in two cohorts of the same study compared to t-test and iDINGO

  • We calculated MOTA scores for each metabolite in the GU1 metabolomic dataset by integrating it with proteomic and glycomic datasets acquired by analyzing the same set of samples


Summary

Introduction

Statistical and machine learning methods are commonly used in omic studies to find disease biomarker candidates based on differential expression [1,2,3,4,5,6]. Relevance networks are a widely used data-driven method for modeling biological systems because of their simplicity [8]. They measure ‘relevance’ as the correlation or mutual information between two biomolecules and apply a threshold to decide whether the pair is relevant. This method fails to distinguish direct from indirect associations, especially when dealing with high-dimensional omic datasets. Krumsiek et al. used Gaussian graphical models (GGM) to analyze metabolomic data acquired from a large human population cohort and found that GGM generates rather sparse and robust networks compared with Pearson correlation [12]. They observed that metabolites from known metabolic reactions are connected by edges with high partial correlation coefficients.
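
The difference between a relevance network and a GGM can be illustrated with a small simulation. The sketch below is not MOTA and is not taken from the paper; it assumes three hypothetical molecules in a chain (a drives b, b drives c), an arbitrary edge threshold of 0.3, and a helper named pearson_and_partial introduced only for illustration. Partial correlations are obtained from the inverse of the sample covariance matrix, which is only feasible here because there are far more samples than variables.

```python
import numpy as np


def pearson_and_partial(data):
    """Return (Pearson correlation, partial correlation) matrices.

    data: array of shape (n_samples, n_variables).
    Partial correlations are derived from the precision matrix P
    (inverse covariance): rho_ij = -P_ij / sqrt(P_ii * P_jj).
    """
    corr = np.corrcoef(data, rowvar=False)
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return corr, partial


# Simulated chain a -> b -> c (hypothetical data, not from the study).
rng = np.random.default_rng(0)
n_samples = 5000
a = rng.normal(size=n_samples)
b = a + 0.5 * rng.normal(size=n_samples)
c = b + 0.5 * rng.normal(size=n_samples)
data = np.column_stack([a, b, c])

corr, partial = pearson_and_partial(data)

# Assumed threshold for calling an edge; chosen only for this example.
threshold = 0.3
relevance_edges = np.abs(corr) > threshold    # correlation-based network
ggm_edges = np.abs(partial) > threshold       # partial-correlation network

print("Pearson a-c:", round(float(corr[0, 2]), 3))
print("Partial a-c:", round(float(partial[0, 2]), 3))
```

In this setup a and c are strongly correlated only through b, so the thresholded correlation network links them, while the partial correlation between a and c is close to zero and the GGM keeps only the direct edges a-b and b-c. With real omic data, where variables often outnumber samples, the sample covariance matrix is not invertible and a regularized estimator would be needed instead of the plain inverse used here.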

