Mining Functional Modules in Heterogeneous Biological Networks Using Multiplex PageRank Approach.

Jun Li, Patrick X Zhao

doi:10.3389/fpls.2016.00903

Abstract

Identification of functional modules/sub-networks in large-scale biological networks is an important research challenge in current bioinformatics and systems biology. Approaches have been developed to identify functional modules in single-class biological networks; however, methods for systematically and interactively mining multiple classes of heterogeneous biological networks are lacking. In this paper, we present a novel algorithm (called mPageRank) that utilizes the Multiplex PageRank approach to mine functional modules from two classes of biological networks. We demonstrate the capabilities of our approach by mining functional biological modules through the integration of expression-based gene-gene association networks and protein-protein interaction networks. We first compared the performance of our method with that of other methods using simulated data. We then applied our method to identify the cell division cycle-related functional module and plant signaling defense-related functional module in the model plant Arabidopsis thaliana. Our results demonstrated that the mPageRank method is effective for mining sub-networks in both expression-based gene-gene association networks and protein-protein interaction networks, and has the potential to be adapted for the discovery of functional modules/sub-networks in other heterogeneous biological networks. The mPageRank executable program, source code, and the datasets and results of the two case studies presented are publicly and freely available at http://plantgrn.noble.org/MPageRank/.

Highlights

The advent of systems biology, which often integrates microarray- or RNA-Seq-based transcriptomics, proteomics, and metabolomics analyses, has made this an opportune time to determine how biological processes and complex phenotypes are regulated in living cells.
We benchmarked our Multiplex PageRank-based method against jActiveModule (Ideker et al, 2002), kwalks (Faust et al, 2010), and the method originally described by Ioannis et al (Cancer Genome Atlas Research Network, 2008).
We define a true positive (TP) as a non-seed node that is present in both the reference pathway and the inferred module, and a false positive (FP) as a non-seed node that is found in the inferred module but not in the reference pathway.
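
The TP/FP definition above can be illustrated with a short sketch. The function name `score_module` and the gene identifiers are hypothetical and chosen only for illustration; they are not taken from the mPageRank source code. Seed nodes are excluded before counting, as the definition requires.

```python
def score_module(inferred, reference, seeds):
    """Return the sets of true-positive and false-positive nodes.

    Only non-seed nodes of the inferred module are scored:
    TP = non-seed node in both the inferred module and the reference pathway
    FP = non-seed node in the inferred module but not in the reference pathway
    """
    inferred_non_seed = set(inferred) - set(seeds)
    reference = set(reference)
    true_positives = inferred_non_seed & reference
    false_positives = inferred_non_seed - reference
    return true_positives, false_positives


# Hypothetical example data (placeholder identifiers, not real genes)
seeds = {"G1", "G2"}
reference_pathway = {"G1", "G2", "G3", "G4", "G5"}
inferred_module = {"G1", "G3", "G4", "G7"}

tp, fp = score_module(inferred_module, reference_pathway, seeds)
print(sorted(tp), sorted(fp))
```

Here the seed `G1` is ignored even though it appears in both sets, so only `G3`, `G4`, and `G7` are scored.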

Summary

Introduction

The advent of systems biology, which often integrates microarray- or RNA-Seq-based transcriptomics, proteomics, and metabolomics analyses, has made this an opportune time to determine how biological processes and complex phenotypes (called traits) are regulated in living cells. In the post-genomics era, the development of high-throughput “omics” technologies has generated vast amounts of mRNA, protein, and metabolite profiles for many eukaryotic species, and much of this “big data” has been made publicly accessible through data repositories (Pruitt and Maglott, 2001; Parkinson et al, 2005; Barrett et al, 2007; Brandao et al, 2009). “Big data” has the potential to provide unprecedented insights into various biological processes, including trait regulation, leading to the discovery of vast amounts of novel biological information.

Methods

Results

Conclusion