Abstract

Recent years have witnessed exponential growth in the number of identified interactions between biological molecules. These interactions are usually represented as large and complex networks, calling for the development of appropriate tools to exploit the functional information they contain. Random walk with restart (RWR) is the state-of-the-art guilt-by-association approach. It explores the network vicinity of gene/protein seeds to study their functions, based on the premise that nodes related to similar functions tend to lie close to each other in the networks. In this study, we extended the RWR algorithm to multiplex and heterogeneous networks. The walk can now explore different layers of physical and functional interactions between genes and proteins, such as protein-protein interactions and co-expression associations. In addition, the walk can also jump to a network containing different sets of edges and nodes, such as phenotype similarities between diseases. We devised a leave-one-out cross-validation strategy to evaluate the algorithms' ability to predict disease-associated genes. We demonstrate the increased performance of the multiplex-heterogeneous RWR compared with several random walks on monoplex or heterogeneous networks. Overall, our framework is able to leverage the different interaction sources to outperform current approaches. Finally, we applied the algorithm to predict candidate genes for the Wiedemann-Rautenstrauch syndrome, and to explore the network vicinity of the SHORT syndrome. The source code is available on GitHub at: https://github.com/alberto-valdeolivas/RWR-MH. In addition, an R package is freely available through Bioconductor at: http://bioconductor.org/packages/RandomWalkRestartMH/. Supplementary data are available at Bioinformatics online.
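To make the idea of a random walk with restart concrete, the following is a minimal sketch of the classical RWR on a single (monoplex) network, not the multiplex-heterogeneous extension and not the code of the RandomWalkRestartMH package. The function name, the restart probability of 0.7, the tolerance and the toy path network are all assumptions chosen for illustration.

```python
import numpy as np


def random_walk_with_restart(adjacency, seeds, restart_prob=0.7,
                             tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adjacency    : square symmetric matrix of a connected network
    seeds        : indices of the seed nodes (hypothetical inputs)
    restart_prob : chance of jumping back to a seed at each step
                   (0.7 is an assumed value, not taken from the text)
    """
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    if np.any(col_sums == 0):
        raise ValueError("every node needs at least one edge")
    W = A / col_sums  # column-normalized transition matrix

    # The walker starts on the seeds with equal probability.
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)

    p = p0.copy()
    for _ in range(max_iter):
        # Either follow an edge or restart from the seeds.
        p_next = (1 - restart_prob) * (W @ p) + restart_prob * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy path network 0 - 1 - 2 - 3 with node 0 as the only seed.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
scores = random_walk_with_restart(path, seeds=[0])
ranking = np.argsort(-scores)  # nodes ordered by proximity to the seed
```

In this toy case the scores decrease with distance from the seed, which is the guilt-by-association premise: nodes ranked highest are those lying closest to the seeds in the network.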

Highlights

  • A multiplex network is a collection of networks considered as layers, sharing the same set of nodes, but in which edges belong to different interaction categories

  • The multiplex framework is, for instance, more efficient than network aggregation at extracting communities from biological networks (Didier et al, 2015)
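The difference between a multiplex network and an aggregated network can be sketched with a small data structure. This is an illustrative assumption about representation only: the gene identifiers, layer names and helper functions below are hypothetical and do not come from the paper or its software.

```python
import numpy as np

# Hypothetical gene identifiers shared by all layers.
nodes = ["A", "B", "C"]

# One adjacency matrix per interaction category, all indexed by `nodes`.
multiplex = {
    "ppi": np.array([[0, 1, 0],
                     [1, 0, 0],
                     [0, 0, 0]]),
    "coexpression": np.array([[0, 1, 1],
                              [1, 0, 0],
                              [1, 0, 0]]),
}


def aggregate(layers):
    """Collapse the layers into one unweighted network (union of edges)."""
    stacked = np.stack(list(layers.values()))
    return (stacked.sum(axis=0) > 0).astype(int)


def edge_layers(layers, nodes, u, v):
    """List the layers in which the edge u-v is present."""
    i, j = nodes.index(u), nodes.index(v)
    return [name for name, adj in layers.items() if adj[i, j]]


aggregated = aggregate(multiplex)
shared = edge_layers(multiplex, nodes, "A", "B")
```

The aggregated matrix records only that an edge exists, whereas the multiplex structure still tells which interaction categories support it, which is the information a layer-aware method can exploit.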



Introduction

Recent years have witnessed the accumulation of physical and functional interactions between biological macromolecules. Protein-protein interactions (PPI) are now screened at the proteome scale for many organisms, including humans, revealing thousands of physical interactions between proteins. The availability of large-scale PPI networks led to the application of graph theory-based approaches for their exploration, with the ultimate goal of extracting the knowledge they contain about cellular functioning. These methods exploit the tendency of functionally related proteins to lie in the same network neighborhood. Clustering algorithms identify communities of proteins participating in the same biological processes (Brohee and van Helden, 2006; Katsogiannou et al, 2014; Chapple et al, 2015; Arroyo et al, 2015), and guilt-by-association strategies explore topological relationships to predict protein cellular functions (Schwikowski et al, 2000).
