Optimizing network propagation for multi-omics data integration.

Konstantina Charmpi, Ronja Johnen, Andreas Beyer, Manopriya Chokkalingam, Teresa M Przytycka

doi:10.1371/journal.pcbi.1009161

Abstract

Network propagation refers to a class of algorithms that integrate information from input data across connected nodes in a given network. These algorithms have wide applications in systems biology, protein function prediction, inferring condition-specifically altered sub-networks, and prioritizing disease genes. Despite the popularity of network propagation, there is a lack of comparative analyses of different algorithms on real data and little guidance on how to select and parameterize the various algorithms. Here, we address this problem by analyzing different combinations of network normalization and propagation methods and by demonstrating schemes for the identification of optimal parameter settings on real proteome and transcriptome data. Our work highlights the risk of a 'topology bias' caused by the incorrect use of network normalization approaches. Capitalizing on the fact that network propagation is a regularization approach, we show that minimizing the bias-variance tradeoff can be utilized for selecting optimal parameters. The application to real multi-omics data demonstrated that optimal parameters could also be obtained by either maximizing the agreement between different omics layers (e.g. proteome and transcriptome) or by maximizing the consistency between biological replicates. Furthermore, we exemplified the utility and robustness of network propagation on multi-omics datasets for identifying ageing-associated genes in brain and liver tissues of rats and for elucidating molecular mechanisms underlying prostate cancer progression. Overall, this work compares different network propagation approaches and it presents strategies for how to use network propagation algorithms to optimally address a specific research question at hand.

Highlights

Modern technologies allow us to measure many biomolecular properties at high throughput, generating so-called ‘omics data’, such as genome, transcriptome, or proteome data
Modern technologies enable the simultaneous measurement of tens of thousands of molecules in biological samples
Network propagation algorithms do not usually operate directly on the adjacency matrix A of the network, because this would strongly bias the results in favor of nodes with many neighbors and because convergence cannot be guaranteed for all values of the smoothing parameter

Summary

Introduction

Modern technologies allow us to measure many biomolecular properties at high throughput, generating so-called ‘omics data’, such as genome, transcriptome, or proteome data. Whereas technical progress constantly improves the sensitivity and coverage of these methods, the analysis and interpretation of these data still suffer from technical noise and biological variation. The integration of data across ‘omics layers’ and/or across different individuals remains a challenge for computational biology. Network propagation (also called network smoothing) is a class of computational methods for addressing these problems by integrating such omics data with a priori known molecular relationships (e.g. protein-protein interaction maps). A particular strength of network propagation is that prior knowledge is utilized for the analysis of new data, which potentially helps to increase the signal-to-noise ratio and aids the mechanistic interpretation of results. Within the realm of molecular biology, network propagation has a wide range of applications, such as imputation of missing values [1,2,3], protein function prediction [4], inferring condition-specifically altered sub-networks [5], and prioritization of disease genes [6,7].
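To make the idea of integrating node-level measurements with a known network concrete, the following is a minimal sketch of one common propagation scheme, a random walk with restart on a degree-normalized adjacency matrix. It is an illustration under stated assumptions and not necessarily the formulation used in this work: the function name `propagate`, the toy four-node network, the column-wise degree normalization, and the value of the smoothing parameter `alpha` are all chosen here for demonstration only.

```python
import numpy as np

def propagate(adjacency, scores, alpha=0.5, tol=1e-9, max_iter=1000):
    """Illustrative random walk with restart (hypothetical helper).

    adjacency : symmetric 0/1 matrix of the network
    scores    : initial per-node values (e.g. measured omics signal)
    alpha     : smoothing parameter in [0, 1); larger values spread more
    """
    A = np.asarray(adjacency, dtype=float)
    f0 = np.asarray(scores, dtype=float)
    degree = A.sum(axis=0)
    degree[degree == 0] = 1.0   # isolated nodes: avoid division by zero
    W = A / degree              # divide column j by degree j (columns sum to 1)
    f = f0.copy()
    for _ in range(max_iter):
        # Mix the neighbors' current scores with the original input scores
        f_next = alpha * (W @ f) + (1 - alpha) * f0
        if np.abs(f_next - f).sum() < tol:
            return f_next
        f = f_next
    return f

# Toy example (assumed data): a path network 0 - 1 - 2 - 3 with signal on node 0
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = np.array([1.0, 0.0, 0.0, 0.0])
smoothed = propagate(adjacency, scores, alpha=0.5)
print(smoothed)
```

In this sketch the input signal spreads from node 0 to its network neighbors and decays with distance, while the total amount of signal is preserved because every column of the normalized matrix sums to one. Setting `alpha` closer to 0 keeps the result near the input scores, and setting it closer to 1 lets the network topology dominate.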

Methods

Results

Discussion

Conclusion