Gene network interconnectedness and the generalized topological overlap measure.

Andy M Yip, Steve Horvath

doi:10.1186/1471-2105-8-22

Abstract

Background

Network methods are increasingly used to represent the interactions of genes and/or proteins. Genes or proteins that are directly linked may have a similar biological function or may be part of the same biological pathway. Since the information on the connection (adjacency) between 2 nodes may be noisy or incomplete, it can be desirable to consider alternative measures of pairwise interconnectedness. Here we study a class of measures that are proportional to the number of neighbors that a pair of nodes share in common. For example, the topological overlap measure by Ravasz et al. [1] can be interpreted as a measure of agreement between the m = 1 step neighborhoods of 2 nodes. Several studies have shown that two proteins having a higher topological overlap are more likely to belong to the same functional class than proteins having a lower topological overlap. Here we address the question of whether a measure of topological overlap based on higher-order neighborhoods could give rise to a more robust and sensitive measure of interconnectedness.

Results

We generalize the topological overlap measure from m = 1 step neighborhoods to m ≥ 2 step neighborhoods. This allows us to define the m-th order generalized topological overlap measure (GTOM) by (i) counting the number of m-step neighbors that a pair of nodes share and (ii) normalizing it to take a value between 0 and 1. Using theoretical arguments, a yeast co-expression network application, and a fly protein network application, we illustrate the usefulness of the proposed measure for module detection and gene neighborhood analysis.

Conclusion

Topological overlap can serve as an important filter to counter the effects of spurious or missing connections between network nodes. The m-th order topological overlap measure allows one to trade off sensitivity against specificity when it comes to defining pairwise interconnectedness and network modules.

Highlights

Network methods are increasingly used to represent the interactions of genes and/or proteins.
We find that neighborhood analysis with GTOM2 leads to significantly better results than GTOM0 (Wilcoxon p-value = 0.034), GTOM1 (p-value = 0.015) and GTOM3 (p-value = 0.02).
We consider p = 6 since we find that the resulting distance is highly related to GTOM1 in the yeast dataset.

Summary

Introduction

BMC Bioinformatics 2007, 8:22 http://www.biomedcentral.com/1471-2105/8/22

Network methods are increasingly used to represent the interactions of genes and/or proteins. Since the information on the connection (adjacency) between 2 nodes may be noisy or incomplete, it can be desirable to consider alternative measures of pairwise interconnectedness. We study a class of measures that are proportional to the number of neighbors that a pair of nodes share in common. The adjacency a_ij between nodes i and j equals 1 if the nodes are connected and 0 otherwise. If 2 nodes are connected to roughly the same group of genes in the network (i.e. they share the same neighborhood), they have a high 'topological overlap'. We study the properties of the topological overlap measure (TOM) and propose a generalization that enriches TOM's sensitivity to longer ranging connections between nodes.
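The idea of counting shared m-step neighbors and normalizing the count to lie between 0 and 1 can be illustrated with a short Python sketch. This is not the authors' implementation. It assumes an unweighted, undirected network, takes the m-step neighborhood of a node to be every other node reachable in at most m steps, and uses a normalization of the form (shared + a_ij) / (min(|N_m(i)|, |N_m(j)|) + 1 - a_ij), which is an assumption because this summary does not state the formula. The function names `m_step_neighbors` and `gtom` are hypothetical.

```python
import numpy as np


def m_step_neighbors(a, m):
    """Return a 0/1 matrix nb where nb[i, j] = 1 if j != i is reachable
    from i in at most m steps (assumed definition of the m-step neighborhood)."""
    n = a.shape[0]
    step = a + np.eye(n, dtype=int)      # one step, or stay in place
    reach = np.eye(n, dtype=int)
    for _ in range(m):
        reach = ((reach @ step) > 0).astype(int)
    np.fill_diagonal(reach, 0)           # a node is not its own neighbor
    return reach


def gtom(adj, m=1):
    """Sketch of an m-th order topological overlap matrix (m >= 1).

    adj is a symmetric 0/1 adjacency matrix. The normalization used here
    is an assumed form, not taken from the text above.
    """
    a = (np.asarray(adj) > 0).astype(int)
    np.fill_diagonal(a, 0)
    nb = m_step_neighbors(a, m)
    shared = nb @ nb.T                   # number of common m-step neighbors
    size = nb.sum(axis=1)                # neighborhood size of each node
    denom = np.minimum.outer(size, size) + 1 - a
    t = (shared + a) / denom
    np.fill_diagonal(t, 1.0)             # each node fully overlaps with itself
    return t


# Toy network: a path 0 - 1 - 2 - 3 (illustrative data only).
path = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

if __name__ == "__main__":
    print(gtom(path, m=1))
    print(gtom(path, m=2))
```

On this toy path, the two end nodes share no direct neighbors, so their first-order overlap is 0, while their 2-step neighborhoods coincide and the second-order overlap is positive. Under the stated assumptions, this is one way a higher-order measure can pick up longer ranging connections.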

Methods

Results

Discussion

Conclusion