Indexing a protein-protein interaction network expedites network alignment.

Md Mahmudul Hasan, Tamer Kahveci

doi:10.1186/s12859-015-0756-0

Abstract

Background: The network query problem aligns a small query network with an arbitrarily large target network. The complexity of this problem grows exponentially with the number of nodes in the query network if confidence in the optimality of the result is desired. Scaling this problem to large query and target networks remains a challenge.

Results: In this article, we develop a novel index structure that dramatically reduces the cost of the network query problem. Our index structure maintains a small set of reference networks, where each reference network is a small, carefully chosen subnetwork of the target network. Along with each reference, we also store all of its non-overlapping and statistically significant alignments with the target network. Given a query network, we first align the query with the reference networks. If the alignment with a reference network yields a sufficiently large score, we compute an upper bound on the alignment score between the query and the target using the alignments of that reference with the target (which are stored in our index). If the upper bound is large enough, we perform a second round of alignment between the query and the target that respects the mapping found in the first alignment.

Our experiments on protein-protein interaction networks demonstrate that our index achieves a significant speed-up in running time over state-of-the-art methods such as ColT. The alignment subnetworks obtained by our method are also statistically significant. Finally, we observe that our method finds biologically and statistically significant alignments across multiple species.

Conclusions: We developed a reference-network-based indexing structure that accelerates network query and produces functionally and statistically significant results.
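The filter-and-refine flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Reference`, `query_index`, `toy_align`, the two thresholds, and the `upper_bound` callable are all hypothetical. The actual scoring function and bound are defined in the paper and are not reproduced here, and the sketch omits the constraint that the second alignment respects the mapping found in the first.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Set, Tuple

Edge = FrozenSet[str]
Network = Set[Edge]  # a network stored as a set of undirected edges


def e(u: str, v: str) -> Edge:
    """Build an undirected edge between two node labels."""
    return frozenset((u, v))


@dataclass
class Reference:
    """One index entry: a reference network plus its stored target alignments."""
    network: Network
    # Each stored hit is (reference-vs-target score, aligned target subnetwork).
    target_hits: List[Tuple[float, Network]] = field(default_factory=list)


def toy_align(a: Network, b: Network) -> float:
    """Hypothetical stand-in for an alignment score: count of shared edges."""
    return float(len(a & b))


def query_index(
    query: Network,
    references: List[Reference],
    align: Callable[[Network, Network], float],
    upper_bound: Callable[[float, float], float],
    ref_threshold: float,
    bound_threshold: float,
) -> List[Tuple[float, Network]]:
    """Return (score, target subnetwork) pairs that survive both filters."""
    results = []
    for ref in references:
        # Step 1: align the query with the small reference network.
        s_qr = align(query, ref.network)
        if s_qr < ref_threshold:
            continue
        for s_rt, region in ref.target_hits:
            # Step 2: bound the query-vs-target score from the stored alignment.
            if upper_bound(s_qr, s_rt) < bound_threshold:
                continue
            # Step 3: second-round alignment against the promising region only.
            results.append((align(query, region), region))
    return sorted(results, key=lambda r: r[0], reverse=True)
```

The point of the design is that the expensive query-versus-target alignment runs only on regions that pass both cheap filters; how much is pruned depends on the thresholds and on the tightness of the bound, both of which are placeholders in this sketch.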

Highlights

The network query problem aligns a small query network with an arbitrarily large target network.
Depending on the interacting molecules and their interaction types, biological networks are often classified into several categories, such as gene regulatory networks, signaling networks, or protein-protein interaction networks.
With the help of this index structure, we dramatically reduce the computational cost of the network query problem.

Summary

Introduction

The network query problem aligns a small query network with an arbitrarily large target network. The complexity of this problem grows exponentially with the number of nodes in the query network if confidence in the optimality of the result is desired. Scaling this problem to large query and target networks remains a challenge. Biological networks describe how different molecules (such as proteins or gene products) interact with each other to carry out various cellular functions. One common way to model such networks is to represent them as graphs, where nodes denote the molecules and edges denote the interactions between them. Network alignment has already been used successfully in many applications, including the identification of functional annotations [2] and the reconstruction of biological networks from newly sequenced genomes [3], among many others.
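As a small illustration of the graph model mentioned above, an interaction network can be held as an adjacency map from each molecule to the set of its interaction partners. The function name `build_network` and the protein labels are hypothetical placeholders, not identifiers from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_network(interactions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn a list of pairwise interactions into an undirected adjacency map."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in interactions:
        # Interactions are undirected, so record the edge in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


# Hypothetical three-protein example: P2 interacts with both P1 and P3.
network = build_network([("P1", "P2"), ("P2", "P3")])
```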

Objectives

Methods

Results