Abstract

Protein interaction networks (PINs) are often used to “learn” new biological function from their topology. Since current PINs are noisy, computationally de-noising them via link prediction (LP) could improve the learning accuracy. LP uses the existing PIN topology to predict missing and spurious links. Many existing LP methods rely on shared immediate neighborhoods of the nodes to be linked, and are limited as a result. Thus, to comprehensively study which topological properties of nodes in PINs dictate whether the nodes should be linked, we introduce novel sensitive LP measures that are expected to overcome the limitations of the existing methods.

We systematically evaluate the new and existing LP measures by introducing “synthetic” noise into PINs and measuring how accurately the measures reconstruct the original PINs. We also use the LP measures to de-noise the original PINs, and we measure the biological correctness of the de-noised PINs with respect to functional enrichment of the predicted interactions. Our main findings are: 1) LP measures that favor nodes which are both “topologically similar” and have large shared extended neighborhoods are superior; 2) using more network topology often, though not always, improves LP accuracy; and 3) LP improves the biological correctness of the PINs, and we validate a significant portion of the predicted interactions in independent, external PIN data sources.

Ultimately, we are focused less on identifying a superior method and more on showing that LP improves the biological correctness of PINs, which is its ultimate goal in computational biology. Nonetheless, we note that our new methods outperform each of the existing ones with respect to at least one evaluation criterion. Alarmingly, we find that the different criteria often disagree in identifying the best method(s), which has important implications for LP communities in any domain, including social networks.

Highlights

  • Networks (or graphs) model real-world phenomena in many domains

  • We comprehensively study which features of the protein-protein interaction (PPI) network topology around a pair of nodes dictate whether the nodes should be linked


Summary

Introduction

Motivation and background. Networks (or graphs) model real-world phenomena in many domains. In PPI networks, nodes are proteins and two nodes are connected by an edge if the corresponding proteins interact in the cell. We focus on these networks because proteins (gene products) carry out the majority of cellular processes, and they do so by interacting with other proteins. This is exactly what PPI networks model.
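To make the network model concrete, here is a minimal sketch of a PPI network stored as adjacency sets, together with a shared-immediate-neighborhood score of the kind the abstract attributes to many existing LP methods. The Jaccard index is used purely as a familiar illustration; it is not claimed to be one of the paper's measures, and the protein names, edge list, and function names are all hypothetical.

```python
from itertools import combinations

# Toy PPI network (hypothetical proteins A..E); each pair is an undirected interaction.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]

def build_adjacency(edge_list):
    """Map each protein to the set of proteins it interacts with."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def jaccard(adj, u, v):
    """Shared immediate neighbors of u and v, normalized by their combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def rank_candidate_links(adj):
    """Score every currently unlinked pair; a higher score suggests a missing interaction."""
    scores = {
        (u, v): jaccard(adj, u, v)
        for u, v in combinations(sorted(adj), 2)
        if v not in adj[u]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

adj = build_adjacency(edges)
ranking = rank_candidate_links(adj)
for pair, score in ranking:
    print(pair, round(score, 3))
```

In this toy network the unlinked pair (A, D) ranks first because both proteins interact with B and C. The pair (A, E) scores zero because the two proteins have no immediate neighbor in common, which illustrates one way a measure restricted to immediate neighborhoods can run out of signal.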

