Protein complexes identification based on GO attributed network embedding

Bo Xu, Xiaoxia Liu, Zhehuan Zhao, Yijia Zhang, Zengyou He, Wei Zheng, Kun Li

doi:10.1186/s12859-018-2555-x

Abstract

Background: Identifying protein complexes from protein-protein interaction (PPI) networks is one of the most important tasks in proteomics. Existing computational methods try to incorporate a variety of biological evidence to enhance the quality of predicted complexes. However, it is still a challenge to integrate different types of biological information into the complex discovery process under a unified framework. Recently, attributed network embedding methods have proved to be remarkably effective in generating vector representations for the nodes of a network. In the transformed vector space, both the topological proximity and the node attribute affinity between different nodes are preserved. Such attributed network embedding methods therefore provide a unified framework for integrating various types of biological evidence into the protein complex identification process.

Results: In this article, we propose a new method called GANE to predict protein complexes based on Gene Ontology (GO) attributed network embedding. First, it learns a vector representation for each protein from a GO attributed PPI network. Based on the pair-wise similarity of these vector representations, a weighted adjacency matrix is constructed. Second, it uses a clique mining method to generate candidate cores. Seed cores are then obtained by ranking the candidate cores by their density on the weighted adjacency matrix and removing redundant cores. For each seed core, its attachments are the proteins whose correlation score with the core is larger than a given threshold. The combination of a seed core and its attachment proteins is reported as a predicted protein complex by the GANE algorithm. For performance evaluation, we compared GANE with six protein complex identification methods on five yeast PPI networks. Experimental results show that GANE performs better than the competing algorithms in terms of different evaluation metrics.

Conclusions: GANE provides a framework that integrates many valuable and different types of biological information into the task of protein complex identification. The protein vector representations learned from our attributed PPI network can also be used in other tasks, such as PPI prediction and disease gene prediction.
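As an illustration only, the core-attachment pipeline summarized above can be sketched in Python. This is a minimal sketch under stated assumptions, not the GANE implementation: it assumes the protein vectors are already available, uses cosine similarity as the pair-wise similarity, weights only existing PPI edges, enumerates maximal cliques with a basic Bron-Kerbosch recursion, removes redundant cores by Jaccard overlap, and scores an attachment by its mean edge weight to the core. The function names, the parameters `min_size`, `max_overlap` and `attach_thr`, and these specific choices are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_adjacency(edges, emb):
    """Weight each existing PPI edge by the similarity of its endpoint vectors."""
    return {frozenset((a, b)): cosine(emb[a], emb[b]) for a, b in edges}


def maximal_cliques(adj):
    """Enumerate maximal cliques (Bron-Kerbosch without pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def density(nodes, w):
    """Mean edge weight over all node pairs (missing edges count as 0)."""
    pairs = list(combinations(sorted(nodes), 2))
    if not pairs:
        return 0.0
    return sum(w.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


def predict_complexes(edges, emb, min_size=3, max_overlap=0.5, attach_thr=0.5):
    """Return predicted complexes as seed cores plus their attachments."""
    w = weighted_adjacency(edges, emb)
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Candidate cores: maximal cliques, ranked by weighted density.
    candidates = [c for c in maximal_cliques(adj) if len(c) >= min_size]
    candidates.sort(key=lambda c: density(c, w), reverse=True)

    # Seed cores: drop candidates that overlap too much with a kept core.
    seeds = []
    for c in candidates:
        if all(len(c & s) / len(c | s) <= max_overlap for s in seeds):
            seeds.append(c)

    # Attachments: outside proteins whose score to the core exceeds the threshold.
    complexes = []
    for core in seeds:
        attached = set()
        for p in set(adj) - core:
            score = sum(w.get(frozenset((p, q)), 0.0) for q in core) / len(core)
            if score > attach_thr:
                attached.add(p)
        complexes.append(core | attached)
    return complexes
```

Calling `predict_complexes(edges, emb)` with a list of protein pairs and a dictionary of protein vectors returns one set of proteins per seed core. The thresholds would need tuning on real data; the defaults here are placeholders.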

Highlights

Identifying protein complexes from protein-protein interaction (PPI) networks is one of the most important tasks in proteomics
We propose a new method called GANE to predict protein complexes based on Gene Ontology (GO) attributed network embedding
The protein vector representations learned from our attributed PPI network can be used in other tasks, such as PPI prediction and disease gene prediction

Summary

Introduction

Identifying protein complexes from protein-protein interaction (PPI) networks is one of the most important tasks in proteomics. In the vector space produced by attributed network embedding, both the topological proximity and the node attribute affinity between different nodes are preserved. Such methods therefore provide a unified framework for integrating various types of biological evidence into the protein complex identification process. A PPI network is usually modeled as an undirected graph in which the nodes represent proteins and the edges represent the interactions between them. Most protein complex identification methods rest on the principle that complexes appear as densely linked regions, so the problem of predicting protein complexes can be formulated as detecting densely linked regions in PPI networks.
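The undirected-graph model and the notion of a densely linked region can be made concrete with a small example. The sketch below is illustrative only: it stores the network as adjacency sets and measures the density of a group of proteins as the fraction of possible pairs that actually interact, which is one common definition and is assumed here rather than taken from the paper. The helper names and the protein identifiers `P1` to `P4` are hypothetical.

```python
from itertools import combinations


def build_ppi_graph(interactions):
    """Build an undirected graph as a dict mapping each protein to its neighbours."""
    graph = {}
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this sketch
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def subgraph_density(graph, proteins):
    """Fraction of protein pairs in the group that are linked: 2|E| / (|V|(|V|-1))."""
    proteins = set(proteins)
    n = len(proteins)
    if n < 2:
        return 0.0
    internal = sum(
        1 for a, b in combinations(proteins, 2) if b in graph.get(a, ())
    )
    return 2 * internal / (n * (n - 1))


# Toy network: P1, P2 and P3 interact pairwise; P4 interacts only with P3.
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
graph = build_ppi_graph(interactions)
print(subgraph_density(graph, ["P1", "P2", "P3"]))
print(subgraph_density(graph, ["P1", "P2", "P3", "P4"]))
```

In this toy network the fully connected group of three proteins has a higher density than the group that also includes `P4`, which is the kind of signal that density-based methods look for.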

Methods

Results

Conclusion