Abstract

Background: In recent years, many protein complex mining algorithms, such as the classical clique percolation method (CPM) and the Markov clustering (MCL) algorithm, have been developed for protein-protein interaction networks. However, most of the available algorithms concentrate on mining dense protein subgraphs as protein complexes and fail to take into account the inherent organizational structure within protein complexes. Thus, there is a critical need to study the possibility of mining protein complexes using the topological information hidden in edges. Moreover, recent large-scale experimental analyses reveal that protein complexes have their own intrinsic organization.

Methods: Inspired by the formation process of cliques in complex social networks and by the centrality-lethality rule, we propose a new protein complex mining algorithm called the Multistage Kernel Extension (MKE) algorithm, which integrates the idea of recognizing critical proteins in the Protein-Protein Interaction (PPI) network. MKE first recognizes the nodes with high degree as the first level kernel of a protein complex, and then adds the weighted best neighbour node of the first level kernel to the current kernel to form the second level kernel of the protein complex. This process is repeated, extending the current kernel until the protein complex is formed. Finally, overlapping protein complexes are merged to form the final protein complex set.

Results: MKE achieves better accuracy than the classical clique percolation method and the Markov clustering algorithm. MKE also performs better than the classical clique percolation method on both Gene Ontology semantic similarity and co-localization enrichment, and it can effectively identify protein complexes with biological significance in the PPI network.

Highlights

  • Mining protein complexes is very important in biological processes since it helps reveal the structure-functionality relationships in biological networks

  • If the extent of closeness of a neighbouring node is greater than the average extent of closeness of the subgraph formed by the current kernel and its neighbouring nodes, this neighbouring node can be added to the current kernel, extending it into the next-level kernel

  • Where Ci is the number of nodes in cluster i, and Cj is the number of nodes in cluster j. [see Additional file 1 for Algorithm 3–Multistage Kernel Extension (MKE) Algorithm and for its Time Complexity Analysis]
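The extension rule in the second highlight can be illustrated with a small sketch. The page does not give the formula for the "extent of closeness", so both measures below are assumptions made only for illustration: the closeness of a candidate is taken as the mean directed weight between it and the kernel members, and the average closeness is taken as the mean directed weight over all ordered node pairs in the subgraph formed by the kernel and its neighbours. The function names and the `example_weights` data are hypothetical and are not taken from the paper.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Weights = Dict[Tuple[str, str], float]  # (source, target) -> directed weight


def neighbours(kernel: FrozenSet[str], weights: Weights) -> Set[str]:
    """Nodes outside the kernel that share at least one weighted edge with it."""
    found = set()
    for s, t in weights:
        if s in kernel and t not in kernel:
            found.add(t)
        elif t in kernel and s not in kernel:
            found.add(s)
    return found


def closeness(node: str, kernel: FrozenSet[str], weights: Weights) -> float:
    """ASSUMED measure: mean directed weight between node and kernel members."""
    total = sum(
        weights.get((node, k), 0.0) + weights.get((k, node), 0.0) for k in kernel
    )
    return total / (2 * len(kernel))


def average_closeness(kernel: FrozenSet[str], weights: Weights) -> float:
    """ASSUMED measure: mean directed weight over ordered pairs in kernel + neighbours."""
    nodes = kernel | neighbours(kernel, weights)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = sum(
        w for (s, t), w in weights.items() if s in nodes and t in nodes and s != t
    )
    return total / (n * (n - 1))


def extend_kernel(kernel: Iterable[str], weights: Weights) -> FrozenSet[str]:
    """Add the best neighbour, one level at a time, while it beats the average."""
    current = frozenset(kernel)
    while True:
        candidates = neighbours(current, weights)
        if not candidates:
            return current
        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates), key=lambda v: closeness(v, current, weights))
        if closeness(best, current, weights) <= average_closeness(current, weights):
            return current
        current = current | {best}


# Hypothetical toy network with symmetric directed weights.
example_weights: Weights = {
    ("A", "B"): 0.6, ("B", "A"): 0.6,
    ("A", "C"): 0.9, ("C", "A"): 0.9,
    ("B", "C"): 0.8, ("C", "B"): 0.8,
    ("C", "D"): 0.1, ("D", "C"): 0.1,
}
extended = extend_kernel({"A", "B"}, example_weights)
print(sorted(extended))
```

In this toy network, node C is strongly tied to both kernel members and is added, while D is only weakly tied to C and is left out. A different definition of closeness would change which nodes pass the test, so the sketch shows the shape of the rule rather than the paper's exact criterion.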


Summary

Methods

Constructing directed and weighted network graph

In the protein-protein interaction network, it is difficult to determine whether a pair of protein nodes belongs to the same protein complex from the degree of the nodes and their connection characteristics alone. For two protein nodes, node s and node t, when the directed weights between them are both greater than the given weight threshold, the two nodes are closely connected with each other and can be considered to belong to the same first level kernel of a protein complex.

Here, |V| is the number of nodes in the network formed by the current kernel and its neighbours, and w_st is the directed weight between node s and node t. Using the definition of the weighted best neighbour node for the PPI network, the algorithm in this paper predicts a protein complex by extending from one node in the kernel to the nodes outside the kernel. This differs from most other available algorithms, which predict modules by extending from multiple nodes within the kernel to the nodes outside the kernel.

Here, Ci is the number of nodes in cluster i, and Cj is the number of nodes in cluster j. [see Additional file 1 for Algorithm 3–Multistage Kernel Extension (MKE) Algorithm and for its Time Complexity Analysis]
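The mutual-threshold test for first level kernels can be sketched as follows. This is a minimal illustration, not the authors' implementation: the directed weights are supplied as a plain dictionary because the page does not give their definition, and the function name `mutual_pairs`, the sample weights, and the threshold of 0.5 are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

DirectedWeights = Dict[Tuple[str, str], float]  # (s, t) -> directed weight w_st


def mutual_pairs(weights: DirectedWeights, threshold: float) -> Set[FrozenSet[str]]:
    """Return the unordered pairs {s, t} where both w_st and w_ts exceed threshold."""
    pairs = set()
    for (s, t), w_st in weights.items():
        if s == t:
            continue  # ignore self-loops
        w_ts = weights.get((t, s), 0.0)  # a missing reverse edge counts as weight 0
        if w_st > threshold and w_ts > threshold:
            pairs.add(frozenset((s, t)))
    return pairs


# Hypothetical directed weights between three proteins.
sample_weights: DirectedWeights = {
    ("A", "B"): 0.8, ("B", "A"): 0.7,
    ("A", "C"): 0.9, ("C", "A"): 0.2,
    ("B", "C"): 0.6, ("C", "B"): 0.65,
}
kernel_pairs = mutual_pairs(sample_weights, threshold=0.5)
print(sorted(sorted(p) for p in kernel_pairs))
```

The pair A and C is rejected even though the weight from A to C is high, because the weight in the opposite direction falls below the threshold. Requiring both directions is what distinguishes this test from a filter on a single undirected edge weight.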

