Abstract
Detecting protein complexes from available protein-protein interaction (PPI) data helps to clarify the mechanisms of biological activity. In recent years, various computational methods have been developed for identifying protein complexes from PPI networks. Almost all of these methods rely mainly on topological analysis of the PPI network. However, most of them fail to capture, at the same time, both the global and local topological structure of the network and the diversity of connectivity patterns between individual nodes. To address this problem, we propose a node-embedding-based alias sampling extension method for detecting protein complexes. Specifically, for a given set of seed nodes, the method first uses an alias sampling strategy based on protein node embedding similarities to select candidate nodes to add. It then applies a new conductance measure, which better quantifies the likelihood that a subgraph is a protein complex, to decide whether to extend the current candidate subgraph. Evaluated on six real yeast PPI networks, our method outperforms state-of-the-art methods in detecting protein complexes. Furthermore, the experimental results show that the protein complexes predicted by our method have higher biological significance.
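The abstract does not give the sampling weights or the embedding model, so the following is only a minimal sketch of the general idea: candidate nodes are drawn with probability proportional to their embedding similarity to a seed, using the standard alias method (Vose's variant) for constant-time draws. The function names, the toy two-dimensional embeddings, the use of cosine similarity, and the small positive floor on the weights are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_alias_table(weights):
    """Build alias tables in O(n) so that each later draw costs O(1)."""
    n = len(weights)
    total = sum(weights)
    # Rescale so that the average weight is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = list(range(n))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # chance of keeping s when its slot is chosen
        alias[s] = l          # otherwise fall through to l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover entries hold (up to rounding) exactly one unit of mass.
    for i in small + large:
        prob[i] = 1.0
    return prob, alias


def alias_draw(prob, alias, rng=random):
    """Draw one index: pick a slot uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


# Hypothetical embeddings for one seed protein and three candidate neighbours.
embeddings = {
    "seed": [1.0, 0.0],
    "p1": [0.9, 0.1],
    "p2": [0.5, 0.5],
    "p3": [0.0, 1.0],
}
candidates = ["p1", "p2", "p3"]
# Assumption: floor similarities at a tiny positive value so weights stay valid.
weights = [max(cosine(embeddings["seed"], embeddings[c]), 1e-9) for c in candidates]
prob, alias = build_alias_table(weights)
picked = candidates[alias_draw(prob, alias, random.Random(0))]
```

In this sketch the table is built once per seed and can then be reused for many draws, which is the usual reason for preferring the alias method over repeated cumulative-sum sampling.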
Highlights
A protein complex is a group of proteins that physically interact with one another to organize various biological processes in the cell
We introduce the evaluation metrics and compare our method against four well-known complex detection approaches on six yeast protein-protein interaction (PPI) networks
We found that our method outperforms six other state-of-the-art algorithms in identifying protein complexes
Summary
A protein complex is a group of proteins that physically interact with one another to organize various biological processes in the cell. The main line of approaches for identifying protein complexes from a PPI network is based on the inherent topological structure of protein complexes [4], [5]. Identifying protein complexes can be formulated as searching for subgraphs that are densely connected internally and well separated from the rest of the network. Building on this idea, detection methods based on machine learning and data mining have grown rapidly and become useful tools for identifying protein complexes. Several studies have shown that incorporating well-selected biological information improves the performance of protein complex detection [6], [7]. We discuss only methods that rely solely on the topological characteristics of the network, since biological information could be added to most of these methods to improve their performance.
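The formulation above, "densely connected internally and well separated from the rest of the network", can be made concrete with two textbook quantities: edge density inside a node set and the conductance of its boundary. The sketch below uses the standard definitions only; it is not the new conductance measure proposed in the paper, which this summary does not specify. The adjacency-dictionary representation and the toy network are assumptions made for illustration.

```python
def density(adj, nodes):
    """Fraction of possible internal edges that are present among `nodes`."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Each internal edge is seen from both endpoints, hence the halving.
    internal = sum(len(adj[u] & nodes) for u in nodes) // 2
    return 2.0 * internal / (n * (n - 1))


def conductance(adj, nodes):
    """Standard conductance: cut edges over the smaller side's degree volume."""
    nodes = set(nodes)
    cut = sum(len(adj[u] - nodes) for u in nodes)
    vol_in = sum(len(adj[u]) for u in nodes)
    vol_out = sum(len(adj[u]) for u in adj) - vol_in
    denom = min(vol_in, vol_out)
    # Convention chosen here: an empty side counts as worst-case separation.
    return cut / denom if denom else 1.0


# Hypothetical toy PPI network: a triangle A-B-C with a tail C-D-E.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
triangle = {"A", "B", "C"}
triangle_density = density(adj, triangle)
triangle_conductance = conductance(adj, triangle)
```

A candidate subgraph with high density and low conductance matches the description of a complex given above; an extension step would typically be accepted only if it does not worsen the chosen score.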