Abstract
High-throughput technology has generated large-scale protein interaction data, which is crucial to our understanding of biological organisms. Many algorithms have been developed to identify protein complexes from such data. However, these methods are suitable only for dense protein interaction networks, because their performance degrades rapidly when they are applied to sparse protein–protein interaction (PPI) networks. In this study, a novel method based on penalized matrix decomposition (PMD), called PMDpc (penalized matrix decomposition for the identification of protein complexes), was developed to detect protein complexes in the human protein interaction network. The method consists of three main steps. First, the adjacency matrix of the protein interaction network is normalized. Second, the normalized matrix is decomposed into three factor matrices; by imposing appropriate constraints on these factor matrices, PMDpc can detect protein complexes in sparse PPI networks. Finally, the results of our method are compared with those of other methods on the human PPI network. Experimental results show that our method not only outperforms classical algorithms such as CFinder, ClusterONE, RRW, HC-PIN, and PCE-FR, but also achieves an ideal overall performance in terms of a composite score consisting of F-measure, accuracy (ACC), and the maximum matching ratio (MMR).
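The first two steps (normalize the adjacency matrix, then decompose it into three sparse factor matrices) can be illustrated with a minimal sketch. The abstract does not state which normalization or which constraints PMDpc uses, so everything specific here is an assumption rather than the authors' method: symmetric degree normalization, an L1-style soft-threshold with a fixed hypothetical parameter `lam`, deflation to obtain several factors, and reading each candidate complex off the nonzero entries of a factor. All function names are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric degree normalization D^{-1/2} A D^{-1/2} (an assumed choice)."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])   # isolated nodes keep a zero scale
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]

def soft_threshold(x, lam):
    """Shrink every entry toward zero by lam; small entries become exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def unit(x):
    """Scale x to unit L2 norm (a zero vector is returned unchanged)."""
    n = np.linalg.norm(x)
    return x / n if n > 0 else x

def sparse_rank_one(X, lam, n_iter=100):
    """Alternate sparse updates of u and v so that d * u v^T approximates X."""
    # Assumed initialization: start from the row of X with the largest norm.
    i = int(np.argmax(np.linalg.norm(X, axis=1)))
    v = unit(X[i, :])
    for _ in range(n_iter):
        u = unit(soft_threshold(X @ v, lam))
        v = unit(soft_threshold(X.T @ u, lam))
    d = float(u @ X @ v)
    return u, d, v

def pmd(X, k, lam):
    """Extract k sparse factors by deflation; returns U, D, V with X ~ U D V^T."""
    R = np.array(X, dtype=float)
    us, ds, vs = [], [], []
    for _ in range(k):
        u, d, v = sparse_rank_one(R, lam)
        us.append(u)
        ds.append(d)
        vs.append(v)
        R = R - d * np.outer(u, v)          # remove the factor just found
    return np.column_stack(us), np.diag(ds), np.column_stack(vs)

# Toy network: two disjoint triangles (nodes 0-2 and 3-5).
triangle = np.ones((3, 3)) - np.eye(3)
A = np.block([[triangle, np.zeros((3, 3))],
              [np.zeros((3, 3)), triangle]])
U, D, V = pmd(normalize_adjacency(A), k=2, lam=0.1)
for j in range(U.shape[1]):
    members = np.flatnonzero(np.abs(U[:, j]) > 1e-8).tolist()
    print(f"factor {j}: candidate complex {members}")
```

A fixed threshold keeps the sketch short; a fuller PMD implementation would typically tune the threshold per update so that an explicit L1-norm bound on each factor is met.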
Highlights
The identification of protein complexes is highly beneficial for the investigation of all kinds of organisms to understand biological processes and determine inherent organizational structures within cells [1].
Local search approaches based on density identify densely connected subgraphs in protein–protein interaction (PPI) networks and treat subgraphs whose density exceeds a predefined threshold as protein complexes; examples include MCODE (Molecular Complex Detection) [2] and CFinder, which implements the Clique Percolation Method [5].
These approaches tend to neglect surrounding proteins that are connected to the kernel clusters by sparse links, even though such links can represent experimentally validated true interactions [6]. Another class of methods for detecting protein complexes uses classical hierarchical clustering techniques, which depend mainly on the distance between proteins to detect meaningful groups [6]; these include HC-PIN (fast Hierarchical Clustering algorithm for Protein Interaction Network, an agglomerative method) [7] and the G-N algorithm [8].
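The density criterion used by local search approaches can be made concrete with a small sketch. It assumes the common definition of subgraph density, 2|E| / (|V|(|V| - 1)), and a hypothetical threshold of 0.7; neither value nor the function names come from the source, and this is not the procedure of MCODE or CFinder themselves.

```python
from itertools import combinations

def subgraph_density(edges, nodes):
    """Fraction of possible node pairs in `nodes` that are linked by an edge."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    edge_set = {frozenset(e) for e in edges}   # undirected: order is ignored
    internal = sum(1 for pair in combinations(nodes, 2)
                   if frozenset(pair) in edge_set)
    return 2.0 * internal / (n * (n - 1))

def is_candidate_complex(edges, nodes, threshold=0.7):
    """Accept a subgraph when its density exceeds the (assumed) threshold."""
    return subgraph_density(edges, nodes) > threshold

# Toy PPI network: a triangle A-B-C with one sparsely attached protein D.
ppi_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(subgraph_density(ppi_edges, {"A", "B", "C"}))
print(subgraph_density(ppi_edges, {"A", "B", "C", "D"}))
```

In this toy network, adding the sparsely linked protein D lowers the density of the subgraph, which illustrates why a fixed density threshold tends to exclude peripheral proteins attached to a kernel cluster.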
Summary
The identification of protein complexes is highly beneficial for the investigation of all kinds of organisms to understand biological processes and determine inherent organizational structures within cells [1]. Density-based local search approaches, such as the Clique Percolation Method [5], treat subgraphs whose density exceeds a predefined threshold as protein complexes. These approaches tend to neglect surrounding proteins that are connected to the kernel clusters by sparse links, even though such links can represent experimentally validated true interactions [6]. With the further development of clustering technology, many hierarchical clustering methods now employ similarities among proteins that are calculated on the basis of network topology characteristics or biological meaning.