Abstract
Background: Protein complexes carry out nearly all signaling and functional processes within cells. The study of protein complexes is an effective strategy to analyze cellular functions and biological processes. With the increasing availability of proteomics data, various computational methods have recently been developed to predict protein complexes. However, different computational methods are based on their own assumptions and designed to work on different data sources, and various biological screening methods have their unique experiment conditions, and are often different in scale and noise level. Therefore, a single computational method on a specific data source is generally not able to generate comprehensive and reliable prediction results.

Results: In this paper, we develop a novel Two-layer INtegrative Complex Detection (TINCD) model to detect protein complexes, leveraging the information from both clustering results and raw data sources. In particular, we first integrate various clustering results to construct consensus matrices for proteins to measure their overall co-complex propensity. Second, we combine these consensus matrices with the co-complex score matrix derived from Tandem Affinity Purification/Mass Spectrometry (TAP) data and obtain an integrated co-complex similarity network via an unsupervised metric fusion method. Finally, a novel graph regularized doubly stochastic matrix decomposition model is proposed to detect overlapping protein complexes from the integrated similarity network.

Conclusions: Extensive experimental results demonstrate that TINCD performs much better than 21 state-of-the-art complex detection techniques, including ensemble clustering and data integration techniques.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0939-3) contains supplementary material, which is available to authorized users.
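The first step described in the Results, integrating several clustering results into a consensus matrix, can be illustrated with a minimal sketch. It is not the TINCD construction itself, whose details are not given in the abstract. It assumes the simplest co-association scheme: the score of a protein pair is the fraction of clusterings in which the two proteins share at least one predicted complex. The function name `consensus_matrix` and the toy protein labels are hypothetical.

```python
from itertools import combinations


def consensus_matrix(clusterings, proteins):
    """Return an n x n matrix (list of lists) of co-association frequencies.

    Assumption (not from the paper): entry [i][j] is the fraction of
    clusterings in which proteins i and j appear together in at least one
    cluster. Clusters may overlap, as predicted complexes often do.
    """
    if not clusterings:
        raise ValueError("at least one clustering result is required")

    index = {protein: k for k, protein in enumerate(proteins)}
    n = len(proteins)
    counts = [[0] * n for _ in range(n)]

    for clusters in clusterings:
        # Collect each co-clustered pair once per clustering, so a pair that
        # shares two overlapping clusters is not counted twice.
        pairs = set()
        for cluster in clusters:
            ids = sorted(index[p] for p in cluster if p in index)
            pairs.update(combinations(ids, 2))
        for i, j in pairs:
            counts[i][j] += 1
            counts[j][i] += 1

    total = len(clusterings)
    return [[c / total for c in row] for row in counts]


# Toy example: two clustering results over four hypothetical proteins.
proteins = ["A", "B", "C", "D"]
clusterings = [
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B", "C"}, {"D"}],
]
C = consensus_matrix(clusterings, proteins)
```

In this toy input, A and B are grouped together by both clusterings, so their score is the highest possible, while A and D never co-occur and score zero. A matrix of this kind is what would then be combined with a TAP-derived co-complex score matrix in the fusion step.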
Highlights
Protein complexes carry out most signaling and functional processes within cells
We propose a novel Two-layer INtegrative Complex Detection (TINCD) model to predict protein complexes, shown in Fig. 1, which leverages the information from both clustering results and raw data sources
Experiment data and evaluation metrics: two types of yeast data, PPI data and Tandem Affinity Purification/Mass Spectrometry (TAP) data, are used to evaluate the performance of various complex detection methods
Summary
Protein complexes, which are formed by interacting proteins, carry out almost all of the functional processes within a cell [3]. With the increasing availability of proteomics data, various computational methods have recently been developed to predict them. Detecting protein complexes from protein-protein interaction (PPI) data is crucial for elucidating the modular structure within cells [4, 5]. Computational methods for protein complex detection use two types of data: binary protein interaction data detected by high-throughput screening (HTS) techniques such as the yeast two-hybrid (Y2H) method, and co-complex interaction data [22, 23] from TAP experiments.