Abstract

Understanding the physical arrangement of subunits within protein complexes can provide valuable clues about how the subunits work together and how the complexes function. Most recent research focuses on identifying protein complexes as a whole and seldom examines their inner structures. In this study, we propose a computational framework to predict direct contacts and substructures within protein complexes. In this framework, we first train an L2-regularized logistic regression model to learn the patterns of direct and indirect interactions within complexes, from which physical subunit interaction networks are predicted. Then, to infer substructures within complexes, we apply a graph clustering method, maximum modularity clustering (MMC), to partially connected networks and a gene ontology (GO) semantic-similarity-based functional clustering to fully connected networks. Computational results show that the proposed framework performs fairly well in cross-validation and independent testing at detecting direct contacts between subunits. Functional analyses further support the soundness of partitioning the subunits into substructures via the MMC algorithm and functional clustering.
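
To make the first step of the framework concrete, the following is a minimal sketch of L2-regularized logistic regression fitted by batch gradient descent. It is an illustration under stated assumptions, not the authors' implementation: the function names, the hyperparameters (`lam`, `lr`, `epochs`) and the toy binary feature vectors standing in for GO features of gene pairs are all hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_l2_logreg(X, y, lam=0.1, lr=0.5, epochs=500):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    The bias term b is left unpenalized (an assumption of this sketch).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Data gradient averaged over samples, plus the L2 penalty gradient lam * w.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    # Estimated probability that the pair described by x is a direct contact.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: each row is a binary indicator vector for one gene pair;
# label 1 = direct interaction, label 0 = indirect interaction.
X = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0],
     [0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_l2_logreg(X, y)
```

Raising `lam` shrinks the weights toward zero, which is the usual way an L2 penalty trades fit on the training pairs for stability on unseen pairs.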

Highlights

  • Protein complexes have their individual gene products spatiotemporally arranged in place to form the structures required for specific biological activities [1]

  • The negative training data and independent test data were randomly sampled from the indirect interactions within complexes. Each gene pair was then represented by a gene ontology (GO) feature vector to train a supervised learning model, and the model was evaluated via cross-validation and an independent test

  • As mentioned in the subsection “Negative training and independent test data”, the negative data were sampled from two sources: (1) co-complexed protein pairs with no path between them in human physical protein–protein interaction (PPI) networks (No-path); and (2) co-complexed protein pairs connected only via paths of length no less than two (No-less-than-two)
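
The two negative-data sources described above can be sketched with a breadth-first search over a PPI network. This is an illustrative reading rather than the authors' code: it assumes that "paths whose path lengths all were no less than two" means the shortest path between the pair is at least two hops, and the protein names, edges and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def build_adjacency(edges):
    # Undirected adjacency sets from a list of (protein, protein) PPI edges.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def shortest_path_length(adj, src, dst):
    """Breadth-first search; returns the hop count, or None if dst is unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adj.get(node, ()):
            if nb == dst:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

def label_pair(adj, a, b):
    # Classify one co-complexed pair by its shortest path in the PPI network.
    d = shortest_path_length(adj, a, b)
    if d is None:
        return "no-path"
    if d == 1:
        return "direct"
    return "no-less-than-two"

def label_complex_pairs(adj, members):
    # Label every unordered pair of subunits within one complex.
    return {(a, b): label_pair(adj, a, b)
            for a, b in combinations(sorted(members), 2)}

# Hypothetical toy network: A-B and B-C interact directly; D has no known PPI.
ppi_edges = [("A", "B"), ("B", "C")]
complex_members = {"A", "B", "C", "D"}
labels = label_complex_pairs(build_adjacency(ppi_edges), complex_members)
```

In this toy complex, pairs labeled `no-path` or `no-less-than-two` would be the candidates from which negative examples are sampled, while `direct` pairs correspond to known physical contacts.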

Introduction

Protein complexes have their individual gene products spatiotemporally arranged in place to form the structures required for specific biological activities [1]. Investigating the disorder of subunits within protein complexes is crucial for elucidating the underlying mechanisms of various diseases [2]. Most research, both experimental and computational, focuses on identifying protein complexes as a whole. Experimental techniques such as tandem affinity purification with mass spectrometry (TAP-MS) and co-fractionation mass spectrometry (CF-MS) are frequently used to detect protein complexes. Many computational methods have been proposed to rapidly provide a global landscape of genome-scale protein complexes. Well-known databases of protein complexes include MIPS [3], CORUM [4], HPRD [5] and Reactome [6,7]. MIPS [3] collects the protein complexes of Saccharomyces cerevisiae
