Abstract

The problem of predicting sets of components of heteromeric protein complexes is a challenging problem in Systems Biology. There have been many tools proposed to predict those complexes. Among them, PPSampler, a protein complex prediction algorithm based on the Metropolis-Hastings algorithm, is reported to outperform other tools. In this work, we improve PPSampler by refining scoring functions and a proposal distribution used inside the algorithm so that predicted clusters are more accurate as well as the resulting algorithm runs faster. The new version is called PPSampler2. In computational experiments, PPSampler2 is shown to outperform other tools including PPSampler. The F-measure score of PPSampler2 is 0.67, which is at least 26% higher than those of the other tools. In addition, about 82% of the predicted clusters that are unmatched with any known complexes are statistically significant on the biological process aspect of Gene Ontology. Furthermore, the running time is reduced to twenty minutes, which is 1/24 of that of PPSampler.

Highlights

  • Protein complexes are essential molecular entities in the cell because intrinsic functions of an individual protein are often performed in the form of a protein complex

  • Most computational approaches to predict the components of protein complexes are designed based on the observation that densely connected subgraphs in a protein-protein interaction (PPI) network are often overlapped with some known protein complexes

  • Evaluation by Gene Ontology It is reasonable to suppose that the list of protein complexes recorded in databases are still incomplete

Read more

Summary

Introduction

Protein complexes are essential molecular entities in the cell because intrinsic functions of an individual protein are often performed in the form of a protein complex. It is helpful to identify all protein complexes of an organism for elucidation of the molecular mechanisms underlying biological processes. Reliable protein complex purification experiments are rather laborious and time-consuming. It has been expected to provide reliable candidates for true protein complexes by computational prediction methods. Most computational approaches to predict the components of protein complexes are designed based on the observation that densely connected subgraphs in a protein-protein interaction (PPI) network are often overlapped with some known protein complexes. One of the differences among those methods is the search strategies to find good clusters of proteins. MCL [1] is considered to be a clustering algorithm, in which clusters are formed by repeatedly executing an inflation step

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call