Abstract

Background: Although the prediction of protein-protein interactions has been extensively investigated for yeast, few such datasets exist for the far larger human proteome. Furthermore, it has recently been estimated that the overall average false positive rate of available computational and high-throughput experimental interaction datasets is as high as 90%.

Results: The prediction of human protein-protein interactions was investigated by combining orthogonal protein features within a probabilistic framework. The features include co-expression, orthology to known interacting proteins, and the full-Bayesian combination of subcellular localization, co-occurrence of domains and post-translational modifications. A novel scoring function for local network topology was also investigated. This topology feature greatly enhanced the predictions and, together with the full-Bayes combined features, made the largest contribution to them. Using a conservative threshold, our most accurate predictor identifies 37606 human interactions, 32892 (87%) of which are not present in other publicly available large human interaction datasets, thus substantially increasing the coverage of the human interaction map. A subset of the 32892 novel predicted interactions has been independently validated. Comparison of the prediction dataset with other available human interaction datasets estimates the false positive rate of the new method to be below 80%, which is competitive with other methods. Because the new method scores and ranks all human protein pairs, smaller, higher-quality subsets can be generated, leading to even lower false positive prediction rates.

Conclusion: The set of interactions predicted in this work increases the coverage of the human interaction map and will help determine the highest-confidence human interactions.
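The abstract describes combining features as likelihood ratios (LRs) within a probabilistic framework. What follows is a minimal sketch under stated assumptions, not the authors' implementation. It estimates a full-Bayes LR for one joint combination of discrete feature values from counts in gold-standard positive and negative sets. The three values stand in for subcellular localization, domain co-occurrence and post-translational modifications. It then multiplies that LR with other per-feature LRs, which is the naive-Bayes assumption of conditional independence, and ranks pairs above a cutoff. The toy data, the add-one smoothing, the `n_combinations` parameter, the co-expression and orthology LR values and the cutoff are all hypothetical.

```python
from collections import Counter
from math import prod

# Hypothetical toy gold standards (illustrative only, not data from the paper).
# Each tuple is one protein pair's joint feature values:
# (localization match, shared domain pair, shared PTM).
POSITIVES = [("same", "yes", "yes"), ("same", "yes", "no"),
             ("same", "no", "no"), ("diff", "yes", "no")]
NEGATIVES = [("diff", "no", "no"), ("diff", "no", "no"),
             ("same", "no", "no"), ("diff", "yes", "no"),
             ("diff", "no", "yes"), ("same", "no", "no")]


def full_bayes_lr(key, positives, negatives, n_combinations=8, pseudocount=1.0):
    """LR = P(key | interacting) / P(key | non-interacting).

    The joint combination is treated as a single variable, so correlations
    between its components are preserved (full Bayes). Add-one style
    smoothing over n_combinations possible combinations (assumed: three
    binary features, hence 8) avoids division by zero for unseen keys.
    """
    pos_counts, neg_counts = Counter(positives), Counter(negatives)
    p_pos = (pos_counts[key] + pseudocount) / (len(positives) + pseudocount * n_combinations)
    p_neg = (neg_counts[key] + pseudocount) / (len(negatives) + pseudocount * n_combinations)
    return p_pos / p_neg


def combined_lr(feature_lrs):
    """Naive Bayes: multiply LRs of features assumed conditionally independent."""
    return prod(feature_lrs)


def rank_pairs(pair_lrs, cutoff):
    """Sort pairs by LR (highest first) and keep those at or above the cutoff."""
    ranked = sorted(pair_lrs.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, lr in ranked if lr >= cutoff]


# Usage: hypothetical co-expression and orthology LRs for two query pairs.
lr_ab = combined_lr([full_bayes_lr(("same", "yes", "yes"), POSITIVES, NEGATIVES), 3.0, 2.0])
lr_cd = combined_lr([full_bayes_lr(("diff", "no", "no"), POSITIVES, NEGATIVES), 1.5, 1.0])
print(rank_pairs({("A", "B"): lr_ab, ("C", "D"): lr_cd}, cutoff=1.0))  # [('A', 'B')]
```

Every pair receives a score, so moving `cutoff` trades coverage against expected false positives. This is why ranking all pairs allows smaller, higher-confidence subsets to be drawn.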

Highlights

  • Although the prediction of protein-protein interactions has been extensively investigated for yeast, few such datasets exist for the far larger human proteome

  • The Transitive module considers the local topology of the network predicted by the group A modules and requires the completion of their analysis to calculate its own likelihood ratios of interaction (Figure 1B)

  • In the absence of the Transitive module, the Preliminary Score is used as the final likelihood ratio output by the predictor
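The two-stage dependency in these highlights can be sketched as follows. This is a minimal sketch under explicit assumptions, not the authors' scoring function. The topology score here is a hypothetical stand-in: the Jaccard overlap of the two proteins' neighbourhoods in the network built from group A predictions above a cutoff. The `PRELIM` scores and the calibration `BINS` that map scores to LRs are invented. Combining the Preliminary Score with the Transitive LR by multiplication is also an assumption. What the sketch does show is the ordering. The Transitive module can only run once the group A network exists, and without it the Preliminary Score is returned unchanged as the final likelihood ratio.

```python
# Hypothetical Preliminary Scores (group A LRs) for protein pairs.
PRELIM = {("A", "B"): 5.0, ("A", "C"): 8.0, ("B", "C"): 0.5,
          ("C", "D"): 6.0, ("B", "D"): 4.0}
# Hypothetical calibration: (lower bound of topology score, LR), ascending.
BINS = [(0.0, 0.8), (0.25, 2.0), (0.5, 10.0)]


def build_network(prelim, cutoff):
    """Adjacency sets from group A predictions whose LR exceeds the cutoff."""
    adj = {}
    for (a, b), lr in prelim.items():
        if lr > cutoff:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def topology_score(adj, a, b):
    """Stand-in local topology score: Jaccard overlap of neighbourhoods,
    ignoring the direct a-b edge so a pair does not vote for itself."""
    na = adj.get(a, set()) - {b}
    nb = adj.get(b, set()) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def transitive_lr(score, bins):
    """Return the LR of the highest bin whose lower bound the score reaches."""
    lr = 1.0
    for lower, value in bins:
        if score >= lower:
            lr = value
    return lr


def final_lr(prelim_lr, transitive=None):
    """Without the Transitive module, the Preliminary Score is the final LR."""
    return prelim_lr if transitive is None else prelim_lr * transitive


# Usage: B and C share both neighbours (A and D) in the group A network,
# so the topology evidence raises their weak Preliminary Score.
adj = build_network(PRELIM, cutoff=1.0)
score = topology_score(adj, "B", "C")
print(score, final_lr(PRELIM[("B", "C")]), final_lr(PRELIM[("B", "C")], transitive_lr(score, BINS)))
# prints: 1.0 0.5 5.0
```

Jaccard overlap is used here only because it is a simple, well-known neighbourhood measure. The paper describes its own novel topology scoring function, which this sketch does not reproduce.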

Summary

Introduction

Although the prediction of protein-protein interactions has been extensively investigated for yeast, few such datasets exist for the far larger human proteome. Furthermore, it has recently been estimated that the overall average false positive rate of available computational and high-throughput experimental interaction datasets is as high as 90%. Protein-protein interactions perform and regulate fundamental cellular processes. The comprehensive study of such interactions on a genome-wide scale will lead to a clearer understanding of diverse cellular processes and of the molecular mechanisms of disease. Because determining interactions by small-scale laboratory techniques is impractical for a complete proteome on the grounds of cost and time, several experimental techniques exist to determine protein-protein interactions in a high-throughput manner [1]. Interactions determined by high-throughput …
