Abstract

All protein-protein interaction (PPI) predictors require the determination of an operational decision threshold when differentiating positive PPIs from negatives. Historically, a single global threshold, typically optimized via cross-validation testing, is applied to all protein pairs. However, we here use data visualization techniques to show that no single decision threshold is suitable for all protein pairs, given the inherent diversity of protein interaction profiles. The recent development of high-throughput PPI predictors has enabled the comprehensive scoring of all possible protein-protein pairs. This, in turn, allows us to evaluate any one PPI within the context of all possible predictions. Leveraging this context, we introduce a novel modeling framework called Reciprocal Perspective (RP), which estimates a localized threshold on a per-protein basis using several rank order metrics. By considering a putative PPI from the perspective of each of the proteins within the pair, RP rescores the predicted PPI and applies a cascaded Random Forest classifier, leading to improvements in recall and precision. We here validate RP using two state-of-the-art PPI predictors, the Protein-protein Interaction Prediction Engine and the Scoring PRotein INTeractions methods, over five organisms: Homo sapiens, Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, and Mus musculus. Results demonstrate that the application of a post hoc RP rescoring layer significantly improves classification (p < 0.001) in all cases over all organisms, and that this new rescoring approach can be applied to any PPI prediction method.
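To make the idea of viewing one pair "from the perspective of each of the proteins" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy table of pairwise scores and uses a single rank order metric (the rank of a partner among all of a protein's scored partners); the actual RP features and the cascaded Random Forest stage are not reproduced here.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical toy scores for unordered protein pairs (higher = more likely
# to interact). Protein names and values are invented for illustration only.
Scores = Dict[FrozenSet[str], float]

TOY_SCORES: Scores = {
    frozenset({"A", "B"}): 0.90,
    frozenset({"A", "C"}): 0.85,
    frozenset({"A", "D"}): 0.80,
    frozenset({"B", "C"}): 0.30,
    frozenset({"B", "D"}): 0.40,
    frozenset({"C", "D"}): 0.10,
}


def partner_scores(scores: Scores, protein: str) -> Dict[str, float]:
    """Return {partner: score} for every two-protein pair involving `protein`."""
    out: Dict[str, float] = {}
    for pair, score in scores.items():
        if protein in pair and len(pair) == 2:  # self-pairs are skipped
            (other,) = pair - {protein}
            out[other] = score
    return out


def rank_from_perspective(scores: Scores, protein: str, partner: str) -> int:
    """1-based rank of `partner` among all partners of `protein` (1 = top score)."""
    mine = partner_scores(scores, protein)
    target = mine[partner]
    return 1 + sum(1 for s in mine.values() if s > target)


def reciprocal_ranks(scores: Scores, a: str, b: str) -> Tuple[int, int]:
    """Rank of the pair (a, b) as seen from a, and as seen from b."""
    return (rank_from_perspective(scores, a, b),
            rank_from_perspective(scores, b, a))


if __name__ == "__main__":
    # The same score (0.80) is A's weakest prediction but D's strongest one.
    print(reciprocal_ranks(TOY_SCORES, "A", "D"))
```

In this toy table, protein A scores highly with every partner while D does not, so a single global cutoff treats the A-D score very differently from how each protein's own score profile would. That contrast is the motivation for a per-protein, localized threshold.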

Highlights

  • A broad range of machine learning algorithms have been applied to various facets of protein-protein interaction (PPI) prediction

  • We introduce a novel concept in the application of interactome-wide analyses of protein-protein interaction networks: Reciprocal Perspective (RP), which jointly considers an interaction from the perspective of each partner to determine a new assessment of the interaction

  • We suggest revising the assumption that a single global threshold can be appropriately defined across the proteome due to the inherent diversity of protein interaction profiles


Summary

Introduction

A broad range of machine learning algorithms have been applied to various facets of PPI prediction. While the field of PPI prediction is methodologically diverse, its methods share certain fundamental commonalities irrespective of the paradigm, learning algorithm, and number of predictions. All of these methods examine the query protein pair and output a score denoting the predicted likelihood that the pair will physically interact. PPI prediction tasks have been limited to modest subsets of the complete interactome, enabling the elucidation of localized sub-networks or the identification of isolated PPIs scattered across the complete interactome. This limitation is due to the algorithmic time complexity of most PPI predictors, particularly those examining protein structure. A method predicting one interaction per second would require over 6.3 years to examine all pairs and produce the complete human interactome. This has prompted research groups to develop optimised predictors and leverage high-performance computing. Tuning the decision threshold to less conservative levels threatens to introduce a large number of false positives, thereby reducing the utility of the classifier.
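The 6.3-year estimate quoted above can be checked with a short calculation. The sketch below assumes a human proteome of roughly 20,000 proteins, a figure that is not stated in the text and is used here only because it reproduces the quoted number; it counts unordered pairs of distinct proteins.

```python
from math import comb

# Assumption: about 20,000 human proteins (not stated in the source text).
N_PROTEINS = 20_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_score_all_pairs(n_proteins: int, seconds_per_pair: float = 1.0) -> float:
    """Wall-clock years needed to score every unordered pair sequentially."""
    n_pairs = comb(n_proteins, 2)  # n * (n - 1) / 2 distinct pairs
    return n_pairs * seconds_per_pair / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = years_to_score_all_pairs(N_PROTEINS)
    print(f"{comb(N_PROTEINS, 2):,} pairs -> {years:.1f} years")
    # prints: 199,990,000 pairs -> 6.3 years
```

Because the number of pairs grows quadratically with proteome size, even modest per-pair costs add up quickly, which is why exhaustive scoring depends on optimised predictors and high-performance computing.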

