PIPE4: Fast PPI Predictor for Comprehensive Inter- and Cross-Species Interactomes

Kevin Dick, Stephen J Molnar, Kyle K Biggar, Le Hoa Tan, Benjamin Mimee, Frank Dehne, James R Green, Elroy R Cober, Ashkan Golshani, Bradley Barnes, Bahram Samanfar

doi:10.1038/s41598-019-56895-w

Open Access

Abstract

The need for larger-scale and increasingly complex protein-protein interaction (PPI) prediction tasks demands that state-of-the-art predictors be highly efficient and adapted to inter- and cross-species predictions. Furthermore, the ability to generate comprehensive interactomes has enabled the appraisal of each PPI in the context of all predictions, leading to further improvements in classification performance in the face of extreme class imbalance using the Reciprocal Perspective (RP) framework. Here we describe the PIPE4 algorithm. Adapting the PIPE3/MP-PIPE sequence preprocessing step led to a speedup of upwards of 50x, and the new Similarity Weighted Score appropriately normalizes for window frequency when applied to any inter- or cross-species prediction schema. Comprehensive interactomes are generated for three prediction schemas: (1) cross-species predictions, where Arabidopsis thaliana is used as a proxy to predict the comprehensive Glycine max interactome, (2) inter-species predictions between Homo sapiens and HIV-1, and (3) a combined schema involving both cross- and inter-species predictions, where both Arabidopsis thaliana and Caenorhabditis elegans are used as proxy species to predict the interactome between Glycine max (the soybean legume) and Heterodera glycines (the soybean cyst nematode). Compared with the state-of-the-art, PIPE4 showed improved performance, indicating that it should be the method of choice for complex PPI prediction schemas.

Highlights

Compared with the state-of-the-art, PIPE4 showed improved performance, indicating that it should be the method of choice for complex protein-protein interaction (PPI) prediction schemas.
While the Protein-protein Interaction Prediction Engine (PIPE) has demonstrated preliminary successes when applied to inter-species prediction tasks[19], this problem requires an appropriate scoring function, one that can account for the frequency of short, contiguous subsequences, because proteome sizes vary considerably between organisms and the number of similar subsequences is expected to vary greatly as a result.
In addition to computational efficiency and PPI prediction accuracy that are competitive with the state-of-the-art, PIPE can propose the putative site of interaction using the previously described PIPE-Sites algorithm.

Summary

Introduction

Compared with the state-of-the-art, PIPE4 showed improved performance, indicating that it should be the method of choice for complex PPI prediction schemas. While PIPE has demonstrated preliminary successes when applied to inter-species prediction tasks[19], this problem requires an appropriate scoring function, one that can account for the frequency of short, contiguous subsequences, because proteome sizes vary considerably between organisms and the number of similar subsequences is expected to vary greatly as a result.
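
To make the proteome-size concern concrete, the following is a minimal toy sketch of why raw subsequence-window counts scale with the size of the proteome being searched. It is not the PIPE4 Similarity Weighted Score, whose formula is not given in this section. The function names, the use of exact string matching (rather than a substitution-matrix similarity threshold), and the division by proteome size are all illustrative assumptions.

```python
from collections import Counter


def windows(seq, w):
    """Return the set of distinct contiguous length-w subsequences of seq."""
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}


def window_frequency(proteome, w):
    """For each window, count how many proteins in the proteome contain it.

    `proteome` is a hypothetical dict mapping protein IDs to sequences.
    """
    counts = Counter()
    for seq in proteome.values():
        counts.update(windows(seq, w))  # each protein counted once per window
    return counts


def raw_and_normalized_hits(query, proteome, w):
    """Sum window occurrences for a query, with and without size normalization.

    Assumption: exact matches stand in for 'similar' windows, and dividing by
    the number of proteins is a simple stand-in for frequency normalization.
    """
    freq = window_frequency(proteome, w)
    raw = sum(freq[win] for win in windows(query, w))
    return raw, raw / len(proteome)


# A small proteome, and a larger one built by repeating it three times.
small = {"P1": "MKTAYIAK", "P2": "TAYIAKQR", "P3": "GGGSSS"}
large = {f"{pid}_{k}": seq for k in range(3) for pid, seq in small.items()}

print(raw_and_normalized_hits("KTAYIA", small, w=4))
print(raw_and_normalized_hits("KTAYIA", large, w=4))
```

In this toy setting the raw count triples when the proteome triples in size, while the size-normalized value stays the same, which is the kind of behaviour a scoring function for inter- and cross-species prediction would need to account for.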

Results

Conclusion