Abstract

Protein-protein interactions (PPIs) are essential for understanding the function of biological systems and have been characterized using a vast array of experimental techniques. However, these techniques detect only a small proportion of all PPIs and are labor intensive and time consuming. Computational methods capable of predicting PPIs can therefore accelerate the discovery of new interactions. This paper reports a machine learning-based prediction model, the Universal In Silico Predictor of Protein-Protein Interactions (UNISPPI), a decision tree model that can reliably predict PPIs for all species (including proteins from parasite-host associations) using only 20 combinations of amino acid frequencies from interacting and non-interacting proteins as learning features. On an independent test set, UNISPPI correctly classified 79.4% of experimentally supported interactions and 72.6% of non-interacting protein pairs. Moreover, UNISPPI suggests that the frequencies of the amino acids asparagine, cysteine and isoleucine are important features for distinguishing between interacting and non-interacting protein pairs. We envisage that UNISPPI can be a useful tool for prioritizing interactions for experimental validation.

Highlights

  • Graph or network theory has been used to model complex systems, including social and biological ones [1,2], and provides a good interface between the reductionist and holistic views [3]

  • We present in this paper the Universal In Silico Predictor of Protein-Protein Interactions (UNISPPI), a machine learning (ML)-based approach that uses features associated with amino acid sequences to build models for predicting protein-protein interactions (PPIs)

  • We assumed that the best set of features for discriminating between the PPI and no-PPI classes was the one that produced decision tree models with the highest median predictive values
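
To make the idea of sequence-derived features concrete, the following is a minimal sketch of how 20 amino acid frequency features might be computed for a protein pair. It is not the authors' implementation: the function names `aa_frequencies` and `pair_features` are hypothetical, and combining the two proteins by summing their per-residue frequencies is an assumption made for illustration only, since the combination rule is not specified in this excerpt.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(sequence):
    """Return the relative frequency of each standard amino acid in a sequence."""
    seq = sequence.upper()
    # Ignore non-standard symbols (for example X or gaps) when counting.
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def pair_features(seq_a, seq_b):
    """Build 20 order-independent features for a protein pair.

    Assumption: the two proteins are combined by summing their
    frequencies per amino acid; the paper may use a different rule.
    """
    freq_a = aa_frequencies(seq_a)
    freq_b = aa_frequencies(seq_b)
    return [freq_a[aa] + freq_b[aa] for aa in AMINO_ACIDS]


# Toy sequences, not real proteins.
features = pair_features("ACCA", "CCII")
```

A feature vector of this kind, labeled as PPI or no-PPI, could then be passed to any decision tree learner. Summation is used here because it gives the same vector regardless of the order of the two proteins.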


Summary

Introduction

Graph or network theory has been used to model complex systems, including social and biological ones [1,2], and provides a good interface between the reductionist and holistic views [3]. Proteins are one of the most abundant classes of biomolecules and can interact with many other biomolecules in cells, such as DNA, RNA, metabolites and other proteins. The latter interactions, protein-protein interactions (PPIs), are essential because they build the functional units responsible for the functioning of all biological molecular pathways [4]. The collection of all PPIs can be important for understanding the underlying mechanisms of diseases, facilitating the process of drug design, elucidating the functions of newly identified proteins, predicting their subcellular location and gaining insight into the evolution of particular interaction or metabolic pathways, among other biological aspects of a cell or organism.

