Abstract

Various biochemical functions of organisms are performed by protein–protein interactions (PPIs). Recognizing PPIs is therefore essential for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction, and metabolism. Although high-throughput technology makes it possible to generate large-scale PPI data, it is costly in both time and labor and carries the risk of a high false-positive rate. The biology community is therefore looking for computational methods that can discover protein interactions quickly and efficiently at scale. In this paper, we propose a computational method for predicting PPIs from protein sequence information that combines orthogonal locality preserving projections (OLPP) with a rotation forest (RoF) model. Specifically, each protein sequence is first converted into a position-specific scoring matrix (PSSM), which contains evolutionary information about the protein, by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). We then represent each protein as a fixed-length feature vector by applying OLPP to its PSSM. Finally, we train an RoF classifier to distinguish interacting from non-interacting protein pairs. The proposed method yielded significantly better results than existing methods, with prediction accuracies of 90.07% and 96.09% on the Yeast and Human datasets, respectively. Our experiments show that the proposed method can serve as a useful tool to accelerate the process of solving key problems in proteomics.

Highlights

  • Various biochemical functions of organisms are performed by protein–protein interactions (PPIs)

  • Wang et al.[28] presented a computational model to predict PPIs, which is based on a Zernike moment (ZM) feature descriptor and a probabilistic classification vector machine (PCVM) algorithm

  • The calculation formulas for overall prediction accuracy (Acc), precision (Pre), sensitivity (Sen), and Matthews correlation coefficient (MCC) are defined as: Accuracy =
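The formula in the last highlight is cut off in this summary. As a hedged illustration, the four measures can be sketched using their conventional definitions from confusion-matrix counts. These definitions are assumed here, not confirmed by the source. The function name `ppi_metrics` and the counts in the usage lines are hypothetical and are not results from the paper.

```python
import math


def ppi_metrics(tp, tn, fp, fn):
    """Compute Acc, Pre, Sen and MCC from confusion-matrix counts.

    Assumption: the conventional definitions are used, where
    tp/tn are correctly predicted interacting/non-interacting pairs
    and fp/fn are the corresponding misclassified pairs.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    # Guard against empty denominators so degenerate inputs return 0.0.
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Pre": pre, "Sen": sen, "MCC": mcc}


# Hypothetical counts, for illustration only.
example = ppi_metrics(tp=90, tn=85, fp=15, fn=10)
print(example)
```

MCC is worth reporting alongside accuracy because it uses all four counts, so it stays informative when the numbers of interacting and non-interacting pairs differ.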


Summary

Introduction

Various biochemical functions of organisms are performed by protein–protein interactions (PPIs). We propose a computational method for predicting PPIs based on a fresh idea of combining orthogonal locality preserving projections (OLPP) and rotation forest (RoF) models, using protein sequence information. The data sources involved in these methods mainly include literature mining knowledge[13], gene fusion[14], phylogenetic profiles[15], gene ontology annotations[16], gene neighborhood[17], and coevolution analysis of interacting proteins[18]. These methods are not commonly used to predict PPIs as they are difficult to apply if a priori information about the protein is not available. Huang et al.[19] presented a sequence-based substitution matrix representation (SMR) method to predict PPIs by using discrete cosine transform (DCT). This method yielded an average accuracy of 96.28% on the yeast dataset. Our method obtains good prediction accuracy, which can reflect the reliability of the proposed method in predicting PPIs.

