Abstract

Protein-protein interactions (PPIs) play a vital role in most cellular processes. Although much effort has been devoted to detecting protein interactions with high-throughput experiments, these methods are expensive and tedious. To address these disadvantages, this study develops a novel computational method that predicts PPIs efficiently and accurately from protein sequence information alone. The improvement mainly comes from combining the Rotation Forest (RF) classifier with the Local Phase Quantization (LPQ) descriptor extracted from the Physicochemical Property Response (PR) Matrix of protein amino acids. On three PPI datasets, Saccharomyces cerevisiae, Homo sapiens, and Helicobacter pylori, the method obtained average accuracies of 93.8%, 97.96%, and 89.47%, respectively, which are higher than those reported in previous studies. We also carried out extensive validations comparing the Rotation Forest ensemble classifier with the state-of-the-art Support Vector Machine classifier. These promising results indicate that the proposed method might play a complementary role in future proteomics research.

Highlights

  • As a necessary component of all organisms, proteins are involved in most processes of living cells

  • Valuable protein-protein interaction (PPI) data on diverse species have been accumulated by high-throughput experimental technologies, such as protein chips [3,4], yeast two-hybrid (Y2H) systems [5,6,7], tandem affinity purification (TAP) [8], mass spectrometry protein complex identification (MS-PCI) [9], and correlated mRNA expression profiling [10]

  • To validate the proposed model, we apply it to three widely used PPI datasets: the Helicobacter pylori dataset, the Homo sapiens dataset, and the Saccharomyces cerevisiae dataset


Summary

Introduction

As a necessary component of all organisms, proteins are involved in most processes of living cells. Many computational methods have been developed for predicting PPIs. They are based on different data sources, such as gene fusion, sequence conservation among interacting proteins, gene neighborhood, literature mining, phylogenetic profiles, and combinations of interaction information from various data sources [18]. Researchers have recently proposed computational methods that predict PPIs from protein amino acid sequences without any additional information. These studies indicate the feasibility of predicting PPIs using protein amino acid sequences alone [19,20,21]. Among these previous works, Zhou et al. [22] proposed a computational method based on the support vector machine (SVM) that uses auto-correlation descriptors and correlation coefficients.
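
To make the idea of a sequence-only descriptor concrete, the following is a minimal sketch of an auto-covariance feature computed from a single physicochemical property. It is an illustration under stated assumptions, not the formulation of Zhou et al. [22] and not the LPQ descriptor proposed in this study: the choice of the Kyte-Doolittle hydropathy scale, the function names `auto_covariance` and `pair_features`, and the `max_lag` parameter are all hypothetical choices made for the example.

```python
# Illustrative sketch only: a simple auto-covariance descriptor built from
# one physicochemical property. Not the method of any cited paper.

# Kyte-Doolittle hydropathy index for the 20 standard amino acids
# (assumed here as the single property; real pipelines often use several).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(sequence, max_lag=3, scale=HYDROPATHY):
    """Return auto-covariance values for lags 1..max_lag.

    Each residue is mapped to its property value, the values are centered
    on the sequence mean, and products of residues `lag` positions apart
    are averaged. The output length is fixed regardless of sequence length.
    """
    values = [scale[aa] for aa in sequence]  # KeyError on non-standard residues
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def pair_features(seq_a, seq_b, max_lag=3):
    """Concatenate the descriptors of two proteins into one feature vector."""
    return auto_covariance(seq_a, max_lag) + auto_covariance(seq_b, max_lag)
```

A vector such as `pair_features("MKTAYIAKQR", "GSHMLEDPVA")` has a fixed length of `2 * max_lag`, so protein pairs of different lengths can be passed to a standard classifier such as an SVM.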

Evaluation Measures
Prediction Performance of Proposed Model
Generation of the Data Sets
Representation for Protein
Feature Vector Extraction
Rotation Forest
Conclusions