Abstract

Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by applying various machine learning approaches to numerous characteristic features, the problem is still far from solved. In this study, we developed a novel predictor based on the Random Forest (RF) algorithm, with features selected by the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility, and we also included five 3D structural features. The predictor achieved an overall accuracy of 0.672997 and a Matthews correlation coefficient (MCC) of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of PPI sites. Site-specific feature analysis also showed that the features of individual residues from PPI sites contribute most to the determination of PPI sites. We anticipate that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction.

Highlights

  • Proteins play critical roles in most biological events by interacting with other proteins, compounds, RNA and DNA

  • We developed a novel method to predict protein-protein interaction sites based on the Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS)

  • Minimum Redundancy Maximal Relevance (mRMR) result: Information S1 lists two outcomes obtained by running the mRMR software. One is a MaxRel feature table that ranks the 714 features according to their relevance to the class of samples; the other is an mRMR feature table that ranks the same 714 features according to the mRMR criteria
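The difference between the MaxRel and mRMR tables, and the way IFS consumes a ranked list, can be sketched on toy data. This is a minimal illustration, not the mRMR software or the authors' pipeline: it assumes discrete features, the difference form of the mRMR criterion (relevance minus mean redundancy), and hypothetical feature names (`f1`, `f1_noisy`, `f2`). The `evaluate` callback in the IFS function is a placeholder for whatever classifier score is used.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def maxrel_ranking(features, labels):
    """Rank feature names by relevance I(feature; class) only, highest first."""
    rel = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(rel, key=lambda name: -rel[name])


def mrmr_ranking(features, labels):
    """Greedy mRMR (difference form, assumed here): at each step pick the
    feature with the largest relevance minus mean redundancy with the
    features already selected."""
    rel = {name: mutual_information(col, labels) for name, col in features.items()}
    selected, remaining = [], list(features)
    while remaining:
        def score(name):
            if not selected:
                return rel[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return rel[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def incremental_feature_selection(ranked, evaluate):
    """IFS: score the top-1, top-2, ... prefixes of a ranked feature list and
    return the best prefix with its score. `evaluate` is a caller-supplied
    placeholder that maps a list of feature names to a number."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        current = evaluate(subset)
        if current > best_score:
            best_subset, best_score = subset, current
    return best_subset, best_score


# Toy data (hypothetical): f1_noisy is a near copy of f1, f2 is independent of f1.
labels = [0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "f1":       [0, 0, 0, 0, 1, 1, 1, 1],
    "f1_noisy": [0, 0, 1, 0, 1, 1, 1, 1],
    "f2":       [0, 0, 1, 1, 0, 0, 1, 1],
}

maxrel = maxrel_ranking(features, labels)
mrmr = mrmr_ranking(features, labels)
print("MaxRel order:", maxrel)  # ['f1', 'f1_noisy', 'f2']
print("mRMR order:  ", mrmr)    # ['f1', 'f2', 'f1_noisy']
```

On this toy set, MaxRel places the near-duplicate `f1_noisy` second because it is individually relevant, while mRMR demotes it below `f2` because it adds little beyond `f1`. In a setting like the one described here, `evaluate` might train a Random Forest on each prefix and return a cross-validated score; that choice is an assumption of this sketch.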


Summary

Introduction

Proteins play critical roles in most biological events by interacting with other proteins, compounds, RNA and DNA. Understanding the characteristics of interaction sites is fundamental to understanding the molecular recognition process. Proteins rarely act in isolation; they often exert their functions as part of a large molecular network, with their roles coordinated via complicated regulatory networks of protein-protein interactions (PPIs). PPIs are crucial to most aspects of cellular function, including regulation of signaling and metabolic pathways, protein synthesis, DNA replication and gene translation, as well as immunological recognition [1]. Identifying the binding sites between two interacting proteins would provide valuable clues for understanding and determining the functions and structures of protein complexes, for facilitating the identification of pharmacological targets and for drug design.

