Abstract

Protein–protein interactions (PPIs) play key roles in most cellular processes, such as cell metabolism, immune response, endocrine function, DNA replication, and transcription regulation. PPI prediction is one of the most challenging problems in functional genomics. Although PPI data have been increasing because of the development of high-throughput technologies and computational methods, many problems are still far from being solved. In this study, a novel predictor was designed by using the Random Forest (RF) algorithm with the ensemble coding (EC) method. To reduce computational time, a feature selection method (DX) was adopted to rank the features and search for the optimal feature combination. The DXEC method integrates many features and physicochemical/biochemical properties to predict PPIs. On the Gold Yeast dataset, the DXEC method achieves 67.2% overall precision, 80.74% recall, and 70.67% accuracy. On the Silver Yeast dataset, the DXEC method achieves 76.93% precision, 77.98% recall, and 77.27% accuracy. On the human dataset, the prediction accuracy reaches 80% for the DXEC-RF method. We extended the experiment to bigger and more realistic datasets, on which the method maintains 50% recall on the Yeast All dataset and 80% recall on the Human All dataset. These results show that the DXEC method is suitable for performing PPI prediction. The prediction service of the DXEC-RF classifier is available at http://ailab.ahu.edu.cn:8087/DXECPPI/index.jsp.

Introduction

Protein–protein interactions (PPIs) [1,2] play crucial roles in virtually every biological function. Proteins interact with each other to form protein–protein complexes and perform different biological processes, including metabolism, immune response, endocrine function, and DNA replication [3,4]. Various experimental and computational methods (e.g., two-hybrid systems [5,6], mass spectrometry [7], and protein chip technology [8]) have been developed to detect PPIs. PPIs have generally been studied individually by small-scale biochemical and biophysical experimental techniques. These experimental approaches are usually time-consuming and expensive. Sequence-based methods have the advantage of not requiring expensive and time-consuming processes to determine protein structures. These methods need to encode only protein sequence pairs to distinguish interacting from non-interacting pairs. The experiments demonstrate that the ensemble coding (EC) method based on the feature extraction scheme contributes to PPI prediction and performs better than other well-known methods on the yeast and human datasets.
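To illustrate what encoding a protein sequence pair can look like, the following is a minimal sketch that turns two sequences into one fixed-length feature vector using plain amino acid composition. This is not the EC scheme or the DX feature selection described in the paper; the function names `composition` and `encode_pair` are hypothetical, and amino acid composition is used here only as a simple stand-in descriptor.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is aligned.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20-dimensional amino acid frequency vector of a sequence.

    Characters outside the 20 standard residues are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # Empty or fully non-standard sequence: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def encode_pair(seq_a, seq_b):
    """Concatenate the two composition vectors into one 40-dimensional vector.

    Assumption for illustration only: simple concatenation, so the encoding
    depends on the order of the two proteins.
    """
    return composition(seq_a) + composition(seq_b)


if __name__ == "__main__":
    features = encode_pair("MKTAYIAKQR", "GAVLIPFMWS")
    print(len(features))
```

A vector of this kind, labeled as interacting or non-interacting, is the sort of input a classifier such as a random forest could be trained on. Because concatenation is order-dependent, a practical pipeline would likely need to handle the (A, B) versus (B, A) symmetry explicitly.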

Performance Evaluation
The DX Result
Feature Importance Evaluation
Comparison of Prediction Performance by Using Different Methods
Comparison with Other Methods on the All Interaction Datasets
Preparation of Datasets
Molecular Descriptors
The Features of Amino Acid Factors
Ensemble Coding Scheme
Conclusions