Abstract

Self-interacting proteins (SIPs) play an influential role in regulating cell structure and function, so it is critically important to identify whether proteins interact with themselves. Although experimental methods for recognizing self-interaction exist, they are both expensive and time-consuming. An efficient and stable computational method for predicting SIPs is therefore needed. In this study, we develop a computational method for predicting SIPs based on a rotation forest (RF) classifier combined with histogram of oriented gradients (HOG) and the synthetic minority oversampling technique (SMOTE). On yeast and human datasets, the proposed method achieves accuracies of 97.28% and 89.41%, respectively. In addition, the proposed approach was compared with state-of-the-art support vector machine (SVM) classifiers and other methods on the same datasets. The experimental results demonstrate that our method is robust and effective and can be regarded as a useful tool for SIP prediction.

Highlights

  • Protein is the material basis of life and an important part of all cells and tissues. The study of protein-protein interactions has become a fundamental problem. These interactions can elucidate the mechanism of biological reactions and play a crucial role in living organisms.

  • True positive (TP) is the number of true self-interacting proteins (SIPs) correctly predicted by the model, false positive (FP) is the number of non-self-interacting proteins predicted to be self-interacting, true negative (TN) is the number of true non-self-interacting proteins correctly predicted by the model, and false negative (FN) is the number of self-interacting proteins predicted to be non-self-interacting.

  • The proposed method is based on a rotation forest (RF) classifier combined with a position-specific scoring matrix (PSSM), histogram of oriented gradients (HOG), and the synthetic minority oversampling technique (SMOTE).
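
The TP, FP, TN, and FN counts defined in the highlights above are the inputs to the usual binary-classification metrics. The sketch below shows one way these could be computed. The abstract reports accuracy; the other metrics shown (sensitivity, specificity, precision, Matthews correlation coefficient) are standard textbook definitions, and whether the paper reports exactly this set is an assumption. The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return common binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; not the authors' code.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one prediction is required")

    def ratio(num, den):
        # Return 0.0 when the denominator is zero instead of raising.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": ratio(tp, tp + fn),   # fraction of true SIPs recovered
        "specificity": ratio(tn, tn + fp),   # fraction of non-SIPs recovered
        "precision": ratio(tp, tp + fp),     # fraction of predicted SIPs that are true
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = confusion_metrics(tp=80, fp=10, tn=95, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because SIP datasets are typically imbalanced (which is why SMOTE is used), accuracy alone can look strong even when few true SIPs are recovered, so a balance-aware measure such as the Matthews correlation coefficient is a useful companion.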


Summary

Introduction

Protein is the material basis of life and an important part of all cells and tissues. The study of protein-protein interactions (PPIs) has become a fundamental problem. These interactions can elucidate the mechanism of biological reactions and play a crucial role in living organisms. Xu et al. [11] proposed a sequence-based prediction method that uses graph energy to extract the effective information between proteins and feeds that information to a weighted sparse representation-based classification (WSRC) classifier for predicting PPIs. Wang et al. [12] proposed a method called the novel stochastic block model. This method can capture the latent structural features of proteins from the perspective of forming protein complexes by simulating the generative process of the protein interaction network. These methods are mainly used to study the interaction between protein pairs using related information such as co-localization, co-expression, and co-evolution, and they cannot be used to predict SIPs. The most important reason is that the datasets used by these methods do not contain interaction information between identical partners.

Materials and Methodology
Results and Discussion
Conclusions
Disclosure
