Abstract

Self-interacting proteins (SIPs), in which two or more copies of the same protein can interact with each other, play significant roles in the understanding of cellular processes and cell functions. Although a number of experimental methods have been designed to detect SIPs, they remain extremely time-consuming, expensive, and challenging. There is therefore an urgent need for computational methods that predict SIPs. In this study, we propose a deep forest based predictor for accurate prediction of SIPs from protein sequence information. More specifically, we introduce a novel feature representation method that integrates the position-specific scoring matrix (PSSM) with the wavelet transform. To evaluate the performance of the proposed method, cross-validation tests are performed on two widely used benchmark datasets. The experimental results show that the proposed model achieves high accuracies of 95.43% and 93.65% on the human and yeast datasets, respectively. The area under the ROC curve (AUC) is also reported: 0.9203 on the yeast dataset and 0.9586 on the human dataset. To further show the advantage of the proposed method, it is compared with several existing methods. The results demonstrate that the proposed model outperforms the other SIP prediction methods. This work offers biologists an effective architecture for detecting new SIPs.

Highlights

  • Proteins, which are highly complex substances, are the main component of all life

  • A great many computational techniques based on machine learning and deep learning (Gui et al, 2009; You et al, 2010b, 2015a, 2017a,b; Lu et al, 2013; Mi et al, 2013; Huang et al, 2015; Chen et al, 2016, 2018a,b,c; Gui et al, 2016; Huang et al, 2016b; Li et al, 2018) have been applied in the fields of bioinformatics and genomics, but few of them address the detection of protein interactions

  • In this study we present a novel approach for predicting self-interacting proteins (SIPs), which combines deep forest with a wavelet transform (WT) applied to the position-specific scoring matrix (PSSM) of protein sequences
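The feature-representation idea in the last highlight (a wavelet transform applied to the PSSM) can be illustrated with a minimal sketch. The text above does not state the wavelet family, the number of decomposition levels, or which statistics are kept, so the Haar wavelet, `levels=3`, and the mean/standard-deviation summary are all assumptions made for illustration, and the function names are hypothetical. The deep forest classifier that would consume the resulting vector is not shown.

```python
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform (assumed wavelet)."""
    x = list(signal)
    if len(x) % 2:               # pad odd-length signals by repeating the last value
        x.append(x[-1])
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def mean_std(values):
    """Mean and population standard deviation of a non-empty list."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)


def pssm_wavelet_features(pssm, levels=3):
    """Map an L x 20 PSSM (list of rows) to a fixed-length feature vector.

    Each PSSM column is treated as a 1-D signal along the sequence.
    The summary statistics used here are an illustrative assumption,
    not the descriptor defined in the paper.
    """
    n_cols = len(pssm[0])
    features = []
    for j in range(n_cols):
        approx = [row[j] for row in pssm]
        for _ in range(levels):
            approx, detail = haar_dwt(approx)
            features.extend(mean_std(detail))   # detail coefficients at this level
        features.extend(mean_std(approx))       # final approximation coefficients
    return features


# Toy PSSM for a hypothetical 57-residue protein (values are made up).
toy_pssm = [[(i * j) % 7 - 3 for j in range(20)] for i in range(57)]
print(len(pssm_wavelet_features(toy_pssm)))  # 160
```

Whatever the sequence length, the output has 20 x (2 x levels + 2) entries, which is what lets proteins of different lengths be fed to a single classifier.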


Summary

Introduction

Proteins, which are highly complex substances, are the main component of all life. They are the material basis and the first element of life. Most proteins work together with molecular partners or other proteins, a process known as protein-protein interactions (PPIs) (Chou and Cai, 2006; You et al, 2014b,c; Li et al, 2017). Self-interacting proteins (SIPs) play key roles in the understanding of cellular processes and cell functions. These interactions have received considerably more attention in recent years. Most previous works focus on individual SIPs at the level of structure and function. A great many computational techniques based on machine learning and deep learning (Gui et al, 2009; You et al, 2010b, 2015a, 2017a,b; Lu et al, 2013; Mi et al, 2013; Huang et al, 2015; Chen et al, 2016, 2018a,b,c; Gui et al, 2016; Huang et al, 2016b; Li et al, 2018) have been applied in the fields of bioinformatics and genomics, but few of them address the detection of protein interactions.

