Abstract

Predicting self-interacting proteins (SIPs) is an important task in bioinformatics. SIPs are proteins in which two or more identical copies, expressed from the same gene, interact with each other. Self-interaction plays a major role in the evolution of protein‒protein interactions (PPIs) and in cellular functions. Because experimental identification of SIPs is limited, it is increasingly important to develop a useful tool that predicts SIPs from protein sequence information. We therefore propose a prediction model called RP-FFT, which combines the Random Projection (RP) model with the Fast Fourier Transform (FFT) to detect SIPs. First, each protein sequence is transformed into a Position-Specific Scoring Matrix (PSSM) using Position-Specific Iterated BLAST (PSI-BLAST). Second, features are extracted from the PSSM with the FFT. Finally, we evaluate RP-FFT on human and yeast datasets, comparing the RP classifier with the state-of-the-art support vector machine (SVM) classifier and with other existing methods. Under five-fold cross-validation, RP-FFT achieves average accuracies of 96.28% and 91.87% on the human and yeast datasets, respectively. These results indicate that the RP-FFT prediction model is reasonable and robust.

Highlights

  • Proteins are an important component of all cells

  • The proposed method has four main steps: (1) the protein sequence information is represented as a Position-Specific Scoring Matrix (PSSM); (2) the Fast Fourier Transform (FFT) is applied to the PSSM to extract feature vectors; (3) Principal Component Analysis (PCA) reduces the high-dimensional FFT output, removing noise so that the underlying pattern in the data can be found; (4) the Random Projection (RP) algorithm is used to build the training set on which the classifier is trained

  • To assess the stability and usefulness of our prediction model, we used five measures commonly applied to binary classification tasks: accuracy (Acc.), sensitivity (Sen.), specificity (Spe.), Matthews correlation coefficient (MCC) [26,27,28,29,30,31,32], and balanced accuracy (B_Acc.) [33]
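
The five measures listed above can be computed from confusion-matrix counts. The sketch below uses the standard textbook definitions, which is an assumption: the exact formulas used in the paper are not shown on this page. The function name `binary_metrics` and the example counts are hypothetical.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. Definitions are the usual ones (assumed, not taken
    from the paper).
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    sen = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    spe = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when a marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    b_acc = (sen + spe) / 2  # mean of sensitivity and specificity
    return {"Acc": acc, "Sen": sen, "Spe": spe, "MCC": mcc, "B_Acc": b_acc}


# Hypothetical counts, for illustration only (not results from the paper).
print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

Balanced accuracy and MCC are less sensitive to class imbalance than plain accuracy, so they are worth reporting alongside it whenever positive and negative samples are unevenly represented.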

Summary

Introduction

Proteins are an important component of all cells. They are organic macromolecules and the basic material of life. The proposed method has four main steps: (1) the protein sequence information is represented as a Position-Specific Scoring Matrix (PSSM); (2) the Fast Fourier Transform (FFT) is applied to the PSSM to extract feature vectors; (3) Principal Component Analysis (PCA) reduces the high-dimensional FFT output, removing noise so that the underlying pattern in the data can be found; (4) the RP algorithm is used to build the training set on which the classifier is trained. In detail: first, applying the FFT to the PSSM of each protein sequence yields a 400-dimensional feature vector that captures the important information; next, PCA reduces this FFT vector to 300 dimensions to improve prediction performance; finally, the RP classifier performs classification on the yeast and human datasets. The results indicate that the proposed model is suitable for predicting SIPs and performs well.
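
The feature-extraction and dimensionality-reduction steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The page does not say how the 400 dimensions are obtained, so the sketch assumes that the first 20 FFT magnitude coefficients are kept for each of the 20 PSSM columns (20 x 20 = 400). The function names `fft_features` and `pca_reduce` are hypothetical, and random matrices stand in for real PSI-BLAST output.

```python
import numpy as np

N_COEFFS = 20  # assumption: 20 coefficients x 20 PSSM columns = 400 features


def fft_features(pssm: np.ndarray, n_coeffs: int = N_COEFFS) -> np.ndarray:
    """Map an (L, 20) PSSM to a fixed-length vector of FFT magnitudes."""
    pssm = np.asarray(pssm, dtype=float)
    # Zero-pad short sequences so that at least n_coeffs coefficients exist.
    n_fft = max(pssm.shape[0], n_coeffs)
    # FFT along the sequence axis, independently for each PSSM column.
    spectrum = np.abs(np.fft.fft(pssm, n=n_fft, axis=0))
    return spectrum[:n_coeffs].ravel()


def pca_reduce(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project mean-centered rows onto the top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# Hypothetical stand-ins for PSI-BLAST output: random (L, 20) matrices.
pssms = [rng.normal(size=(int(rng.integers(50, 200)), 20)) for _ in range(320)]
X = np.stack([fft_features(p) for p in pssms])  # shape (320, 400)
X_reduced = pca_reduce(X, n_components=300)     # shape (320, 300)
```

Keeping a fixed number of coefficients gives every protein a vector of the same length regardless of sequence length, which the downstream classifier requires. Note that reducing to 300 components this way needs more than 300 samples; the classification step itself is not sketched here.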

Performance Evaluation
Performance of the Proposed Method
Comparison with Other Feature Extraction Methods
Comparison with the SVM-Based Method
Datasets
Position-Specific Scoring Matrix
Fast Fourier Transform
Support Vector Machine
Random Projection Classifier
Conclusions
