Abstract

We propose RVM-LPQ, a computational method that combines the Relevance Vector Machine (RVM) model with Local Phase Quantization (LPQ) to predict protein-protein interactions (PPIs) from protein sequences. The main improvements come from three choices: representing protein sequences with LPQ features computed on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise with Principal Component Analysis (PCA), and classifying with an RVM. In 5-fold cross-validation experiments on the Yeast and Human datasets, the method achieves accuracies of 92.65% and 97.62%, respectively, which are higher than those reported in previous work. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset, where RVM-LPQ outperforms the SVM-based method. These results suggest that the proposed method is efficient and simple, and that it could serve as an automatic decision support tool for future proteomics research.
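
The evaluation protocol described above (PCA for noise reduction, a classifier, 5-fold cross-validation) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the number of retained components is arbitrary, and a nearest-centroid rule stands in for the RVM classifier, which NumPy does not provide. All function names are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Learn the mean and the top principal directions from training data."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project data onto the retained principal directions."""
    return (X - mean) @ components.T

def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier (NOT an RVM): assign the class of the closest centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def cross_validate(X, y, n_folds=5, n_components=10, seed=0):
    """Return the per-fold accuracy of PCA + classifier under k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accuracies = []
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        # Fit PCA on the training fold only, so the test fold stays unseen.
        mean, comps = pca_fit(X[train_idx], n_components)
        z_train = pca_transform(X[train_idx], mean, comps)
        z_test = pca_transform(X[test_idx], mean, comps)
        pred = nearest_centroid_predict(z_train, y[train_idx], z_test)
        accuracies.append(float((pred == y[test_idx]).mean()))
    return accuracies

# Synthetic stand-in for 256-dimensional feature vectors with binary labels.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 256)), rng.normal(0.5, 1.0, (100, 256))])
y = np.array([0] * 100 + [1] * 100)
scores = cross_validate(X, y)
print("mean accuracy over 5 folds:", np.mean(scores))
```

Fitting PCA inside each fold, rather than once on the full dataset, keeps information from the held-out fold out of the projection.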

Highlights

  • Proteins are crucial molecules that participate in many cellular functions in an organism

  • The main improvements come from representing protein sequences with Local Phase Quantization (LPQ) features computed on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise with Principal Component Analysis (PCA), and classifying with a Relevance Vector Machine (RVM)

  • Feature extraction has two steps: (1) each protein in the datasets is represented as a PSSM, and (2) the PSSM of each protein is converted to a 256-dimensional vector using the LPQ method
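
The second feature extraction step can be sketched as below. This is a minimal version of basic LPQ, assuming a 3×3 window and no decorrelation step; neither detail is stated in this excerpt. The function name `lpq_histogram` and the random matrix standing in for a real PSSM are hypothetical. For each local window, the code computes Fourier coefficients at four low frequencies, quantizes the signs of their real and imaginary parts into an 8-bit code, and histograms the codes into 256 bins.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lpq_histogram(matrix, win_size=3):
    """Return a normalized 256-bin histogram of LPQ codes for a 2-D array."""
    matrix = np.asarray(matrix, dtype=float)
    offsets = np.arange(win_size) - (win_size - 1) / 2.0
    a = 1.0 / win_size
    # Four low frequencies used by basic LPQ: (a,0), (0,a), (a,a), (a,-a).
    freqs = [(a, 0.0), (0.0, a), (a, a), (a, -a)]
    # Every win_size x win_size neighbourhood that fits inside the matrix.
    windows = sliding_window_view(matrix, (win_size, win_size))
    codes = np.zeros(windows.shape[:2], dtype=np.int64)
    bit = 0
    for u_row, u_col in freqs:
        phase = u_row * offsets[:, None] + u_col * offsets[None, :]
        kernel = np.exp(-2j * np.pi * phase)
        # Local Fourier coefficient of each neighbourhood at this frequency.
        coeff = np.einsum("ijkl,kl->ij", windows, kernel)
        # One bit per sign of the real part and of the imaginary part.
        for part in (coeff.real, coeff.imag):
            codes |= (part > 0).astype(np.int64) << bit
            bit += 1
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# Toy stand-in for a PSSM: one row per residue, 20 columns (amino acids).
rng = np.random.default_rng(0)
toy_pssm = rng.integers(-10, 11, size=(120, 20))
features = lpq_histogram(toy_pssm)
print(features.shape)
```

Because the histogram has a fixed 256 bins, proteins of different lengths map to vectors of the same dimension, which is what a downstream classifier needs.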


Summary

Introduction

Proteins are crucial molecules that participate in many cellular functions in an organism. Because proteins do not perform their roles individually, detecting protein-protein interactions (PPIs) has become increasingly important. A large quantity of PPI data for different organisms has been generated, and many databases, such as MINT [5], BIND [6], and DIP [7], have been built to store it. However, the experimental methods that produce these data have shortcomings: they are time-intensive and costly.
