Abstract

BackgroundSeveral computational methods have been developed to predict protein-protein interactions from amino acid sequences, but most of those methods are intended for the interactions within a species rather than for interactions across different species. Methods for predicting interactions between homogeneous proteins are not appropriate for finding those between heterogeneous proteins since they do not distinguish the interactions between proteins of the same species from those of different species.ResultsWe developed a new method for representing a protein sequence of variable length in a frequency vector of fixed length, which encodes the relative frequency of three consecutive amino acids of a sequence. We built a support vector machine (SVM) model to predict human proteins that interact with virus proteins. In two types of viruses, human papillomaviruses (HPV) and hepatitis C virus (HCV), our SVM model achieved an average accuracy above 80%, which is higher than that of another SVM model with a different representation scheme. Using the SVM model and Gene Ontology (GO) annotations of proteins, we predicted new interactions between virus proteins and human proteins.ConclusionsEncoding the relative frequency of amino acid triplets of a protein sequence is a simple yet powerful representation method for predicting protein-protein interactions across different species. The representation method has several advantages: (1) it enables a prediction model to achieve a better performance than other representations, (2) it generates feature vectors of fixed length regardless of the sequence length, and (3) the same representation is applicable to different types of proteins.

Highlights

  • Several computational methods have been developed to predict protein-protein interactions from amino acid sequences, but most of those methods are intended for the interactions within a species rather than for interactions across different species

  • We propose a representation method and a support vector machine (SVM) model to predict the interactions between two types of viruses and human proteins

  • In this paper we presented a support vector machine (SVM) model and its representation method for predicting the interactions between viruses and human proteins

Read more

Summary

Introduction

Several computational methods have been developed to predict protein-protein interactions from amino acid sequences, but most of those methods are intended for the interactions within a species rather than for interactions across different species. Methods for predicting interactions between homogeneous proteins are not appropriate for finding those between heterogeneous proteins since they do not distinguish the interactions between proteins of the same species from those of different species. Bock and Gough [1] created protein structural and physiochemical descriptors based on the sequence data, and trained a support vector machine classifier to identify PPIs from the descriptors. Methods typically used to predict interactions between homogeneous proteins are not appropriate for predicting those between heterogeneous proteins, since such methods do not distinguish interactions between proteins of the same species from those of different species

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call