Abstract

Background

Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been based mainly on widely known machine learning techniques, such as artificial neural networks, support vector machines and conditional random fields. However, the prediction performance is still too low to be used in practice. It is necessary to explore new algorithms, theories and features to further improve the performance.

Results

In this study, we introduce a novel machine learning model, the hidden Markov support vector machine (HM-SVM), for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including the protein sequence profile and residue accessible surface area, are used to train the HM-SVM. When tested on six data sets, the HM-SVM-based method shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random fields. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods.

Conclusion

The improved prediction performance and computational efficiency of the HM-SVM-based method can be attributed to the following three factors. Firstly, the relation between the labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous in this field. Thirdly, with the cutting-plane algorithm, the complexity of the HM-SVM training step is linear in the number of training samples.

Highlights

  • Predicting the binding sites between two interacting proteins provides important clues to the function of a protein

  • Comparison with related methods: the experiments reported here compare our method with artificial neural network (ANN), support vector machine (SVM) and conditional random field (CRF) approaches

  • On the six data sets, the hidden Markov support vector machine (HM-SVM) yields the best performance according to Matthews correlation coefficient (MCC), F1 and AUC, which indicates that HM-SVM can automatically obtain a better trade-off between specificity+ and sensitivity+
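As a reminder of how two of the measures named above are computed, the following minimal sketch derives F1 and MCC from the four confusion-matrix counts of a per-residue prediction. It assumes that specificity+ denotes TP/(TP+FP) and sensitivity+ denotes TP/(TP+FN); that reading is an assumption, since this excerpt does not define them. The function name and the example counts are hypothetical and are not results from the paper.

```python
import math


def binding_site_metrics(tp, fp, tn, fn):
    """Return specificity+, sensitivity+, F1 and MCC from confusion counts.

    Assumption (not stated in this excerpt): specificity+ = TP / (TP + FP)
    and sensitivity+ = TP / (TP + FN).
    """
    spec_plus = tp / (tp + fp) if (tp + fp) else 0.0
    sens_plus = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of the two quantities above.
    total = spec_plus + sens_plus
    f1 = 2 * spec_plus * sens_plus / total if total else 0.0

    # MCC uses all four counts, so it stays informative on unbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"specificity+": spec_plus, "sensitivity+": sens_plus,
            "f1": f1, "mcc": mcc}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = binding_site_metrics(tp=40, fp=10, tn=40, fn=10)
```

Because binding residues are usually a minority of all surface residues, measures such as MCC and F1 are more telling than plain accuracy, which a predictor could inflate by labelling everything as non-binding.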


Introduction

BMC Bioinformatics 2009, 10:381 http://www.biomedcentral.com/1471-2105/10/381

Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Development of fast and accurate computational methods for protein binding site prediction is helpful in identifying functionally important amino acid residues and facilitating experimental efforts to catalogue protein interactions. It plays a key role in enhancing computational docking studies, drug design and functional annotation for the structurally determined proteins with unknown function [1]. Several machine learning methods have been applied in this field. These methods can be split into two groups: classification methods and sequential labelling methods.
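The difference between the two groups can be illustrated with a small sketch. A classification method labels each residue on its own, whereas a sequential labelling method scores the whole label sequence, so the labels of neighbouring residues influence one another. The code below is a generic Viterbi decoder over per-residue label scores plus label-transition scores. It is not the HM-SVM training procedure, and every score in it is made up for illustration (label 1 stands for a binding residue, label 0 for a non-binding residue).

```python
def independent_labels(emission):
    """Classification view: pick the best label for each residue separately."""
    return [max(range(len(scores)), key=lambda y: scores[y])
            for scores in emission]


def viterbi_labels(emission, transition):
    """Sequential labelling view: pick the best-scoring label sequence.

    emission[t][y]   -- score of label y at residue t
    transition[p][y] -- score of moving from label p to label y
    """
    n_labels = len(transition)
    best = [list(emission[0])]   # best[t][y]: best score of a path ending in y
    back = []                    # back-pointers for recovering that path
    for t in range(1, len(emission)):
        row, ptr = [], []
        for y in range(n_labels):
            prev = max(range(n_labels),
                       key=lambda p: best[-1][p] + transition[p][y])
            row.append(best[-1][prev] + transition[prev][y] + emission[t][y])
            ptr.append(prev)
        best.append(row)
        back.append(ptr)

    # Trace back from the best final label.
    y = max(range(n_labels), key=lambda k: best[-1][k])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]


# Made-up scores for three residues; the middle residue is ambiguous.
emission = [[0.2, 1.0], [0.6, 0.5], [0.1, 1.2]]
# Made-up transition scores that reward neighbouring residues sharing a label.
transition = [[0.5, -0.5], [-0.5, 0.5]]

per_residue = independent_labels(emission)        # [1, 0, 1]
sequential = viterbi_labels(emission, transition)  # [1, 1, 1]
```

In this toy case the per-residue view leaves an isolated non-binding label between two binding residues, while the sequence view smooths it out because the transition scores favour contiguous patches.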
