Abstract

Background: The recognition of protein interaction sites is of great significance in many biological processes, signaling pathways and drug design. However, most sites on protein sequences cannot be defined as interface or non-interface sites, because only a small fraction of protein interactions have been identified. This limits the prediction accuracy and generalization ability of predictors of protein interaction sites. It is therefore necessary to improve prediction performance by using large amounts of unlabeled data together with small amounts of labeled data and background knowledge.

Results: In this work, three semi-supervised support vector machine-based methods are proposed to improve the prediction of protein interaction sites by incorporating information from unlabeled protein sites. Five features related to the evolutionary conservation of amino acids are extracted from the HSSP database and the ConSurf Server to represent the residues within a protein sequence: residue spatial sequence spectrum, residue sequence information entropy, residue sequence relative entropy, residue sequence conserved weight and residue evolution rate. Three predictors are then built to identify interface residues on the protein surface, each using a different semi-supervised support vector machine algorithm.

Conclusion: The experimental results demonstrate that the semi-supervised approaches can effectively improve the prediction of protein interaction sites when unlabeled information is incorporated into the predictors. The best of the three achieves an accuracy of 70.7%, a sensitivity of 62.67% and a specificity of 78.72%. Compared with existing studies, the semi-supervised models show improved prediction performance.
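Two of the conservation features named in the abstract, sequence information entropy and relative entropy, can be illustrated with a minimal sketch. It uses the standard Shannon entropy and Kullback-Leibler definitions on a single alignment column; it is not the HSSP computation itself. The function names, the pseudocount parameter and the uniform background distribution are assumptions made for illustration only.

```python
import math
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column, pseudocount=0.0):
    """Residue frequencies for one alignment column (gaps and unknowns ignored).

    `pseudocount` is a hypothetical smoothing parameter, not taken from the paper.
    """
    counts = Counter(ch for ch in column if ch in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    if total == 0:
        raise ValueError("column contains no standard residues")
    return {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}


def shannon_entropy(freqs):
    """Information entropy in bits; 0 for a fully conserved column."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def relative_entropy(freqs, background):
    """Kullback-Leibler divergence (bits) of the column from a background distribution."""
    return sum(
        p * math.log2(p / background[aa]) for aa, p in freqs.items() if p > 0
    )


# Assumed background: uniform over the 20 amino acids (an illustrative choice).
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

if __name__ == "__main__":
    conserved = column_frequencies("AAAAAAAA")
    variable = column_frequencies("ACDEFGHI")
    print(shannon_entropy(conserved), relative_entropy(conserved, UNIFORM_BACKGROUND))
    print(shannon_entropy(variable), relative_entropy(variable, UNIFORM_BACKGROUND))
```

A fully conserved column has zero entropy and the largest divergence from the uniform background, whereas a variable column has higher entropy and lower divergence. This is why both quantities can serve as conservation signals.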

Highlights

  • The recognition of protein interaction sites is of great significance in many biological processes, signaling pathways and drug design

  • Multiple kernel learning can exploit the feature-mapping capability of each base kernel, and the data are better represented in the combined feature space built from the individual feature spaces, which can significantly improve classification accuracy

  • Means3vm-iter transforms the optimization problem into a quadratic programming problem that can be solved quickly by standard solvers, but it may fall into a local minimum, and its classification accuracy is slightly lower than that of the Means3vm-mkl-based model
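The idea of a combined feature space built from several base kernels, mentioned in the highlights, can be sketched as a convex combination of Gram matrices. This is only an illustration of the general multiple-kernel construction, not the Means3vm-mkl algorithm: the weights here are fixed by hand, whereas a multiple kernel learning method would learn them. The function names, the choice of base kernels and the `gamma` value are assumptions.

```python
import math


def linear_kernel(x, y):
    """Plain dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; `gamma` is an illustrative value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def combined_gram(samples, kernels, weights):
    """Gram matrix of the weighted sum K = sum_m w_m * K_m.

    Non-negative weights summing to 1 keep the combination a valid kernel.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(samples)
    return [
        [
            sum(w * k(samples[i], samples[j]) for k, w in zip(kernels, weights))
            for j in range(n)
        ]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy residue feature vectors (made-up numbers, for illustration only).
    residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    gram = combined_gram(residues, [linear_kernel, rbf_kernel], [0.5, 0.5])
    for row in gram:
        print([round(value, 4) for value in row])
```

A matrix built this way can be passed to any SVM solver that accepts a precomputed kernel. Changing the weights shifts how much each base kernel's feature mapping contributes to the decision function.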


Summary

Introduction

The recognition of protein interaction sites is of great significance in many biological processes, signaling pathways and drug design. In another work, Wang et al. implemented a dataset reconstruction strategy using manifold learning, under the hypothesis that interaction and non-interaction sites have different inherent structure manifolds [13, 22]. Although these methods have driven advances in PPI research, many interactions still cannot be labeled experimentally, so only a small number of labeled samples are available for model training in PPI site prediction, which makes it difficult for the trained learning systems to generalize well [23].

