Abstract

Background: Predicting a protein's structural or functional class from its amino acid sequence or structure is a fundamental problem in computational biology. Recently, there has been considerable interest in using discriminative learning algorithms, in particular support vector machines (SVMs), for classification of proteins. However, because sufficiently many positive examples are required to train such classifiers, all SVM-based methods are hampered by limited coverage.

Results: In this study, we develop a hybrid machine learning approach for classifying proteins, and we apply the method to the problem of assigning proteins to structural categories based on their sequences or their 3D structures. The method combines a full-coverage but lower-accuracy nearest neighbor method with higher-accuracy but reduced-coverage multiclass SVMs to produce a full-coverage classifier with overall improved accuracy. The hybrid approach is based on the simple idea of "punting" from one method to another using a learned threshold.

Conclusion: In cross-validated experiments on the SCOP hierarchy, the hybrid methods consistently outperform the individual component methods at all levels of coverage. Code and data sets are available at
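The abstract describes the hybrid only as "punting" from one method to another using a learned threshold. The Python sketch below shows one plausible reading, under stated assumptions: the SVM answers whenever it covers an input and its confidence score reaches the threshold, and otherwise the example is punted to the full-coverage nearest neighbor method; the threshold is chosen to maximize accuracy on held-out data. The direction of the punt, the scoring scheme, and the names `hybrid_predict` and `learn_threshold` are hypothetical and not taken from the paper.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

# Hypothetical SVM output: (label, confidence score), or None when no SVM covers the input.
SvmOutput = Optional[Tuple[Hashable, float]]


def hybrid_predict(x, svm_predict: Callable[[object], SvmOutput],
                   nn_predict: Callable[[object], Hashable], threshold: float) -> Hashable:
    """Use the SVM label if it exists and its score reaches the threshold;
    otherwise punt to the full-coverage nearest neighbor prediction."""
    svm_out = svm_predict(x)
    if svm_out is not None and svm_out[1] >= threshold:
        return svm_out[0]
    return nn_predict(x)


def learn_threshold(svm_outputs: Sequence[SvmOutput], nn_labels: Sequence[Hashable],
                    true_labels: Sequence[Hashable]) -> Tuple[float, float]:
    """Choose the threshold that maximizes hybrid accuracy on held-out data."""
    # Every observed SVM score is a candidate; +inf means "always punt to NN".
    candidates: List[float] = sorted({out[1] for out in svm_outputs if out is not None})
    candidates.append(float("inf"))
    best_t, best_acc = float("inf"), -1.0
    for t in candidates:
        correct = sum(
            (out[0] if out is not None and out[1] >= t else nn) == y
            for out, nn, y in zip(svm_outputs, nn_labels, true_labels)
        )
        acc = correct / len(true_labels)
        if acc > best_acc:  # strict ">" keeps the lowest threshold among ties
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    # Toy held-out set: SVM outputs, nearest neighbor labels, and true labels.
    svm_out = [("a", 0.9), ("b", 0.2), None, ("a", 0.6)]
    nn_out = ["a", "c", "c", "b"]
    truth = ["a", "c", "c", "a"]
    t, acc = learn_threshold(svm_out, nn_out, truth)
    print(t, acc)  # prints 0.6 1.0
```

Because the nearest neighbor method always returns a label, the combined classifier keeps full coverage; the threshold only decides which component answers each query. Using the observed SVM scores (plus infinity, meaning "always punt") as candidates covers every distinct way of splitting the held-out set between the two methods.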

Highlights

  • Predicting a protein's structural or functional class from its amino acid sequence or structure is a fundamental problem in computational biology

  • In 1999, Jaakkola et al. [8] first applied the support vector machine (SVM) classifier [9] to the problem of predicting a protein's structural class from its amino acid sequence. They focused on a particular protein structural hierarchy called the Structural Classification of Proteins (SCOP) [10], and they trained SVMs to recognize novel families

  • We aim to address a fundamental limitation of any SVM-based method, namely, that an SVM can only be trained when enough training examples are available
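To make this coverage limitation concrete, here is a small Python sketch that counts how many test examples belong to a class with enough training examples to train an SVM. The cutoff `min_examples` and the function name `svm_coverage` are hypothetical illustrations; the source does not state what minimum number of examples it requires.

```python
from collections import Counter
from typing import Hashable, List, Set, Tuple


def svm_coverage(train_labels: List[Hashable], test_labels: List[Hashable],
                 min_examples: int) -> Tuple[Set[Hashable], float]:
    """Return the classes with at least min_examples training examples (hypothetical
    cutoff) and the fraction of test examples whose true class is among them."""
    counts = Counter(train_labels)
    trainable = {label for label, n in counts.items() if n >= min_examples}
    covered = sum(1 for y in test_labels if y in trainable)
    return trainable, covered / len(test_labels)


if __name__ == "__main__":
    train = ["f1"] * 5 + ["f2"] * 2 + ["f3"] * 3
    test = ["f1", "f2", "f3", "f4"]  # "f4" never appears in training
    classes, coverage = svm_coverage(train, test, min_examples=3)
    print(sorted(classes), coverage)  # prints ['f1', 'f3'] 0.5
```

A 1-nearest-neighbor classifier has no such cutoff: it can assign any class that has at least one training example, which is why it offers full coverage at the cost of accuracy.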


Summary

Introduction

Predicting a protein's structural or functional class from its amino acid sequence or structure is a fundamental problem in computational biology. In 1999, Jaakkola et al. [8] first applied the support vector machine (SVM) classifier [9] to the problem of predicting a protein's structural class from its amino acid sequence. They focused on a particular protein structural hierarchy called the Structural Classification of Proteins (SCOP) [10], and they trained SVMs to recognize novel families.


