Building multiclass classifiers for remote homology detection and fold recognition.

Huzefa Rangwala, George Karypis

doi:10.1186/1471-2105-7-455

Abstract

Background: Protein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines are currently among the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems, and they have not been extensively used to solve the more general multiclass remote homology prediction and fold recognition problems.

Results: We present a comprehensive evaluation of a number of methods for building SVM-based multiclass classification schemes in the context of the SCOP protein classification. These methods include schemes that directly build an SVM-based multiclass model, schemes that employ a second-level learning approach to combine the predictions generated by a set of binary SVM-based classifiers, and schemes that build and combine binary classifiers for various levels of the SCOP hierarchy beyond those defining the target classes.

Conclusion: Analyzing the performance achieved by the different approaches on four different datasets, we show that most of the proposed multiclass SVM-based classification approaches are quite effective in solving the remote homology prediction and fold recognition problems. Schemes that use predictions from binary models constructed for ancestral categories within the SCOP hierarchy tend not only to lead to lower error rates but also to reduce the number of errors in which a superfamily is assigned to an entirely different fold or a fold is predicted as being from a different SCOP class. Our results also show that the limited size of the training data makes it hard to learn complex second-level models, and that models of moderate complexity lead to consistently better results.
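To make the idea of a second-level learner more concrete, here is a minimal sketch of one low-complexity combiner: a single scaling weight per class, applied to the output of each binary classifier before taking the maximum. The perceptron-style update rule, the function names `predict` and `fit_scaling`, and the toy scores are all assumptions made for illustration; the abstract does not specify which second-level models or training procedures the authors used.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (binary classifier scores, true class index)


def predict(scores: Sequence[float], weights: Sequence[float]) -> int:
    """Pick the class with the largest weighted binary-classifier score."""
    scaled = [w * s for w, s in zip(weights, scores)]
    return max(range(len(scaled)), key=scaled.__getitem__)


def fit_scaling(examples: List[Example], n_classes: int,
                epochs: int = 20, lr: float = 0.1) -> List[float]:
    """Learn one scaling weight per class with a multiclass perceptron update.

    Assumption: this update rule is only an illustrative choice of
    second-level learner, not the procedure described in the paper.
    """
    weights = [1.0] * n_classes
    for _ in range(epochs):
        for scores, true_k in examples:
            guess = predict(scores, weights)
            if guess != true_k:
                # Promote the true class and demote the wrongly chosen one.
                weights[true_k] += lr * scores[true_k]
                weights[guess] -= lr * scores[guess]
    return weights


# Hypothetical scores from two binary classifiers; class 0 is under-confident.
examples: List[Example] = [
    ([0.4, 0.6], 0),
    ([0.1, 0.9], 1),
    ([0.5, 0.3], 0),
]
weights = fit_scaling(examples, n_classes=2)
```

With only one parameter per class, a model like this stays small, which is in the spirit of the observation that limited training data makes complex second-level models hard to learn.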

Highlights

Protein remote homology detection and fold recognition are central problems in computational biology.
Recent advances in string kernels that have been designed for protein sequences and capture their evolutionary relationships [14,15] have resulted in the development of discriminative classifiers based on support vector machines (SVMs) [16] that show superior performance when compared to other methods [15].
We present a comprehensive study of different approaches for building such classifiers, including (i) schemes that directly build an SVM-based multiclass model, (ii) schemes that employ a second-level learner to combine the predictions generated by a set of binary SVM-based classifiers, and (iii) schemes that build and combine binary classifiers for various levels of the SCOP hierarchy.

Summary

Introduction

Protein remote homology detection and fold recognition are central problems in computational biology. Recent advances in string kernels that have been designed for protein sequences and capture their evolutionary relationships [14,15] have resulted in the development of discriminative classifiers based on support vector machines (SVMs) [16] that show superior performance when compared to other methods [15]. These SVM-based approaches were designed to solve one-versus-rest binary classification problems, and to date they have primarily been evaluated with respect to how well each binary classifier can identify the proteins that belong to its own class (e.g., superfamily or fold). The underlying task, however, is essentially a multiclass classification problem: given a set of K classes, we would like to assign a protein sequence to one of them.
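As a small illustration of the step from K one-versus-rest binary classifiers to a single multiclass decision, the sketch below assigns a sequence to the class whose binary classifier produced the highest output. This maximum-score rule is a common baseline and is shown here as an assumption, not as the scheme evaluated in the paper; the function name `assign_class`, the class labels, and the scores are hypothetical.

```python
from typing import Dict


def assign_class(binary_scores: Dict[str, float]) -> str:
    """Return the class whose one-versus-rest classifier gave the highest score."""
    if not binary_scores:
        raise ValueError("need at least one class score")
    return max(binary_scores, key=binary_scores.__getitem__)


# Hypothetical decision values from K = 3 binary classifiers for one protein.
scores = {"fold_a": -0.42, "fold_b": 0.87, "fold_c": 0.15}
predicted = assign_class(scores)
```

A rule like this treats the outputs of independently trained binary classifiers as directly comparable, which is not guaranteed; that limitation is one motivation for learning a second-level combination instead.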



Journal: BMC Bioinformatics
Publication Date: Oct 16, 2006
Citations: 67
License type: CC BY 2.0


