Abstract

Background: Detecting remote homologies by direct comparison of protein sequences remains a challenging task. We previously developed a similarity score between sequences, called a local alignment kernel, that performs well on this task in combination with a support vector machine. The local alignment kernel depends on an amino acid substitution matrix. Since the commonly used BLOSUM and PAM matrices for scoring amino acid matches have been optimized for use with the Smith-Waterman algorithm, the matrices that are optimal for the local alignment kernel may differ.

Results: Unlike the local alignment score computed by the Smith-Waterman algorithm, the local alignment kernel is differentiable with respect to the amino acid substitution matrix, and its derivative can be computed efficiently by dynamic programming. We optimized the substitution matrix by classical gradient descent on an objective function that measures how well the local alignment kernel discriminates homologs from non-homologs in the COG database. The local alignment kernel performs better with the matrices and gap parameters optimized by this procedure than with the matrices optimized for the Smith-Waterman algorithm. Furthermore, the matrices and gap parameters optimized for the local alignment kernel can also be used successfully by the Smith-Waterman algorithm.

Conclusion: This optimization procedure leads to useful substitution matrices, both for the local alignment kernel and for the Smith-Waterman algorithm. The best performance for homology detection is obtained by the local alignment kernel.

Highlights

  • Detecting remote homologies by direct comparison of protein sequences remains a challenging task

  • We previously developed a score to compare protein sequences, called the local alignment kernel (LA kernel) [7], which in combination with a support vector machine could detect remote homology better than several state-of-the-art methods, including the Smith-Waterman (SW) algorithm [8], in a benchmark experiment based on the SCOP database

  • Results on independent test sets: the performance of the algorithms on the Clusters of Orthologous Groups (COG) test set was evaluated for both the LA kernel and the SW score, each in combination with both BLOSUM62LAOPT and BLOSUM62SWOPT


Summary

Introduction

Detecting remote homologies by direct comparison of protein sequences remains a challenging task. We previously developed a similarity score between sequences, called a local alignment kernel, that performs well on this task in combination with a support vector machine. BLAST [1] and PSI-BLAST [2] are widely used for this task by everyone from wet-lab biologists to bioinformaticians. Thanks to those tools, more than half of newly identified protein sequences are now recognized as having homologs [3].

BMC Bioinformatics 2006, 7:246 http://www.biomedcentral.com/1471-2105/7/246

[...]ters of the algorithm to detect homology. Following this strategy, we previously developed a score to compare protein sequences, called the local alignment kernel (LA kernel) [7], which in combination with a support vector machine could detect remote homology better than several state-of-the-art methods, including the Smith-Waterman (SW) algorithm [8], in a benchmark experiment based on the SCOP database. It bears similarities to the AAS algorithm [9], the Hybrid Alignment algorithm [10] and the BALSA algorithm [11] for sequence comparison, in the sense that all of these algorithms sum the scores over all possible local alignments (using a forward algorithm) instead of computing the score of only the best alignment (using the Viterbi algorithm), as the SW algorithm does.
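The contrast between summing over all local alignments (forward) and keeping only the best one (Viterbi) can be illustrated with a small sketch. It is not the authors' implementation: the match/mismatch scores in `toy_score`, the linear gap penalty, the scaling parameter `beta`, and all function names are assumptions made for illustration, and the quadruple loop is written for clarity rather than speed. The same recursion is run twice, once combining candidate alignments with `max` and once with a log-sum-exp, which is the smooth counterpart of `max`.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def toy_score(a, b):
    """Hypothetical substitution score (not BLOSUM/PAM): +2 match, -1 mismatch."""
    return 2.0 if a == b else -1.0

def local_alignment_dp(x, y, score, gap, beta, combine):
    """Combine the beta-scaled scores of all local alignments of x and y.

    A local alignment is treated as an increasing chain of aligned pairs;
    every skipped residue between two consecutive pairs costs `gap`
    (linear gap penalty, an assumption of this sketch).
    combine=max keeps only the best alignment (Viterbi-style);
    combine=logsumexp sums over all alignments (forward-style).
    """
    n, m = len(x), len(y)
    # end[i][j]: combined log-weight of alignments whose last pair is (x[i], y[j])
    end = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            options = [0.0]  # start a new alignment at (i, j)
            for a in range(i):
                for b in range(j):
                    skipped = (i - a - 1) + (j - b - 1)
                    options.append(end[a][b] - beta * gap * skipped)
            end[i][j] = beta * score(x[i], y[j]) + combine(options)
    # The leading 0.0 stands for the empty alignment (score 0).
    total = [0.0] + [end[i][j] for i in range(n) for j in range(m)]
    return combine(total) / beta

def sw_score(x, y, gap=2.0):
    """Best local alignment score (Smith-Waterman with a linear gap penalty)."""
    return local_alignment_dp(x, y, toy_score, gap, beta=1.0, combine=max)

def la_log_kernel(x, y, beta, gap=2.0):
    """(1/beta) * log of the sum over all local alignments of exp(beta * score)."""
    return local_alignment_dp(x, y, toy_score, gap, beta, combine=logsumexp)

if __name__ == "__main__":
    x, y = "HEAGAWGHEE", "PAWHEAE"
    print("best alignment score:", sw_score(x, y))
    for beta in (0.5, 2.0, 100.0):
        print("beta =", beta, "summed score:", la_log_kernel(x, y, beta))
```

In this sketch the summed score is never smaller than the best-alignment score and approaches it as `beta` grows. Because the log-sum-exp is a smooth function of the substitution scores while `max` is not, the summed version is the one that lends itself to gradient-based optimization of the matrix.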

