SE: an algorithm for deriving sequence alignment from a pair of superimposed structures

Chin-Hsien Tai, James J Vincent, Changhoon Kim, Byungkook Lee

doi:10.1186/1471-2105-10-s1-s4

Abstract

Background: Generating sequence alignments from superimposed structures is an important part of many structure comparison programs. The accuracy of the alignment affects structure recognition, classification and possibly function prediction. Many programs use a dynamic programming algorithm to generate the sequence alignment from superimposed structures. However, this procedure requires a gap penalty and, depending on the value of the penalty used, can introduce spurious gaps and misalignments. Here we present a new algorithm, Seed Extension (SE), for generating the sequence alignment from a pair of superimposed structures. The SE algorithm first finds "seeds", which are pairs of residues, one from each structure, that meet certain stringent criteria for being structurally equivalent. Three consecutive seeds form a seed segment, which is extended along the diagonal of the alignment matrix in both directions. The distance and the amino acid type similarity between the residues are used to resolve conflicts that arise when more than one diagonal is extended. The manually curated alignments in the Conserved Domain Database were used as the standard to assess the quality of the sequence alignments.

Results: SE gave an average accuracy of 95.9% over the 582 pairs of superimposed proteins tested, while CHIMERA, LSQMAN, and DP extracted from SHEBA, which all use a dynamic programming algorithm, yielded 89.9%, 90.2% and 91.0%, respectively. For pairs of proteins with low sequence or structural similarity, SE produced alignments up to 18% more accurate on average than the next best scoring program. Improvement was most pronounced when the two superimposed structures contained equivalent helices or beta-strands that crossed at an angle. When the SE algorithm was implemented in SHEBA to replace the dynamic programming routine, the alignment accuracy improved by 10% on average for structure pairs with RMSD between 2 and 4 Å. SE also used considerably less CPU time than DP.

Conclusion: The Seed Extension algorithm is fast and, without using a gap penalty, produces more accurate sequence alignments from superimposed structures than the three other programs tested, all of which use a dynamic programming algorithm.
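The seed, seed-segment and extension steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the seed criterion (mutual nearest neighbours within `seed_cutoff`), the extension criterion, both numeric cutoffs and all function names are assumptions made for the example. Conflicts are resolved here by distance alone, whereas the abstract states that amino acid type similarity is also used.

```python
import math  # math.dist requires Python 3.8+


def find_seeds(a, b, seed_cutoff=3.0):
    """Pairs (i, j) that are mutual nearest neighbours within seed_cutoff.

    ASSUMPTION: a stand-in for the paper's 'stringent criteria'.
    a and b are lists of (x, y, z) coordinates of superimposed structures.
    """
    if not a or not b:
        return set()
    near_b = [min(range(len(b)), key=lambda j: math.dist(p, b[j])) for p in a]
    near_a = [min(range(len(a)), key=lambda i: math.dist(a[i], q)) for q in b]
    return {(i, j) for i, j in enumerate(near_b)
            if near_a[j] == i and math.dist(a[i], b[j]) <= seed_cutoff}


def seed_segments(seeds):
    """Start points of three consecutive seeds lying on one diagonal."""
    return [(i, j) for (i, j) in sorted(seeds)
            if (i + 1, j + 1) in seeds and (i + 2, j + 2) in seeds]


def extend(a, b, start, extend_cutoff=5.0):
    """Extend a seed segment along its diagonal in both directions.

    ASSUMPTION: extension stops when the pair distance exceeds extend_cutoff.
    """
    lo_i, lo_j = start
    hi_i, hi_j = lo_i + 2, lo_j + 2
    while lo_i > 0 and lo_j > 0 and \
            math.dist(a[lo_i - 1], b[lo_j - 1]) <= extend_cutoff:
        lo_i, lo_j = lo_i - 1, lo_j - 1
    while hi_i + 1 < len(a) and hi_j + 1 < len(b) and \
            math.dist(a[hi_i + 1], b[hi_j + 1]) <= extend_cutoff:
        hi_i, hi_j = hi_i + 1, hi_j + 1
    return [(lo_i + k, lo_j + k) for k in range(hi_i - lo_i + 1)]


def se_align(a, b, seed_cutoff=3.0, extend_cutoff=5.0):
    """Return a sorted list of aligned (i, j) index pairs. No gap penalty is used."""
    candidates = set()
    for start in seed_segments(find_seeds(a, b, seed_cutoff)):
        candidates.update(extend(a, b, start, extend_cutoff))
    aligned = []
    # Simplified conflict resolution: accept the closest pairs first and
    # reject any pair that reuses a residue or breaks sequence order.
    for i, j in sorted(candidates, key=lambda p: math.dist(a[p[0]], b[p[1]])):
        if all((i - x) * (j - y) > 0 for x, y in aligned):
            aligned.append((i, j))
    return sorted(aligned)
```

Because only pairs reached from a seed segment are ever considered, unaligned regions appear wherever no diagonal extends, without any penalty parameter deciding where gaps go.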

Highlights

Generating sequence alignments from superimposed structures is an important part of many structure comparison programs
Over the 582 pairs of superimposed proteins, Seed Extension (SE) gave an average fraction of correctly aligned residues (fCAR) of 95.9%, while CHIMERA, LSQMAN and DP yielded average accuracies of 89.9%, 90.2% and 91.0%, respectively
The SE algorithm produces more accurate sequence alignments from superimposed structures than the dynamic programming algorithms used in CHIMERA, LSQMAN or SHEBA, especially for pairs of proteins with low sequence or structural similarity
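The accuracy figures above are fractions of correctly aligned residues (fCAR) measured against a reference alignment. A minimal sketch of such a measure is shown below, assuming that fCAR is the share of residue pairs in the reference alignment that the test alignment reproduces exactly. The function name and the pair-list representation are assumptions for the example, and the paper's exact definition may differ in detail.

```python
def fcar(test_pairs, reference_pairs):
    """Fraction of reference residue pairs reproduced by the test alignment.

    ASSUMPTION: each alignment is a collection of (i, j) index pairs, and a
    residue counts as correctly aligned only if its pair matches exactly.
    """
    reference = set(reference_pairs)
    if not reference:
        raise ValueError("reference alignment is empty")
    return len(reference & set(test_pairs)) / len(reference)
```

Under this assumed definition, an alignment that shifts one strand by a single residue loses every pair in that strand, which is why misplaced gaps lower the score sharply.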

Summary

Introduction

Generating sequence alignments from superimposed structures is an important part of many structure comparison programs. Many programs use a dynamic programming algorithm to generate the sequence alignment from superimposed structures. This procedure requires a gap penalty and, depending on the value of the penalty used, can introduce spurious gaps and misalignments. The dynamic programming algorithm [3,4] is a widely used method for this second step, the generation of the alignment from the superimposed structures. Programs such as SSAP [5], STRUCTAL [6], LSQMAN [7], CE [8], MATRAS [9], SHEBA [10], FAST [11] and others [12] use it to generate the alignments.
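To illustrate how a dynamic programming alignment depends on the gap penalty, here is a minimal sketch of a global alignment over two superimposed coordinate lists. The pair score `d0 - distance`, the linear gap penalty and the function name `dp_align` are assumptions chosen for the example. This is not the scoring scheme of any of the programs named above.

```python
import math  # math.dist requires Python 3.8+


def dp_align(a, b, gap_penalty=1.0, d0=4.0):
    """Global alignment of superimposed coordinate lists a and b.

    ASSUMPTION: a pair scores d0 minus its distance, and every skipped
    residue costs gap_penalty (linear gaps, end gaps included).
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap_penalty * i
    for j in range(1, m + 1):
        score[0][j] = -gap_penalty * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + d0 - math.dist(a[i - 1], b[j - 1])
            score[i][j] = max(match,
                              score[i - 1][j] - gap_penalty,
                              score[i][j - 1] - gap_penalty)
    # Trace back from the corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        match = score[i - 1][j - 1] + d0 - math.dist(a[i - 1], b[j - 1])
        if score[i][j] == match:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap_penalty:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Residue a[3] coincides with b[2]; a[2] and b[3] have no partner.
a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (12.0, 0.0, 0.0), (16.0, 0.0, 0.0)]
low = dp_align(a, b, gap_penalty=1.0)   # gaps are cheap
high = dp_align(a, b, gap_penalty=3.0)  # gaps are expensive
```

In this toy case the low penalty recovers the coinciding pair, while the high penalty suppresses the two gaps and pairs residues that lie 4 Å apart, so the same superposition yields two different alignments.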

Methods

Results

Discussion

Conclusion

Journal: BMC Bioinformatics	Publication Date: Jan 1, 2009
Citations: 29	License type: cc-by
