Abstract

Background: Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis. However, multiple alignment represents a computationally difficult problem. For protein-coding DNA sequences, it is more advantageous in terms of both speed and accuracy to align the amino-acid sequences specified by the DNA sequences rather than the DNA sequences themselves. Many implementations making use of this concept of "translated alignments" are incomplete in the sense that they require the user to manually translate the DNA sequences and to perform the amino-acid alignment. As such, they are not well suited to large-scale automated alignments of large and/or numerous DNA data sets.

Results: transAlign is an open-source Perl script that aligns protein-coding DNA sequences via their amino-acid translations to take advantage of the superior multiple-alignment capabilities and speed of an amino-acid alignment. It operates by translating each DNA sequence into its corresponding amino-acid sequence, passing the entire matrix to ClustalW for alignment, and then back-translating the resulting amino-acid alignment to derive the aligned DNA sequences. In the translation step, transAlign determines the optimal orientation and reading frame for each DNA sequence according to the desired genetic code. It also checks for apparent frame shifts in the DNA sequences and can handle frame-shifted sequences in one of three ways (delete, align as amino acids regardless, or profile align as DNA). As a set of comparative benchmarks derived from six protein-coding genes for mammals shows, the strategy implemented in transAlign always improves the speed and usually the apparent accuracy of the alignment of protein-coding DNA sequences.

Conclusion: transAlign represents one of few full and cross-platform implementations of the concept of translated alignments. Both the advantages accruing from performing a translated alignment and the suite of user-definable options available in the program mean that transAlign is ideally suited for large-scale automated alignments of very large and/or very numerous protein-coding DNA data sets. However, the good performance offered by the program also translates to the alignment of any set of protein-coding sequences. transAlign, including the source code, is freely available at http://www.tierzucht.tum.de/Bininda-Emonds/ (under "Programs").
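
The translate, align, back-translate idea described in the abstract can be illustrated with a minimal Python sketch. This is not transAlign's Perl code: the function names are hypothetical, only the standard genetic code is included, the amino-acid alignment step (ClustalW) is left out, and choosing the frame with the fewest internal stop codons is an assumed criterion for "optimal" that the abstract does not spell out.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unrecognized codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def best_frame(dna):
    """Pick strand and offset with the fewest internal stops (assumed criterion).

    Ties are resolved in favor of the forward strand and the lowest offset.
    """
    candidates = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            internal_stops = protein.rstrip("*").count("*")
            candidates.append((internal_stops, strand, offset, protein))
    _, strand, offset, protein = min(candidates, key=lambda c: c[0])
    return strand, offset, protein


def back_translate(aligned_protein, coding_dna):
    """Thread the codons of coding_dna onto one gapped amino-acid row."""
    out, pos = [], 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")  # one amino-acid gap equals three nucleotide gaps
        else:
            out.append(coding_dna[pos:pos + 3])
            pos += 3
    return "".join(out)


dna = "TATGGCTAAATGG"
strand, offset, protein = best_frame(dna)
coding = (dna if strand == "+" else reverse_complement(dna))[offset:]
# "MA-KW" stands in for the row an amino-acid aligner would return.
aligned_dna = back_translate("MA-KW", coding)
```

Because gaps are only ever inserted as whole codons, the back-translated DNA alignment keeps every sequence in frame, which is the property a direct nucleotide alignment cannot guarantee.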

Highlights

  • Alignments of homologous DNA sequences are crucial for comparative genomics and phylogenetic analysis

  • To test the potential performance advantages offered by a translated alignment of protein-coding DNA sequences, six mammalian coding genes were each aligned either directly using ClustalW or via their amino-acid translations using transAlign

  • transAlign, in addition to being cross-platform, includes a diverse suite of user-definable options relating to the processing of the DNA sequence data, its alignment as amino-acid data, and subsequent back-translation into aligned DNA data


Summary

Results and discussion

To test the potential performance advantages offered by a translated alignment of protein-coding DNA sequences, six mammalian coding genes were each aligned either directly using ClustalW (default parameters) or via their amino-acid translations using transAlign (genetic code specified, otherwise default parameters). Alignment quality was scored as the similarity (matching nucleotides score +1; mismatches score 0) between the same sequence in the test alignment and the manually produced one. These values were averaged for each data set to reveal how many nucleotides, on average, were correctly aligned. The benchmark data (Table 1) show that transAlign often delivers alignments of superior quality compared to a DNA alignment of the same data set, and always with a significant saving in time. The same advantages would apply to these programs, such that alignments for the benchmark data sets could be obtained in even less time.
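
The scoring scheme described above (+1 per matching nucleotide, 0 per mismatch, averaged over a data set) might be sketched as follows. The function names are hypothetical, and the position-by-position comparison of gapped rows is an assumption: the text does not state how rows of different alignment lengths were compared, so this sketch simply stops at the end of the shorter row and never counts a gap as a match.

```python
def row_score(test_row, ref_row, gap="-"):
    """Count positions where both rows carry the same nucleotide."""
    return sum(1 for t, r in zip(test_row, ref_row) if t == r and t != gap)


def mean_score(test_alignment, ref_alignment):
    """Average the per-sequence scores over sequences present in both alignments.

    Both arguments map sequence names to gapped alignment rows.
    """
    shared = test_alignment.keys() & ref_alignment.keys()
    if not shared:
        raise ValueError("the alignments share no sequence names")
    total = sum(row_score(test_alignment[n], ref_alignment[n]) for n in shared)
    return total / len(shared)


# Toy data, not taken from the benchmark.
test = {"a": "ATG---GCC", "b": "ATGAAAGCC"}
reference = {"a": "ATGGCC---", "b": "ATGAAAGCC"}
average = mean_score(test, reference)
```

Here sequence "a" scores 3 and sequence "b" scores 9, so the average is 6 correctly aligned nucleotides per sequence.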

Conclusion
