Multiple sequence alignments of partially coding nucleic acid sequences

Roman R Stocsits, Ivo L Hofacker, Peter F Stadler, Claudia Fried

doi:10.1186/1471-2105-6-160


Abstract

Background: High quality sequence alignments of RNA and DNA sequences are an important prerequisite for the comparative analysis of genomic sequence data. Nucleic acid sequences, however, exhibit a much larger sequence heterogeneity compared to their encoded protein sequences due to the redundancy of the genetic code. It is desirable, therefore, to make use of the amino acid sequence when aligning coding nucleic acid sequences. In many cases, however, only a part of the sequence of interest is translated. On the other hand, overlapping reading frames may encode multiple alternative proteins, possibly with intermittent non-coding parts. Examples are, in particular, RNA virus genomes.

Results: The standard scoring scheme for nucleic acid alignments can be extended to incorporate simultaneously information on translation products in one or more reading frames. Here we present a multiple alignment tool, codaln, that implements a combined nucleic acid plus amino acid scoring model for pairwise and progressive multiple alignments that allows arbitrary weighting for almost all scoring parameters. Resource requirements of codaln are comparable with those of standard tools such as ClustalW.

Conclusion: We demonstrate the applicability of codaln to various biologically relevant types of sequences (bacteriophage Levivirus and Vertebrate Hox clusters) and show that the combination of nucleic acid and amino acid sequence information leads to improved alignments. These, in turn, increase the performance of analysis tools that depend strictly on good input alignments such as methods for detecting conserved RNA secondary structure elements.

Highlights

High quality sequence alignments of RNA and DNA sequences are an important prerequisite for the comparative analysis of genomic sequence data.
Multiple sequence alignments are a crucial prerequisite for a diverse set of methods, ranging from the reconstruction of phylogenies and the quantification of adaptive evolution to the detection of conserved RNA secondary structures and protein motifs.
Examples have been found in prokaryotic [17,18] and even in eukaryotic genomes [19,20]. In this contribution we describe a progressive alignment tool that implements an extended scoring scheme, which simultaneously incorporates information on translation products in one or more (possibly partly overlapping) reading frames. This allows the user to combine all information from both the nucleic acid and amino acid sequences.
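The idea of a combined nucleic acid plus amino acid score can be illustrated with a minimal sketch. This is not codaln's actual scoring function: the function name `combined_codon_score`, the match/mismatch values, the identity-based amino acid score, and the weight `w_aa` are all assumptions chosen for illustration. A real tool would more plausibly use a substitution matrix for the amino acid term.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def combined_codon_score(codon_a, codon_b, coding=True, w_aa=2.0,
                         match=1.0, mismatch=-1.0):
    """Score two aligned codons (hypothetical scheme, for illustration only).

    The nucleotide term sums match/mismatch scores over the three positions.
    If the position is annotated as coding, a weighted amino acid term is
    added: +1 if both codons translate to the same residue, otherwise -1.
    """
    a = codon_a.upper().replace("U", "T")
    b = codon_b.upper().replace("U", "T")
    nuc = sum(match if x == y else mismatch for x, y in zip(a, b))
    if not coding:
        return nuc
    aa = 1.0 if CODON_TABLE[a] == CODON_TABLE[b] else -1.0
    return nuc + w_aa * aa


# CTG and TTA both encode leucine but differ at two of three positions.
print(combined_codon_score("CTG", "TTA"))                # coding region
print(combined_codon_score("CTG", "TTA", coding=False))  # non-coding region
```

With these assumed parameters, the synonymous pair scores higher when treated as coding than when scored on nucleotides alone, which is the effect the extended scoring scheme is meant to capture.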

Summary

Results

More plausible alignments

Not surprisingly, codaln multiple alignments of coding DNA sequences have a much larger fraction of gaps with a length divisible by three than ClustalW multiple alignments. This is the desired effect of including amino acid-based scoring contributions, since it reduces biologically implausible frameshifts. While codaln produces a significantly higher fraction of gaps whose length is a multiple of 3 and correctly aligns the coding sequences in both exons, ClustalW treats only exon 2 correctly, which is highly conserved at the nucleic acid level.

At the 5'-terminal end of the Levivirus sequences we detect a short GC-rich hairpin (tetraloop) adjacent to an unpaired GGG element (see Fig. 6). This feature is probably the analogue of the recognition site for the RNA replicase in Alloleviviruses. The Qβ replicase amplifies RNA templates autocatalytically with high efficiency, and the recognition element, consisting of a hairpin and a short unpaired region at the 5'-terminus, is essential for recognition [36,37].
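The gap statistic discussed above, the fraction of gaps whose length is divisible by three, can be computed from any alignment given as gapped strings. The following is a minimal sketch, not the procedure used in the paper: the function names are hypothetical, gaps are assumed to be written as `-`, and leading and trailing gaps are counted like internal ones, which an actual evaluation might prefer to exclude.

```python
import re


def gap_lengths(aligned_row):
    """Return the lengths of all maximal runs of '-' in one aligned sequence."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_row)]


def fraction_gaps_multiple_of_three(alignment):
    """Fraction of gap runs whose length is divisible by three.

    `alignment` is a list of equal-length gapped strings. Returns None if
    the alignment contains no gaps at all.
    """
    lengths = [n for row in alignment for n in gap_lengths(row)]
    if not lengths:
        return None
    return sum(n % 3 == 0 for n in lengths) / len(lengths)


# Toy alignment (made-up sequences): gap runs of length 3, 1, and 6.
toy = ["ATG---CCC",
       "ATGA-ACCC",
       "AT------C"]
print(fraction_gaps_multiple_of_three(toy))
```

Comparing this fraction between two alignments of the same coding sequences gives a rough indication of how many frameshift-inducing gaps each alignment introduces.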


Journal: BMC Bioinformatics
Publication Date: Jun 28, 2005
Citations: 28
License type: cc-by
