Abstract

Alignment of DNA and protein sequences is a basic tool in the study of evolutionary, structural and functional relationships among macromolecules. Present sequence alignment methods are somewhat error-prone and often produce systematic bias. Errors in sequence alignments sometimes lead to subsequent misinterpretation of evolutionary, structural and functional information in genes, proteins and genomes. In traditional sequence alignment algorithms, alignments of DNA and protein sequences are conducted separately. It has long been believed that the phylogenetic signal disappears more rapidly from DNA sequences than from the encoded proteins. It is therefore generally considered preferable to align sequences at the amino acid level. Here we present a new method, DNA+Pro, which aggregates DNA and protein sequences into combined DNA-protein sequences and aligns them in a combined fashion. We demonstrate that combining sequences improves the quality of multiple sequence alignment and solves practical evolutionary problems in primate immunodeficiency virus proteins and bacterial restriction enzymes. In addition to increased theoretical information content, the distance estimates are more biologically significant in combined alignments than in protein-only or DNA-only alignments. By integrating information buried separately in DNA and protein sequences, DNA+Pro improves the accuracy of multiple sequence alignment of closely related proteins and prevents certain errors that may occur in phylogenetic analysis using protein-only approaches. The DNA+Pro software and the supplementary data can be downloaded free of charge from our website, http://www.dnapluspro.com.

Highlights

  • Bioinformatics, as well as functional and comparative genomics, seeks to discover the functional and structural sequence changes that lead to genetic differences between species, and to provide accurate reconstructions of the evolutionary histories of related genes, proteins and genomes

  • All evolutionary, structural or functional studies that rely on sequence analyses require accurate sequence alignments, i.e., the correct identification of homologous nucleotides or amino acids, and the accurate positioning of gaps indicating insertions and deletions

  • Combined DNA-protein sequences improve multiple sequence alignment. As shown in Fig 1A, the traditional protein-only alignment produced by ClustalW, together with its combined and DNA views reverse-translated by DNA+Pro, suggests that part of the variable (V2) region has a high rate of amino acid and DNA base substitutions
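The "DNA view" of a protein alignment mentioned in the last highlight can be illustrated with a minimal Python sketch. It threads the gaps of one aligned protein row back onto the coding DNA, three nucleotides per residue. The function name `protein_alignment_to_codons` and the gap character are assumptions made for illustration; this is a generic codon back-threading step, not the DNA+Pro implementation, whose internals are not described here.

```python
def protein_alignment_to_codons(aligned_protein: str, dna: str, gap: str = "-") -> str:
    """Thread the gaps of one aligned protein row back onto its coding DNA.

    Hypothetical helper for illustration only. Each residue consumes one
    codon (3 nt) and each protein gap becomes three DNA gap characters.
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(dna) % 3 != 0 or len(codons) != n_residues:
        raise ValueError("DNA length must be 3x the number of ungapped residues")

    codon_iter = iter(codons)
    return "".join(gap * 3 if ch == gap else next(codon_iter) for ch in aligned_protein)


# Protein row "M-KV" with coding sequence ATG AAA GTT
print(protein_alignment_to_codons("M-KV", "ATGAAAGTT"))  # ATG---AAAGTT
```

Because every protein column maps to exactly three DNA columns, substitutions at the nucleotide level can be inspected in the same frame as the amino acid alignment.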

Summary

Introduction

Bioinformatics, as well as functional and comparative genomics, seeks to discover the functional and structural sequence changes that lead to genetic differences between species, and to provide accurate reconstructions of the evolutionary histories of related genes, proteins and genomes. All evolutionary, structural or functional studies that rely on sequence analyses require accurate sequence alignments, i.e., the correct identification of homologous nucleotides or amino acids, and the accurate positioning of gaps indicating insertions and deletions. Although quite a few sequence alignment algorithms are available, sequence alignment is still highly error-prone. Different sequence alignment tools often lead to drastically different conclusions in phylogenetic analyses, and can support entirely different mechanisms driving evolutionary, structural and functional changes in sequences. We present a new method, DNA+Pro, which combines DNA and protein sequences in a single alignment, prevents some of these errors, and improves both the quality of sequence alignment and the accuracy of phylogenetic analysis.
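To make the idea of a combined DNA-protein sequence concrete, the following Python sketch pairs each codon with the amino acid it encodes and compares two such units at both levels. The representation (a list of `(codon, amino_acid)` tuples) and the helper names `combine` and `compare_units` are assumptions chosen for illustration; the actual encoding and scoring used by DNA+Pro are not specified in this summary. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def combine(dna: str) -> list[tuple[str, str]]:
    """Return a combined sequence: one (codon, amino acid) unit per codon.

    Hypothetical representation, not the DNA+Pro file format.
    """
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [(dna[i:i + 3], CODON_TABLE[dna[i:i + 3]]) for i in range(0, len(dna), 3)]


def compare_units(a: tuple[str, str], b: tuple[str, str]) -> tuple[int, bool]:
    """Return (number of differing nucleotides, whether the amino acid differs)."""
    nt_diff = sum(x != y for x, y in zip(a[0], b[0]))
    return nt_diff, a[1] != b[1]


seq1 = combine("ATGAAAGTT")  # M K V
seq2 = combine("ATGAAGGTT")  # M K V, with a synonymous change in codon 2
print(compare_units(seq1[1], seq2[1]))  # (1, False)
```

In this toy case the two sequences are identical at the protein level but differ by one nucleotide, which is the kind of signal a protein-only alignment cannot see and a combined representation retains.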
