Abstract

Indels in DNA sequences frequently affect more than a single nucleotide, creating problems for alignment, character coding and phylogenetic analysis. However, the size and frequency of multiple-residue indels are not usually tested, and with popular alignment packages their reconstruction is achieved only indirectly, by reducing the affine (gap extension) cost. We explored the length distribution of indels in intron sequences of the gene Mp20 by varying the gap opening and gap extension costs. Given a "known" tree for the study group, global homology levels were greatest under low gap costs, with gap extension costs of roughly 0.4 times the opening cost. Different approaches to gap coding and weighting suggested that taxonomic congruence was correlated with high frequencies of multiple-position indels, with indel lengths peaking at 2-5 bp and few indels above 15 bp, but also including a proportion of indels > 100 bp. Only a small minority of indels could be reconstructed as single-position indels. Consequently, tree topologies improved when homologous multinucleotide indels, which are highly homoplastic and heavily weighted characters under single-position coding, were recoded as binary characters. In tree-generating alignment procedures as implemented in POY, where the gap penalty determines character weight during tree search, the problem of assigning inappropriately high weight to multiple-residue indels could partly be overcome by setting extension costs to about 0.4 times the gap opening cost. We conclude that multiple consecutive gap positions are not independent characters, and hence methods for parsimony reconstruction of long indels are required. Finally, we also observed a general lack of correlation between taxonomic and character congruence, demonstrating the difficulty of applying congruence criteria to choose among competing alignments.
This highlights the value of recent model-based alignment procedures, which can implement statistical distributions of indel size classes and do not rely on potentially circular strategies for optimizing overall congruence.
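
The abstract contrasts single-position gap coding, where every gap column counts as an independent character, with affine gap costs in which extension is cheaper than opening. The minimal Python sketch below illustrates why an extension cost of about 0.4 times the opening cost reduces the weight of long indels. The function names are hypothetical, and the cost convention (opening cost for the first gap position, extension cost for each further position) is an assumption; it is not POY's implementation, and alignment programs differ on this detail.

```python
"""Sketch: affine vs. single-position (linear) gap costs.

Hypothetical helper names; assumed convention, not POY code.
"""


def affine_gap_cost(length: int, gap_open: float, gap_extend: float) -> float:
    # Assumed convention: the first gap position costs `gap_open`,
    # every further position costs `gap_extend`.
    if length < 1:
        raise ValueError("indel length must be at least 1")
    return gap_open + gap_extend * (length - 1)


def linear_gap_cost(length: int, per_position: float) -> float:
    # Single-position coding: each gap column is an independent character,
    # so a long indel is charged once per column.
    if length < 1:
        raise ValueError("indel length must be at least 1")
    return per_position * length


if __name__ == "__main__":
    gap_open = 1.0
    # Extension set to roughly 0.4 times the opening cost, as in the abstract.
    gap_extend = 0.4 * gap_open
    for length in (1, 2, 5, 15, 100):
        affine = affine_gap_cost(length, gap_open, gap_extend)
        linear = linear_gap_cost(length, gap_open)
        print(f"{length:>4} bp  affine={affine:6.1f}  linear={linear:6.1f}")
```

Under these assumed values, a single-position indel costs the same under both schemes, while a 100 bp indel costs 40.6 under the affine scheme instead of 100 under linear coding, so one long indel event no longer outweighs many independent single-position changes.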
