Abstract

Background: Several ways of incorporating indels into phylogenetic analysis have been suggested. Simple indel coding has two strengths: (1) biological realism and (2) efficiency of analysis. In this method, each indel with a different start and/or end position is considered to be a separate character. The presence/absence of these indel characters is then added to the data set.

Algorithm: We have written a program, GapCoder, to automate this procedure. The program can read aligned data sets in PIR format, find the indels, and add the indel-based characters. The output is a NEXUS format file, which includes a table showing the region on which each indel character is based. If regions are excluded from analysis, this table makes it easy to identify the corresponding indel characters for exclusion.

Discussion: Manual implementation of the simple indel coding method can be very time-consuming, especially in data sets where indels are numerous and/or overlapping. GapCoder automates this method and is therefore particularly useful in procedures where phylogenetic analyses need to be repeated many times, such as when different alignments or various taxon or character sets are being explored. GapCoder is currently available for Windows from .
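The coding step described in the abstract can be illustrated with a minimal Python sketch. This is not GapCoder's code: the function names `find_gaps` and `code_indels`, the dictionary input, and the `-` gap symbol are assumptions made for illustration. The sketch scores only strict presence (1) or absence (0) of a gap with exactly the same start and end positions; it does not give special treatment to leading or trailing gaps, or to a gap in one taxon that lies wholly inside a longer gap in another.

```python
import re


def find_gaps(seq, gap="-"):
    """Return (start, end) for each run of gap symbols (0-based, end exclusive)."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(gap) + "+", seq)]


def code_indels(alignment, gap="-"):
    """Score every distinct gap region as a presence/absence character.

    alignment: dict mapping taxon name -> aligned sequence (all the same length).
    Returns (regions, matrix), where regions is a sorted list of distinct
    (start, end) pairs and matrix maps each taxon to a string of '0'/'1'.
    """
    gaps = {name: set(find_gaps(seq, gap)) for name, seq in alignment.items()}
    # Gaps that differ in start and/or end position become separate characters.
    regions = sorted(set().union(*gaps.values()))
    matrix = {
        name: "".join("1" if region in gaps[name] else "0" for region in regions)
        for name in alignment
    }
    return regions, matrix


# Hypothetical toy alignment, not data from the paper.
alignment = {
    "taxonA": "ACGT--ACGT",
    "taxonB": "ACG---ACGT",
    "taxonC": "ACGTTTACGT",
    "taxonD": "ACGT--AC-T",
}
regions, matrix = code_indels(alignment)
for name, codes in matrix.items():
    print(name, codes)
```

Here taxonA and taxonB receive different indel characters because their gaps end at the same column but start at different ones, while taxonA and taxonD share the character for the gap they have in common.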

Highlights

  • Several ways of incorporating indels into phylogenetic analysis have been suggested

  • Numerous studies in which indel characters were compared with already established tree topologies have found that these indels are reliable in constructing phylogenies [6,7,8,9,10,11]. It can be very time-consuming to determine character states based on gaps and to enter this information into a data matrix by hand.

  • Within the DNA sequences, gap characters are coded as missing data, and the gap region characters are placed at the end of each sequence. This method is useful because it codes indels as separate characters and treats contiguous gap characters as related.
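The layout described in the last highlight, with gaps recoded as missing data and the gap region characters appended to each sequence, can be sketched as follows. The helper names `build_rows` and `region_table`, the `?` missing-data symbol, and the toy input are assumptions for illustration; the exact layout of GapCoder's NEXUS output and of its region table is not reproduced here.

```python
def build_rows(alignment, indel_codes, gap="-", missing="?"):
    """Recode gaps as missing data and append each taxon's indel characters."""
    return {
        name: seq.replace(gap, missing) + indel_codes[name]
        for name, seq in alignment.items()
    }


def region_table(regions, n_sites):
    """List (matrix column, first alignment column, last alignment column).

    regions holds 0-based (start, end) pairs with end exclusive; the table
    is reported 1-based and inclusive, which is easier to read by eye.
    """
    return [
        (n_sites + i + 1, start + 1, end)
        for i, (start, end) in enumerate(regions)
    ]


# Hypothetical toy input: two aligned sequences and their precomputed indel codes.
alignment = {"taxonA": "ACGT--ACGT", "taxonB": "ACG---ACGT"}
indel_codes = {"taxonA": "01", "taxonB": "10"}
regions = [(3, 6), (4, 6)]

rows = build_rows(alignment, indel_codes)
table = region_table(regions, n_sites=10)
for name, row in rows.items():
    print(name, row)
for column, first, last in table:
    print(f"character {column}: alignment columns {first}-{last}")
```

A table of this kind is what makes exclusion straightforward: if alignment columns 4 to 6 are dropped from an analysis, the indel characters based on that region can be looked up and dropped with them.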


Summary

Discussion

GapCoder has the potential to be useful in phylogenetics, especially in non-protein-coding regions where indels can be as plentiful as substitutions. The output from GapCoder may be used in exploratory analyses of optimal DNA sequence alignment. Such an analysis would likely include GapCoder as part of an objective method with four stages, in which GapCoder would be used to code the indels into the data matrix. GapCoder is also useful when different character sets and/or taxon sets are being explored, such as when different combinations of outgroups are tried. This often requires re-aligning the data set for each taxon set; GapCoder allows the indel characters to be quickly added each time.

JH designed and wrote the program itself, and did much of the testing. Both authors read and approved the final manuscript.
