Abstract

BackgroundPhylogenetic analysis of large, multiple-gene datasets, assembled from public sequence databases, is rapidly becoming a popular way to approach difficult phylogenetic problems. Supermatrices (concatenated multiple sequence alignments of multiple genes) can yield more phylogenetic signal than individual genes. However, manually assembling such datasets for a large taxonomic group is time-consuming and error-prone. Additionally, sequence curation, alignment and assessment of the results of phylogenetic analysis are made particularly difficult by the potential for a given gene in a given species to be unrepresented, or to be represented by multiple or partial sequences. We have developed a software package, TaxMan, that largely automates the processes of sequence acquisition, consensus building, alignment and taxon selection to facilitate this type of phylogenetic study.ResultsTaxMan uses freely available tools to allow rapid assembly, storage and analysis of large, aligned DNA and protein sequence datasets for user-defined sets of species and genes. The user provides GenBank format files and a list of gene names and synonyms for the loci to analyse. Sequences are extracted from the GenBank files on the basis of annotation and sequence similarity. Consensus sequences are built automatically. Alignment is carried out (where possible, at the protein level) and aligned sequences are stored in a database. TaxMan can automatically determine the best subset of taxa to examine phylogeny at a given taxonomic level. By using the stored aligned sequences, large concatenated multiple sequence alignments can be generated rapidly for a subset and output in analysis-ready file formats. Trees resulting from phylogenetic analysis can be stored and compared with a reference taxonomy.ConclusionTaxMan allows rapid automated assembly of a multigene datasets of aligned sequences for large taxonomic groups. By extracting sequences on the basis of both annotation and BLAST similarity, it ensures that all available sequence data can be brought to bear on a phylogenetic problem, but remains fast enough to cope with many thousands of records. By automatically assisting in the selection of the best subset of taxa to address a particular phylogenetic problem, TaxMan greatly speeds up the process of generating multiple sequence alignments for phylogenetic analysis. Our results indicate that an automated phylogenetic workbench can be a useful tool when correctly guided by user knowledge.

Highlights

  • Phylogenetic analysis of large, multiple-gene datasets, assembled from public sequence databases, is rapidly becoming a popular way to approach difficult phylogenetic problems

  • By combining the phylogenetic signal from multiple genes, clades can be recovered that are not recovered under analysis of any of the individual genes

  • TaxMan reduces this curatorial burden by automatically assembling and storing large aligned sequence datasets, and storing metadata and trees resulting from phylogenetic analysis

Read more

Summary

Introduction

Phylogenetic analysis of large, multiple-gene datasets, assembled from public sequence databases, is rapidly becoming a popular way to approach difficult phylogenetic problems. Supermatrices (concatenated multiple sequence alignments of multiple genes) can yield more phylogenetic signal than individual genes. Manually assembling such datasets for a large taxonomic group is time-consuming and error-prone. There has been much interest in the use of large, concatenated multiple sequence alignments ('supermatrices') for phylogenetic analysis [1,2] Such datasets have been shown to be useful in resolving difficult phylogenetic questions with a high degree of confidence. The forest of trees resulting from subsequent phylogenetic analysis must be associated with the relevant datasets, analysis parameters and confidence metrics With these tasks in mind we have developed an integrated solution, TaxMan. TaxMan reduces this curatorial burden by automatically assembling and storing large aligned sequence datasets, and storing metadata and trees resulting from phylogenetic analysis. Because of the high level of automation offered by TaxMan, datasets can be rebuilt rapidly to include new sequence data

Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call