Abstract

Background: Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes. We recently demonstrated the feasibility of using unaligned rRNA sequences as barcodes based on a composition vector (CV) approach without sequence alignment (Bioinformatics 22:1690). Here, we further explored the grouping effectiveness of the CV method in large DNA barcode datasets (COI, 18S and 16S rRNA) from a variety of organisms, including birds, fishes, nematodes and crustaceans.

Results: Our results indicate that the grouping of taxa at the genus/species levels based on the CV/NJ approach is invariably consistent with the trees generated by traditional approaches, although in some cases the clustering among higher groups might differ. Furthermore, the CV method is always much faster than the K2P method routinely used in constructing profile trees for DNA barcoding. For instance, the alignment of 754 COI sequences (average length 649 bp) from fishes took more than ten hours to complete, while the whole tree construction process using the CV/NJ method required no more than five minutes on the same computer.

Conclusion: The CV method performs well in grouping effectiveness of DNA barcode sequences, as compared to K2P analysis of aligned sequences. It was also able to reduce the time required for analysis by over 15-fold, making it a far superior method for analyzing large datasets. We conclude that the CV method is a fast and reliable method for analyzing large datasets for DNA barcoding purposes.

Highlights

  • Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes

  • The composition vector (CV) method was first demonstrated to facilitate the use of rDNA datasets for barcoding purposes since no sequence alignment was necessary [13]

  • We further demonstrated the power of the CV method in analyzing large DNA barcode datasets, regardless of the type of gene markers used

Summary

Introduction

Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes. We further explored the grouping effectiveness of the CV method in large DNA barcode datasets (COI, 18S and 16S rRNA) from a variety of organisms, including birds, fishes, nematodes and crustaceans. The short-term solution is to divide the large barcode dataset into several “sub-projects” with a size limit of 5,000 specimens each for analysis [7]. As an estimated 200,000 additional barcode records will be entered in the database each year [8], the limit of 5,000 specimens for each sub-project will be quickly saturated, because closely related taxa (sequences) should not be divided into subsets but preferably analyzed together. The long-term solution is to develop more efficient analytical methods as alternatives or supplements for handling such a large dataset.
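To make the alignment-free idea concrete, the following is a minimal Python sketch of a k-mer composition distance between unaligned sequences. It is an illustration only, not the authors' implementation: it uses raw k-mer frequencies and may omit steps of the published CV method, and the function names (`kmer_vector`, `cv_distance`, `distance_matrix`), the choice `k=3`, and the distance form `(1 - cosine) / 2` are all assumptions made for this example.

```python
from collections import Counter
from itertools import product
import math


def kmer_vector(seq, k=3):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    Hypothetical helper: k-mers containing characters other than
    A, C, G, T (for example N) are simply ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cv_distance(a, b, k=3):
    """Distance between two unaligned sequences from their k-mer vectors.

    Assumed form: (1 - cosine similarity) / 2. Sequences with no valid
    k-mers are treated as unrelated (cosine 0, distance 0.5).
    """
    va, vb = kmer_vector(a, k), kmer_vector(b, k)
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(y * y for y in vb))
    if na == 0 or nb == 0:
        return 0.5
    return (1.0 - dot / (na * nb)) / 2.0


def distance_matrix(seqs, k=3):
    """Symmetric pairwise distance matrix; no alignment step is needed."""
    n = len(seqs)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = cv_distance(seqs[i], seqs[j], k)
    return m
```

A matrix produced this way could then be passed to any neighbor-joining (NJ) implementation to build a profile tree. Because each sequence is reduced to a fixed-length vector once, the cost is dominated by pairwise vector comparisons rather than by a multiple alignment.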
