Rapid DNA barcoding analysis of large datasets using the composition vector method

Ka Hou Chu, Minli Xu, Chi Pang Li

doi:10.1186/1471-2105-10-s14-s8

Open Access

https://doi.org/10.1186/1471-2105-10-s14-s8

Abstract

Background: Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes. We recently demonstrated the feasibility of using unaligned rRNA sequences as barcodes based on a composition vector (CV) approach without sequence alignment (Bioinformatics 22:1690). Here, we further explored the grouping effectiveness of the CV method in large DNA barcode datasets (COI, 18S and 16S rRNA) from a variety of organisms, including birds, fishes, nematodes and crustaceans.

Results: Our results indicate that the grouping of taxa at the genus/species levels based on the CV/NJ approach is invariably consistent with the trees generated by traditional approaches, although in some cases the clustering among higher groups might differ. Furthermore, the CV method is always much faster than the K2P method routinely used in constructing profile trees for DNA barcoding. For instance, the alignment of 754 COI sequences (average length 649 bp) from fishes took more than ten hours to complete, while the whole tree construction process using the CV/NJ method required no more than five minutes on the same computer.

Conclusion: The CV method performs well in grouping effectiveness of DNA barcode sequences, as compared to K2P analysis of aligned sequences. It was also able to reduce the time required for analysis by over 15-fold, making it a far superior method for analyzing large datasets. We conclude that the CV method is a fast and reliable method for analyzing large datasets for DNA barcoding purposes.
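The abstract does not spell out how a composition vector is built, so the following is only a minimal sketch of the alignment-free idea. It assumes the formulation commonly described in the CV literature: each k-mer frequency is compared with a prediction from its (k-1)- and (k-2)-mer frequencies, and two sequences are compared through the cosine of their vectors. The function names, the `ACGT` alphabet restriction, and the default `k=5` are illustrative assumptions, not the authors' published settings.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_freqs(seq, k):
    """Relative frequencies of all overlapping k-mers in seq."""
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def composition_vector(seq, k=5):
    """Sparse composition vector (assumed formulation, see lead-in).

    Each component is (f - f0) / f0, where f is the observed k-mer
    frequency and f0 is the frequency predicted from the overlapping
    (k-1)-mers and the shared (k-2)-mer. Only k-mers with f0 > 0 are stored.
    """
    seq = seq.upper()
    f_k = kmer_freqs(seq, k)
    f_k1 = kmer_freqs(seq, k - 1)
    f_k2 = kmer_freqs(seq, k - 2)
    vec = {}
    for prefix, f_pre in f_k1.items():
        for base in "ACGT":  # assumption: plain DNA alphabet
            kmer = prefix + base
            f_suf = f_k1.get(kmer[1:])
            if f_suf is None:
                continue  # predicted frequency is zero, component left out
            f0 = f_pre * f_suf / f_k2[kmer[1:-1]]
            vec[kmer] = (f_k.get(kmer, 0.0) - f0) / f0
    return vec


def cv_distance(u, v):
    """Distance (1 - cosine) / 2 between two sparse vectors, in [0, 1]."""
    dot = sum(x * v[key] for key, x in u.items() if key in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    if norm == 0:
        return 0.5  # assumption: an all-zero vector is treated as uncorrelated
    return (1 - dot / norm) / 2


def distance_matrix(seqs, k=5):
    """Pairwise distances for a {name: sequence} dict, no alignment needed."""
    names = list(seqs)
    vecs = {name: composition_vector(seqs[name], k) for name in names}
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        dist[a, b] = dist[b, a] = cv_distance(vecs[a], vecs[b])
    return dist
```

In a CV/NJ workflow, a matrix like the one returned by `distance_matrix` would then be handed to any neighbor-joining implementation to build the profile tree; that step is omitted here.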

Highlights

Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes
The composition vector (CV) method was first demonstrated to facilitate the use of rDNA datasets for barcoding purposes since no sequence alignment was necessary [13]
We further demonstrated the power of the CV method in analyzing large DNA barcode datasets, regardless of the type of gene markers used

Summary

Introduction

Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes. We further explored the grouping effectiveness of the CV method in large DNA barcode datasets (COI, 18S and 16S rRNA) from a variety of organisms, including birds, fishes, nematodes and crustaceans. The short-term solution is to divide the large barcode dataset into several “sub-projects” with a size limit of 5,000 specimens each for analysis [7]. As an estimated 200,000 additional barcode records will be entered in the database each year [8], the limit of 5,000 specimens for each sub-project will be quickly saturated because closely related taxa (sequences) should not be divided into subsets but preferably analyzed together. The long-term solution is to develop more efficient analytical methods as alternatives or supplements for handling such a large dataset.

Journal: BMC Bioinformatics
Publication Date: Nov 1, 2009
Citations: 63
License type: CC BY 2.0

Similar Papers

VIP Barcoding: composition vector‐based software for rapid species identification based on DNA barcoding
Long Fan ... Zu Guo Yu
Molecular Ecology Resources | VOL. 14
07 Mar 2014

Biodiversity Informatics: the emergence of a field
Indra Neil Sarkar
BMC Bioinformatics | VOL. 10
01 Nov 2009

An Intelligent System for Searching Genomic Sequences
Vandana Gummuluru ... Su-Shing Chen
01 Oct 2007

Analyzing multi-locus plant barcoding datasets with a composition vector method based on adjustable weighted distance.
Chi Pang Li ... Ka Hou Chu
PLoS ONE | VOL. 7
27 Jul 2012
