Abstract
Background: The composition vector (CV) method has proven to be a reliable and fast alignment-free method for analyzing large COI barcoding datasets. In this study, we modify this method to analyze multi-gene datasets for plant DNA barcoding. The modified method includes an adjustable-weighted algorithm for the vector distance according to the ratio in sequence length of the candidate genes for each pair of taxa.

Methodology/Principal Findings: Three datasets were tested: a matK+rbcL dataset with 2,083 sequences, a matK+rbcL dataset with 397 sequences, and a matK+rbcL+trnH-psbA dataset with 397 sequences. The success rates of grouping sequences at the genus/species level with the modified CV approach were always higher than those obtained with the traditional K2P/NJ method. For the matK+rbcL datasets, the modified CV approach outperformed the K2P/NJ approach by 7.9% in both the 2,083-sequence and 397-sequence datasets; for the matK+rbcL+trnH-psbA dataset, it outperformed the traditional approach by 16.7%.

Conclusions: We conclude that the modified CV approach is an efficient method for analyzing large multi-gene datasets for plant DNA barcoding. Source code, implemented in C++ and supported on MS Windows, is freely available for download at http://math.xtu.edu.cn/myphp/math/research/source/Barcode_source_codes.zip.
Highlights
The mitochondrial cytochrome c oxidase subunit I (COI) has been proposed as the “DNA barcode” region for species identification in the animal kingdom [1,2].
We conclude that the modified composition vector (CV) approach is an efficient method for analyzing large multi-gene datasets for plant DNA barcoding.
The objective of this study was to evaluate how well the modified CV method could handle multi-locus DNA barcoding datasets for plants. The results showed that the success rates of grouping sequences at the genus/species level based on the modified CV approach are always higher than those based on the traditional analytical method.
Summary
The mitochondrial cytochrome c oxidase subunit I (COI) has been proposed as the “DNA barcode” region for species identification in the animal kingdom [1,2]. Kress et al. [9] suggested that the trnH-psbA spacer region be added as a third core DNA barcoding marker, because the three-locus marker could provide a “better estimate of species identity” and the spacer is easily amplified by PCR across terrestrial plants using a single pair of universal primers. This matK+rbcL+trnH-psbA combination, with two coding regions plus one non-coding region, appears to be the most promising DNA barcoding strategy to date for discriminating terrestrial plant species. The modified method includes an adjustable-weighted algorithm for the vector distance according to the ratio in sequence length of the candidate genes for each pair of taxa.
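
The idea of a length-weighted vector distance can be illustrated with a minimal sketch. This is not the authors' C++ implementation, and the text above does not give the exact weighting formula. The sketch makes three assumptions: each gene is represented by plain k-mer frequencies (the published CV method additionally subtracts a background predicted from shorter strings, which is omitted here), the per-gene distance is D = (1 - C) / 2 with C the cosine of the angle between the two vectors, and each gene's weight is its share of the combined sequence length for the pair of taxa. The function names `kmer_vector`, `cv_distance`, and `weighted_distance` are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_vector(seq, k=3):
    """Relative frequencies of overlapping k-mers (simplified composition vector)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def cv_distance(u, v):
    """Distance D = (1 - C) / 2, where C is the cosine between vectors u and v."""
    dot = sum(x * v.get(kmer, 0.0) for kmer, x in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.5  # an empty vector carries no signal: treat as uncorrelated
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


def weighted_distance(taxon_a, taxon_b, k=3):
    """Length-weighted average of per-gene distances for one pair of taxa.

    taxon_a and taxon_b map gene names (e.g. "matK") to sequences.
    ASSUMPTION: a gene's weight is its share of the pair's combined length;
    the paper's actual weighting rule may differ.
    """
    shared = [gene for gene in taxon_a if gene in taxon_b]
    lengths = {g: len(taxon_a[g]) + len(taxon_b[g]) for g in shared}
    total = sum(lengths.values())
    if total == 0:
        raise ValueError("the two taxa share no sequenced genes")
    return sum(
        lengths[g] / total
        * cv_distance(kmer_vector(taxon_a[g], k), kmer_vector(taxon_b[g], k))
        for g in shared
    )


if __name__ == "__main__":
    # Toy sequences, not real barcode data.
    a = {"matK": "ACGTACGTAC", "rbcL": "AAAAA"}
    b = {"matK": "ACGTACGTAC", "rbcL": "CCCCC"}
    print(round(weighted_distance(a, b), 4))
```

Because the weights are recomputed for every pair of taxa, a long gene contributes more to that pair's distance than a short spacer, and a gene missing from either taxon is left out of that pair's comparison. The resulting pairwise distances could then be passed to any distance-based tree builder.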