Abstract

Background

The composition vector (CV) method has been shown to be a reliable and fast alignment-free method for analyzing large COI barcoding data. In this study, we modify this method for analyzing multi-gene datasets for plant DNA barcoding. The modified method includes an adjustable-weighted algorithm for the vector distance according to the ratio in sequence length of the candidate genes for each pair of taxa.

Methodology/Principal Findings

Three datasets were tested: a matK+rbcL dataset with 2,083 sequences, a matK+rbcL dataset with 397 sequences, and a matK+rbcL+trnH-psbA dataset with 397 sequences. We showed that the success rates of grouping sequences at the genus/species level based on this modified CV approach are always higher than those based on the traditional K2P/NJ method. For the matK+rbcL datasets, the modified CV approach outperformed the K2P/NJ approach by 7.9% in both the 2,083-sequence and 397-sequence datasets, and for the matK+rbcL+trnH-psbA dataset, the CV approach outperformed the traditional approach by 16.7%.

Conclusions

We conclude that the modified CV approach is an efficient method for analyzing large multi-gene datasets for plant DNA barcoding. Source code, implemented in C++ and supported on MS Windows, is freely available for download at http://math.xtu.edu.cn/myphp/math/research/source/Barcode_source_codes.zip.

Highlights

  • The mitochondrial cytochrome c oxidase subunit I (COI) has been proposed as the “DNA barcode” region for species identification in the animal kingdom [1,2]

  • We conclude that the modified composition vector (CV) approach is an efficient method for analyzing large multi-gene datasets for plant DNA barcoding

  • The objective of this study was to evaluate how well the modified CV method could handle the multi-locus DNA barcoding datasets for plants, and the results showed that the success rates of grouping sequences at the genus/species level based on the modified CV approach are always higher than those based on the traditional analytical method

Summary

Introduction

The mitochondrial cytochrome c oxidase subunit I (COI) has been proposed as the “DNA barcode” region for species identification in the animal kingdom [1,2]. Kress et al. [9] suggested adding the trnH-psbA spacer region as the third core DNA barcoding marker, because the resulting three-locus marker could provide a “better estimate of species identity” and the spacer is easily amplified by PCR across terrestrial plants using a pair of universal primers. This matK+rbcL+trnH-psbA combination, with two coding regions plus one non-coding region, appears to be the most promising DNA barcoding strategy to date for discriminating terrestrial plant species. The modified method includes an adjustable-weighted algorithm for the vector distance according to the ratio in sequence length of the candidate genes for each pair of taxa.
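
To make the idea of a length-weighted vector distance concrete, the following is a minimal Python sketch and not the authors' C++ implementation. It rests on several assumptions that the text above does not confirm: composition vectors are taken as plain k-mer frequencies (no background correction), the per-gene distance is (1 − cosine similarity)/2, and each gene's weight is its share of the combined sequence length for the pair of taxa. The function names `kmer_vector`, `cv_distance` and `weighted_distance`, and the default `k=3`, are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_vector(seq, k=3):
    """Relative frequency of each overlapping k-mer in seq (assumed CV form)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def cv_distance(u, v):
    """(1 - cosine similarity) / 2 between two sparse k-mer vectors."""
    dot = sum(u[kmer] * v[kmer] for kmer in u if kmer in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.5  # no usable k-mers: treat the vectors as uncorrelated
    return (1 - dot / (norm_u * norm_v)) / 2


def weighted_distance(taxon_a, taxon_b, k=3):
    """Combine per-gene distances for two taxa.

    taxon_a and taxon_b map gene names (e.g. "matK") to sequences.
    Assumption: a gene's weight is its fraction of the total sequence
    length over the genes shared by this pair of taxa.
    """
    genes = [g for g in taxon_a if g in taxon_b]
    lengths = {g: len(taxon_a[g]) + len(taxon_b[g]) for g in genes}
    total = sum(lengths.values())
    if total == 0:
        raise ValueError("the two taxa share no non-empty genes")
    return sum(
        lengths[g] / total
        * cv_distance(kmer_vector(taxon_a[g], k), kmer_vector(taxon_b[g], k))
        for g in genes
    )
```

Because the weights are recomputed for every pair of taxa, a long gene contributes more to that pair's distance than a short spacer, and a pair in which one marker is unusually short is not dominated by that marker. The resulting pairwise distances could then be passed to any distance-based tree-building step.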
