Abstract

In recent years, large-scale DNA barcoding campaigns have generated an enormous number of COI barcodes, which are usually stored in NCBI's GenBank and the official Barcode of Life database (BOLD). BOLD data are generally associated with more detailed and better curated metadata, because a large proportion is based on expert-verified and vouchered material that is accessible in public collections. In the course of the German Barcode of Life initiative, data were generated for the reference library of 2,846 species of Coleoptera from 13,516 individuals. Because identification, verification and data validation require considerable effort, a bioinformatic pipeline, “TaxCI”, was developed that (1) identifies taxonomic inconsistencies in a given tree topology (optionally including a reference dataset), (2) discriminates between different cases of incongruence in order to identify contaminated or misidentified specimens, and (3) graphically marks those cases in the tree, so that they can be checked again and, if needed, corrected or removed from the dataset. For this, “TaxCI” may use DNA-based species delimitations from other approaches (e.g. mPTP) or may perform its own implemented threshold-based clustering. The data-processing pipeline was tested on a newly generated set of barcodes, using the available BOLD records as a reference. After the data were revised based on the first TaxCI run, a second TaxCI analysis yielded a taxonomic match ratio very similar to the one recorded from the reference set (92% vs. 94%). The revised dataset improved by nearly 20% through this procedure compared to the original, uncorrected one. Overall, the new processing pipeline for DNA barcode data allows for the rapid and easy identification of inconsistencies in large datasets, which can be dealt with before submitting them to public data repositories like BOLD or GenBank. Ultimately, this will increase the quality of submitted data and the speed of data submission, while primarily avoiding the deterioration of the accuracy of the data repositories due to ambiguously identified or contaminated specimens.
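The idea of threshold-based clustering followed by a check against the assigned species names can be illustrated with a minimal Python sketch. It is not the TaxCI implementation: the single-linkage clustering on uncorrected p-distances, the toy sequences, the 0.15 threshold, all function names, and the match ratio formula used here (2 × matches / (clusters + named species)) are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations

# Toy aligned sequences and species labels (hypothetical data).
SEQS = {
    "s1": "AAAAAAAAAA",
    "s2": "AAAAAAAAAT",
    "s3": "CCCCCCCCCC",
    "s4": "CCCCCCCCCG",
    "s5": "CCCCCCCCCC",  # labelled as Species A below: a likely misidentification
}
LABELS = {
    "s1": "Species A",
    "s2": "Species A",
    "s3": "Species B",
    "s4": "Species B",
    "s5": "Species A",
}


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def threshold_clusters(seqs, threshold):
    """Single-linkage clusters: link specimens with p-distance <= threshold."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for name in seqs:
        clusters[find(name)].add(name)
    return list(clusters.values())


def match_ratio(clusters, labels):
    """Assumed definition: 2 * matches / (clusters + named species).

    A match is a cluster containing exactly the specimens of one species.
    """
    by_species = defaultdict(set)
    for name, species in labels.items():
        by_species[species].add(name)
    species_sets = list(by_species.values())
    n_match = sum(1 for c in clusters if c in species_sets)
    return 2 * n_match / (len(clusters) + len(species_sets))


def mixed_clusters(clusters, labels):
    """Clusters that contain specimens carrying more than one species name."""
    return [c for c in clusters if len({labels[n] for n in c}) > 1]
```

In this toy dataset, specimen `s5` clusters with the Species B barcodes although it carries the Species A label, so its cluster is flagged as mixed; relabelling it and recomputing the match ratio mirrors the revise-and-rerun step described above.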

Highlights

  • DNA barcoding can provide an efficient tool for rapid biodiversity assessments, because it meets needs for rapid and reproducible specimen identification in the era of massive habitat destruction, biodiversity loss, and climate change (Hebert & Gregory, 2005; Valentini, Pompanon, & Taberlet, 2009)

  • Using standardized genetic markers in DNA barcoding allows connecting the identities of different life stages such as eggs, larvae or adults, often a major difficulty in morphology-based taxonomy (e.g. Ahrens, Monaghan, & Vogler, 2007; Etzler, Wanner, Morales-Rodriguez, & Ivie, 2014; Freitag, 2013; García-Robledo, Kuprewicz, Staines, Kress, & Erwin, 2013; Šipek & Ahrens, 2011)

  • Barcoding has been successfully applied to a vast number of taxa in many different geographic regions

Introduction

DNA barcoding can provide an efficient tool for rapid biodiversity assessments, because it meets the need for rapid and reproducible specimen identification in the era of massive habitat destruction, biodiversity loss, and climate change (Hebert & Gregory, 2005; Valentini, Pompanon, & Taberlet, 2009). Since the early days of DNA barcoding, barcodes have been used for direct estimation of species boundaries (Carstens, Pelletier, Reid, & Satler, 2013; Meier, Shiyang, Vaidya, & Ng, 2006; Pons et al., 2006; Puillandre, Lambert, Brouillet, & Achaz, 2012; Ratnasingham & Hebert, 2013; Templeton, 2001; Zhang, Kapli, Pavlidis, & Stamatakis, 2013). Most of these methods attempt to infer species boundaries from the discontinuity between intraspecific and interspecific sequence variation, visible either as a “barcode gap,” as a shift in branching rates (GMYC), or as a shift in the number of substitutions per branch (PTP). Bergsten et al. (2012) revealed increasing difficulties in specimen determination using DNA barcodes when scaling up the geographic scope from a local or regional to a continental focus. They found that in diving beetles (Dytiscidae) a minimum of 70 specimens needs to be analysed per species to sample 95% of its intraspecific variation. Building a proper barcode reference library that sufficiently reflects intraspecific variation and is able to correctly detect species boundaries is therefore expected to be an elaborate procedure.
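As a rough illustration of the “barcode gap” mentioned above, the following Python sketch compares the largest intraspecific distance with the smallest interspecific distance in a small set of aligned sequences. The toy sequences, the species labels, the function names, and the use of uncorrected p-distances over the whole dataset are all assumptions for illustration; published methods differ in distance model and in whether the gap is evaluated per species.

```python
from itertools import combinations

# Toy aligned sequences and species labels (hypothetical data).
BARCODES = {
    "a1": "ACGTACGTAC",
    "a2": "ACGTACGTAT",
    "b1": "ACGTTTTTAC",
    "b2": "ACGTTTTTAT",
}
SPECIES = {"a1": "X", "a2": "X", "b1": "Y", "b2": "Y"}


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def barcode_gap(seqs, species_of):
    """Return (max intraspecific distance, min interspecific distance)."""
    intra, inter = [], []
    for a, b in combinations(seqs, 2):
        d = p_distance(seqs[a], seqs[b])
        # Same label: within-species distance; otherwise between-species.
        (intra if species_of[a] == species_of[b] else inter).append(d)
    return max(intra, default=0.0), min(inter, default=float("inf"))


def has_gap(seqs, species_of):
    """True when every between-species distance exceeds every within-species one."""
    max_intra, min_inter = barcode_gap(seqs, species_of)
    return min_inter > max_intra
```

A gap in such a small sample says little about the species as a whole: as intraspecific sampling widens geographically, the maximum within-species distance can grow and the gap can close, which is the difficulty the passage attributes to Bergsten et al. (2012).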
