Abstract

The accuracy of specimen identification through DNA barcoding and metabarcoding relies on reference libraries containing records with reliable taxonomy and sequence quality. The considerable growth in barcode data requires stringent data curation, especially in taxonomically difficult groups such as marine invertebrates. A major effort in curating marine barcode data in the Barcode of Life Data Systems (BOLD) was undertaken during the 8th International Barcode of Life Conference (Trondheim, Norway, 2019). Major taxonomic groups (crustaceans, echinoderms, molluscs, and polychaetes) were reviewed to identify those which had disagreement between Linnaean names and Barcode Index Numbers (BINs). The records with disagreement were annotated with four tags: a) MIS-ID (misidentified, mislabeled, or contaminated records), b) AMBIG (ambiguous records unresolved with the existing data), c) COMPLEX (species names occurring in multiple BINs), and d) SHARE (barcodes shared between species). A total of 83,712 specimen records corresponding to 7,576 species were reviewed and 39% of the species were tagged (7% MIS-ID, 17% AMBIG, 14% COMPLEX, and 1% SHARE). High percentages (>50%) of AMBIG tags were recorded in gastropods, whereas COMPLEX tags dominated in crustaceans and polychaetes. The high proportion of tagged species reflects either flaws in the barcoding workflow (e.g., misidentification, cross-contamination) or taxonomic difficulties (e.g., synonyms, undescribed species). Although data curation is essential for barcode applications, such manual attempts to examine large datasets are unsustainable and automated solutions are extremely desirable.
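The disagreement between Linnaean names and BINs described above can be illustrated with a minimal sketch. It assumes each record is reduced to a hypothetical (species name, BIN identifier) pair; the function name, input format, and example identifiers are illustrative and are not taken from BOLD or from the authors' workflow. The sketch only flags candidates: a species name found in several BINs is a COMPLEX candidate, and a BIN holding several species names may turn out to be SHARE, MIS-ID, or AMBIG. Telling those last three apart took expert review in the study and is not attempted here.

```python
from collections import defaultdict


def flag_discordance(records):
    """Flag name/BIN disagreement in (species_name, bin_id) pairs.

    The input format is a hypothetical simplification of a barcode record.
    Returns two dicts:
      multi_bin      species name -> sorted BINs (COMPLEX candidates)
      multi_species  BIN -> sorted species names (SHARE, MIS-ID or AMBIG
                     candidates; deciding which needs expert review)
    """
    bins_by_species = defaultdict(set)
    species_by_bin = defaultdict(set)
    for species, bin_id in records:
        bins_by_species[species].add(bin_id)
        species_by_bin[bin_id].add(species)

    # One species name spread across more than one BIN
    multi_bin = {s: sorted(b) for s, b in bins_by_species.items() if len(b) > 1}
    # One BIN containing more than one species name
    multi_species = {b: sorted(s) for b, s in species_by_bin.items() if len(s) > 1}
    return multi_bin, multi_species


# Made-up placeholder records, not real species names or BIN identifiers
records = [
    ("sp_A", "BIN1"),
    ("sp_A", "BIN2"),
    ("sp_B", "BIN3"),
    ("sp_C", "BIN3"),
    ("sp_D", "BIN4"),
]
multi_bin, multi_species = flag_discordance(records)
print(multi_bin)
print(multi_species)
```

Records that appear in neither dictionary (here sp_D in BIN4) show a one-to-one match between name and BIN and would not be tagged.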

Highlights

  • Reference libraries, which are collections of compliant DNA sequences assigned to species, constitute the backbone of species identification systems based on DNA barcoding and metabarcoding, and a critical component in molecular biomonitoring and molecular taxonomy (Weigand et al 2019)

  • Mollusca was by far the largest phylum tackled during the hackathon

  • Polychaeta was the taxonomic group with the lowest number of reviewed records, whereas Bivalvia was the group displaying the lowest number of species, comprising about 10% of the total number of species in the dataset (Table 1, Fig. 2)


Summary

Introduction

Reference libraries, which are collections of compliant DNA sequences assigned to species, constitute the backbone of species identification systems based on DNA barcoding and metabarcoding, and a critical component in molecular biomonitoring and molecular taxonomy (Weigand et al 2019). The number of DNA sequences and species included in reference libraries has increased dramatically over the last 15 years. The ever-growing libraries have been deposited mostly in two large public molecular databases, namely (i) GenBank (Sayers et al 2021), a repository with data usually released after publication, and (ii) the Barcode of Life Data Systems (BOLD, Ratnasingham and Hebert 2007), a workbench in which data can be validated and analyzed before being released. Additional databases exist, but they are smaller and usually created for specific purposes (e.g., zooplankton identification, Bucklin et al 2021).
