Abstract

The increasing popularity of cytochrome c oxidase subunit 1 (COI) DNA metabarcoding warrants a careful look at the underlying reference databases used to make high-throughput taxonomic assignments. The objectives of this study are to document trends and assess the future usability of COI records for metabarcode identification. The number of COI records deposited in the NCBI nucleotide database has increased by a geometric average of 51% per year, from 8,137 records deposited in 2003 to a cumulative total of ~2.5 million by the end of 2017. About half of these records are fully identified to the species rank, 92% are at least 500 bp in length, 74% have a country annotation, and 51% have latitude-longitude annotations. To ensure the future usability of COI records in GenBank we suggest: 1) improving the geographic representation of COI records, 2) improving the cross-referencing of COI records in the Barcode of Life Data System and GenBank to facilitate consolidation and incorporation into existing bioinformatic pipelines, 3) adhering to the Minimum Information about a Marker Gene Sequence (MIMARKS) guidelines, and 4) integrating metabarcodes from eDNA and mixed community studies with existing reference sequences. The growth of COI reference records over the past 15 years has been substantial, and these records are likely to remain a resource across many fields for years to come.
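The 51% figure can be sanity-checked from the two counts quoted in the abstract. The sketch below is not the authors' calculation; it assumes the geometric average is the constant yearly rate that carries the 2003 count (8,137) to the approximate 2017 cumulative total (~2.5 million) over 14 yearly intervals. The function name `geometric_mean_growth` and the rounded 2,500,000 value are illustrative choices.

```python
def geometric_mean_growth(first_count: float, last_count: float, intervals: int) -> float:
    """Return the constant per-interval growth rate that turns first_count into last_count."""
    if first_count <= 0 or last_count <= 0 or intervals <= 0:
        raise ValueError("counts and intervals must be positive")
    # Compound growth: last = first * (1 + r) ** intervals, solved for r.
    return (last_count / first_count) ** (1.0 / intervals) - 1.0


# Values taken from the abstract; the 2017 total is approximate ("~2.5 million").
records_2003 = 8_137
cumulative_2017 = 2_500_000
intervals = 2017 - 2003  # 14 yearly steps (an assumption about how the average was formed)

rate = geometric_mean_growth(records_2003, cumulative_2017, intervals)
print(f"Geometric average growth: {rate:.1%} per year")
```

Under these assumptions the rate comes out at about 50 to 51% per year, in line with the reported value; the small difference is expected because the 2017 total is rounded.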

Highlights

  • Cytochrome c oxidase subunit 1 (COI) marker gene or DNA barcode sequencing of animals from mixed communities and bulk samples has surged in usage [1]

  • We show the growth of COI records in GenBank from the introduction of COI barcoding in 2003 to 2017

  • The COI records deposited to the National Center for Biotechnology Information (NCBI) nucleotide database increased on average by nearly 51% per year (Fig 1)


Summary

Introduction

Cytochrome c oxidase subunit 1 (COI) marker gene or DNA barcode sequencing of animals from mixed communities and bulk samples has surged in usage [1]. COI metabarcoding applications include diversity assessments for biomonitoring and conservation [4,5], detection of environmental gradients in ecology and forestry studies [6,7], and diet analysis [8,9]. COI metabarcoding leverages existing COI sequences in databases such as the Barcode of Life Data (BOLD) System and those of the International Nucleotide Sequence Database Collaboration (INSDC), a partnership between the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EMBL-EBI), and the DNA Data Bank of Japan (DDBJ) [10,11].

