Abstract
The use of DNA metabarcoding to characterise the biodiversity of environmental and community samples has exploded in recent years. However, taxonomic inferences from these studies are contingent on the quality and completeness of the sequence reference database used to characterise the species composition of samples. In response, studies often develop custom reference databases to improve species assignment. The disadvantage of this approach is that it limits the potential for database re-use and the transferability of inferences across studies. Here, we present the MARine Eukaryote Species (MARES) reference database for use in marine metabarcoding studies, created using a transparent and reproducible pipeline. MARES includes all COI sequences available in GenBank and BOLD for marine taxa, unified into a single taxonomy. Our pipeline facilitates the curation of sequences, the synonymization of taxonomic identifiers used by different repositories, and the formatting of these data for use in taxonomic assignment tools. Overall, MARES provides a benchmark COI reference database for marine eukaryotes, and a standardised pipeline for (re)producing reference databases, enabling integration and fair comparison of marine DNA metabarcoding results.
Highlights
DNA metabarcoding has emerged as a powerful tool for quantifying biodiversity using genetic sequences[1,2].
Given the impact that the choice of reference databases can have on metabarcoding study inferences, there are numerous campaigns to compile publicly available barcode libraries for specific groups and geographic locations.
We present the MARine Eukaryote Species (MARES) database, providing reference sequences of the cytochrome oxidase 1 (COI) gene region for a large diversity of taxa found in marine ecosystems with standardised and curated taxonomic identifiers.
Summary
DNA metabarcoding has emerged as a powerful tool for quantifying biodiversity using genetic sequences[1,2]. Given the impact that the choice of reference databases can have on metabarcoding study inferences, there are numerous campaigns to compile publicly available barcode libraries for specific groups (e.g. photosynthetic eukaryotes, PhytoREF[6]; arthropods[5]; fungi, UNITE[7]) and geographic locations (e.g. aquatic life in European countries[8], freshwater macroinvertebrates of Australia[9]). The use of such standardised reference databases for taxonomic assignment avoids possible biases in species determination introduced by the choice of reference database, thereby allowing unbiased comparisons among studies. The MARES pipeline enables users to participate in the decisions that need to be made in generating a sequence reference database that will have downstream consequences on their biodiversity inferences.
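To make the idea of unifying sequences from two repositories under one taxonomy more concrete, here is a minimal Python sketch. It is not the MARES pipeline itself: the record layout, the synonym table, the placeholder taxon names ("Examplea antiqua", "Examplea nova") and all function names are hypothetical, chosen only to illustrate the three steps named in the abstract (curation, synonymization, formatting).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    accession: str  # repository identifier (hypothetical values below)
    source: str     # e.g. "GenBank" or "BOLD"
    taxon: str      # taxon name as given by the repository
    sequence: str   # COI nucleotide sequence


def normalise_taxon(name, synonyms):
    """Collapse extra whitespace and map a synonym to its accepted name."""
    cleaned = " ".join(name.split())
    return synonyms.get(cleaned, cleaned)


def merge_records(records, synonyms):
    """Unify taxon names, then drop records whose (taxon, sequence)
    pair has already been seen, keeping the first occurrence."""
    merged = []
    seen = set()
    for rec in records:
        taxon = normalise_taxon(rec.taxon, synonyms)
        seq = rec.sequence.upper()
        key = (taxon, seq)
        if key in seen:
            continue
        seen.add(key)
        merged.append(Record(rec.accession, rec.source, taxon, seq))
    return merged


def to_fasta(records):
    """Format records as FASTA text with the unified taxon in the header."""
    return "".join(f">{r.accession} {r.taxon}\n{r.sequence}\n" for r in records)


if __name__ == "__main__":
    # Assumed synonym table: maps an outdated name to the accepted one.
    synonyms = {"Examplea antiqua": "Examplea nova"}
    records = [
        Record("A1", "GenBank", "Examplea  antiqua", "acgt"),
        Record("B1", "BOLD", "Examplea nova", "ACGT"),
        Record("B2", "BOLD", "Examplea nova", "ACGA"),
    ]
    print(to_fasta(merge_records(records, synonyms)), end="")
```

In this sketch the first two records collapse into one because, after synonymization, they share both taxon and sequence. A real pipeline would need to decide such matters explicitly (for instance, which accession to keep, or whether identical sequences from different repositories should be retained), which is the kind of user decision the summary above refers to.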