Abstract
Metabarcoding of environmental DNA (eDNA) is a powerful tool for describing biodiversity, for example by finding keystone species or detecting invasive species in environmental samples. Continuous improvements in the method and advances in sequencing platforms over the last decade mean that this approach is now widely used in biodiversity sciences and biomonitoring. Its general use hinges on the correct identification of taxa. However, past studies have shown that this depends crucially on decisions made during sampling, sample processing, and the subsequent handling of sequencing data. With no clear consensus on best practice, the latter in particular has led to varied bioinformatic approaches and recommendations for data preparation and taxonomic identification. In this study, using a large freshwater fish eDNA sequence dataset, we compared two ways of assigning our raw reads taxonomically: (i) the frequently used zero‐radius Operational Taxonomic Unit (zOTU) approach in combination with publicly available reference sequences (open databases), and (ii) an Operational Sequence Unit (OSU) database approach, using a curated database of reference sequences generated from specimen barcoding (closed database). We show that both approaches gave comparable results for common species. However, agreement between the approaches declined as read abundance declined, so results were less reliable and not comparable for rare species. The success of the zOTU approach depended on the suitability, rather than the size, of the reference database. In contrast, the OSU approach used reliable DNA sequences and thus often enabled species‐level identifications, yet this resolution was lower for species of recent phylogenetic age.
We show that reference databases need to cover the target group and to include outgroups and full taxonomic annotation, in order to avoid the misleading annotations that can occur with the short amplicons commonly used in eDNA metabarcoding studies. Finally, we make general suggestions to improve the construction and use of reference databases for future metabarcoding studies.