Exploring biodiversity requires meticulous species identification within specific environments. Environmental DNA (eDNA) refers to DNA extracted from environmental samples (Taberlet et al. 2018). Metabarcoding of eDNA enables detailed taxonomic inventories (Valentini et al. 2009, Haderlé et al. 2024). After sample collection, DNA is extracted, the targeted barcode is amplified, and the resulting amplicons are sequenced, producing millions of sequences. Bioinformatic processing is then a critical step in the analysis: it cleans and structures the sequences and assigns them to taxa. However, automated protocols face limitations, including taxonomic misidentifications due to intraspecific variation, as in the Delphininae (Alfonsi et al. 2013), or to incomplete reference databases (e.g. Hleap et al. 2021, Meglécz 2023). Relying on custom databases may help but can miss unexpected range shifts of taxa (Gold et al. 2021, Jung et al. 2016). Including ecological data (Coissac et al. 2012, Deiner et al. 2017, Blackman et al. 2023) or integrating multiple databases (Bourret et al. 2023) enhances accuracy, and expert validation remains essential (Clarke et al. 2017, Porter and Hajibabaei 2018). Our study aims to develop a taxonomic assignment protocol for vertebrate Molecular Operational Taxonomic Units (MOTUs)/Amplicon Sequence Variants (ASVs), called VeTAPRH (Vertebrates Taxonomic Assignment Protocol), to verify automated assignments from metabarcoding pipelines. The protocol generates successive species lists: the first is built from molecularly similar taxa, and later lists are refined using taxonomic data and the plausibility of local distribution, both compiled from specialized databases. By cross-referencing these criteria, we establish a list of hypothetical taxa for each MOTU, including taxa absent from reference databases. Initially presented as a flowchart, this protocol will lay the groundwork for a digital tool to support taxonomists.
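The successive refinement described above could be sketched roughly as follows. This is a minimal illustration and not the VeTAPRH protocol itself: the `Candidate` fields, the `successive_lists` function, the 97% identity threshold, and the example values are all assumptions introduced here for illustration, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate taxon for a MOTU/ASV (hypothetical structure)."""
    species: str
    identity: float             # percent sequence identity to the MOTU
    taxonomy_consistent: bool   # agrees with taxonomic data (assumed flag)
    occurs_locally: bool        # plausible local distribution (assumed flag)


def successive_lists(candidates, min_identity=97.0):
    """Build three successively narrower species lists.

    min_identity is an assumed, illustrative threshold.
    """
    # List 1: molecularly similar taxa.
    molecular = [c for c in candidates if c.identity >= min_identity]
    # List 2: refined using taxonomic data.
    taxonomic = [c for c in molecular if c.taxonomy_consistent]
    # List 3: refined using plausibility of local distribution.
    plausible = [c for c in taxonomic if c.occurs_locally]
    return {
        "molecular": [c.species for c in molecular],
        "taxonomic": [c.species for c in taxonomic],
        "plausible": [c.species for c in plausible],
    }


# Made-up example values for illustration only, not real results.
hits = [
    Candidate("Tursiops truncatus", 99.0, True, True),
    Candidate("Stenella coeruleoalba", 98.5, True, False),
    Candidate("Delphinus delphis", 95.0, True, True),
]
lists = successive_lists(hits)
```

Keeping every intermediate list, rather than only the final one, mirrors the idea that an expert can inspect which criterion excluded a given taxon.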