VTAM: A robust pipeline for validating metabarcoding data using optimized parameters based on internal controls

Emese Meglécz, Emmanuel Corse, Aitor González, Vincent Dubut

doi:10.3897/aca.4.e64659

Abstract

Metabarcoding has become a powerful approach to study biodiversity from environmental samples, but it is still prone to some pitfalls. Several papers have called for good practice in study design, data production and analyses to ensure repeatability and comparability between studies. Notably, the importance of mock community samples, negative controls, and replicates is frequently highlighted (Alberdi et al. 2018, O'Rourke et al. 2020). However, their use in bioinformatics pipelines is often limited to post hoc verification of expectations by the user. Indeed, one of the biggest challenges in metabarcoding analyses is to take into account the trade-off between false positive (FP) and false negative (FN) occurrences. We thus developed the VTAM (Validation and Taxonomic Assignation of Metabarcoding data) pipeline, which is the first tool to explicitly use negative control and mock samples to find optimal parameters that minimize false positive and false negative occurrences. In addition, VTAM addresses all known technical error types, including tag-jumps, takes repeatability among replicates into account, and can integrate more than one overlapping marker to further minimize false negative occurrences. To evaluate VTAM, we compared it with two other pipelines: one based on DADA2 (Callahan et al. 2016) and LULU (Frøslev et al. 2017), and one based on OBITools3 (Boyer et al. 2016) and metabaR (Zinger et al. 2020). Two datasets from fish and bat diet studies were analysed with the three pipelines. Based on mock and negative samples, VTAM showed the best precision for mock samples in both datasets, while specificity in negative controls was comparable among the three pipelines (Fig. 1). VTAM therefore constitutes a complete pipeline to filter and validate metabarcoding data, from raw FASTQ data to Amplicon Sequence Variant tables with taxonomic assignments.
Our pipeline aggregates a series of features rarely grouped in a single pipeline and performs a non-arbitrary parameter optimization based on internal control samples to generate conservative but informative metabarcoding datasets. We believe VTAM provides a very valuable tool for the validation of metabarcoding data, which is essential for conducting robust analyses of biodiversity.
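
The idea of tuning a filter on internal controls can be illustrated with a toy example. The sketch below is not VTAM's implementation or API, and the abstract does not describe the actual optimization procedure; it simply assumes a single hypothetical read-count cutoff and picks the value that yields the fewest false positive and false negative occurrences in one mock sample and one negative control. All sample names, variant names, read counts, and function names are invented for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy data: read counts per variant in each control sample.
controls: Dict[str, Dict[str, int]] = {
    "mock_1": {"sp_A": 520, "sp_B": 310, "sp_C": 45, "chimera_X": 12},
    "negative_1": {"sp_A": 8, "contaminant_Y": 3},
}

# Variants expected in each mock sample. Samples that are absent from this
# mapping (the negative control) are expected to contain nothing.
expected: Dict[str, Set[str]] = {"mock_1": {"sp_A", "sp_B", "sp_C"}}


def count_errors(
    counts: Dict[str, Dict[str, int]],
    expected_variants: Dict[str, Set[str]],
    cutoff: int,
) -> Tuple[int, int]:
    """Return (false positives, false negatives) after discarding every
    occurrence supported by fewer than `cutoff` reads."""
    fp = fn = 0
    for sample, variants in counts.items():
        kept = {v for v, n in variants.items() if n >= cutoff}
        wanted = expected_variants.get(sample, set())
        fp += len(kept - wanted)   # retained but not expected
        fn += len(wanted - kept)   # expected but filtered out
    return fp, fn


def best_cutoff(
    counts: Dict[str, Dict[str, int]],
    expected_variants: Dict[str, Set[str]],
    candidates: Iterable[int],
) -> int:
    """Pick the candidate cutoff with the fewest FP + FN occurrences.
    Ties are broken in favour of the lower (less stringent) cutoff."""
    return min(
        candidates,
        key=lambda c: (sum(count_errors(counts, expected_variants, c)), c),
    )


if __name__ == "__main__":
    candidates = [1, 5, 10, 20, 50, 100]
    for c in candidates:
        print(c, count_errors(controls, expected, c))
    print("chosen cutoff:", best_cutoff(controls, expected, candidates))
```

In this toy dataset, low cutoffs keep the chimera and the contaminant reads (false positives), while high cutoffs discard the low-abundance expected variant (a false negative). The controls therefore bound the acceptable range from both sides, which is the trade-off described above.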

Highlights

Metabarcoding has become a powerful approach to study biodiversity from environmental samples, but it is still prone to some pitfalls.
One of the biggest challenges in metabarcoding analyses is to take into account the trade-off between false positive (FP) and false negative (FN) occurrences.

Corresponding author: Emese Meglécz (emese.meglecz@imbe.fr)
Received: 19 Feb 2021 | Published: 04 Mar 2021
Citation: Meglécz E, Dubut V, Corse E, González A (2021) VTAM: A robust pipeline for validating metabarcoding data using optimized parameters based on internal controls.

Introduction

Metabarcoding has become a powerful approach to study biodiversity from environmental samples, but it is still prone to some pitfalls. Several papers have called for good practice in study design, data production and analyses to ensure repeatability and comparability between studies.
