Dadaist2: A Toolkit to Automate and Simplify Statistical Analysis and Plotting of Metabarcoding Experiments.

Rebecca Ansorge, Andrea Telatin, Stephen A James, Giovanni Birolo

doi:10.3390/ijms22105309

Abstract

The taxonomic composition of microbial communities can be assessed using universal marker amplicon sequencing. The most common taxonomic markers are the 16S rDNA for bacterial communities and the internal transcribed spacer (ITS) region for fungal communities, but various other markers are used for barcoding eukaryotes. A crucial step in the bioinformatic analysis of amplicon sequences is the identification of representative sequences. This can be achieved using a clustering approach or by denoising raw sequencing reads. DADA2 is a widely adopted algorithm, released as an R library, that denoises marker-specific amplicons from next-generation sequencing and produces a set of representative sequences referred to as ‘Amplicon Sequence Variants’ (ASV). Here, we present Dadaist2, a modular pipeline, providing a complete suite for the analysis that ranges from raw sequencing reads to the statistics of numerical ecology. Dadaist2 implements a new approach that is specifically optimised for amplicons with variable lengths, such as the fungal ITS. The pipeline focuses on streamlining the data flow from the command line to R, with multiple options for statistical analysis and plotting, both interactive and automatic.

Highlights

High-throughput amplicon sequencing of taxonomic markers is a cost-effective and widely adopted method for determining the composition of mixed natural communities. In addition to the analysis of microbial communities using standard marker sequences such as 16S rDNA or the fungal internal transcribed spacer (ITS) region, there are applications targeting eukaryotic genes that decipher the composition of environmental, host-associated, or manufactured food-associated microbiomes [1].
The pipeline focuses on streamlining the data flow from the command line to R, with multiple options for statistical analysis and plotting, both interactive and automatic.
We present Dadaist2, a command-line-based application that implements the use of DADA2 with a dedicated workflow for ITS profiling (Figure 1) and that focuses on filling the gap between primary and secondary analysis, streamlining the numerical ecology analysis that can be performed manually, or interactively using MicrobiomeAnalyst.

Summary

Introduction

High-throughput amplicon sequencing of taxonomic markers is a cost-effective and widely adopted method for determining the composition of mixed natural communities. In addition to the analysis of microbial communities using standard marker sequences such as 16S rDNA or the fungal internal transcribed spacer (ITS) region, there are applications targeting eukaryotic genes (e.g., trnL for chloroplasts, COI for mitochondria, or nuclear 18S rDNA) that decipher the composition of environmental, host-associated, or manufactured food-associated microbiomes [1]. Amplicon sequencing detects the presence of molecular species (barcode sequences) and uses that information to infer the presence of microorganisms under a set of assumptions and caveats. These include technical biases arising from the amplification procedure, which is prone to introducing errors and chimeras, making accurate community description and between-study comparisons challenging. The natural occurrence of multiple copies of a marker sequence, such as 16S, in an organism’s genome further complicates taxonomic affiliation and abundance inference. Addressing these caveats, in both laboratory and bioinformatic data processing, often poses a challenge for reproducible and comparable amplicon-based microbiome studies [2].
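The effect of multiple marker copies on abundance inference can be illustrated with a small arithmetic sketch. The example below is not part of Dadaist2 or DADA2; the taxon names, cell counts, and copy numbers are hypothetical values chosen for illustration, and the model assumes that amplicon signal is simply proportional to cells multiplied by marker copies, ignoring amplification bias, sequencing errors, and chimeras.

```python
# Hypothetical two-taxon community: all numbers are illustrative assumptions,
# not measurements from the paper.
cells = {"taxon_A": 100, "taxon_B": 100}        # equal numbers of cells
copies_per_genome = {"taxon_A": 1, "taxon_B": 7}  # assumed 16S copies per genome


def relative(values):
    """Convert raw values to proportions that sum to 1."""
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


# Simplified model: marker signal is proportional to cells x copies per genome.
marker_copies = {t: cells[t] * copies_per_genome[t] for t in cells}
observed = relative(marker_copies)

# Dividing by the copy number recovers cell proportions, but only when the
# copy number of every taxon is known, which is often not the case.
corrected = relative(
    {t: marker_copies[t] / copies_per_genome[t] for t in cells}
)

print("observed marker proportions:", observed)
print("copy-number-corrected proportions:", corrected)
```

Under these assumptions, two taxa with equal cell counts appear at 12.5% and 87.5% of the marker signal, which is why uncorrected read proportions should be interpreted as marker abundance rather than cell abundance.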

Methods

Results

Conclusion