Cascabel: A Scalable and Versatile Amplicon Sequence Data Analysis Pipeline Delivering Reproducible and Documented Results.

Alejandro Abdala Asbun, Marc A Besseling, Sergio Balzano, Judith D L Van Bleijswijk, Harry J Witte, Laura Villanueva, Julia C Engelmann

doi:10.3389/fgene.2020.489357

Open Access

Abstract

Marker gene sequencing of the rRNA operon (16S, 18S, ITS) or cytochrome c oxidase I (CO1) is a popular means to assess microbial communities of the environment, microbiomes associated with plants and animals, as well as communities of multicellular organisms via environmental DNA sequencing. Since this technique is based on sequencing a single gene, or even only parts of a single gene rather than the entire genome, the number of reads needed per sample to assess the microbial community structure is lower than that required for metagenome sequencing. This makes marker gene sequencing affordable to nearly any laboratory. Despite the relative ease and cost-efficiency of data generation, analyzing the resulting sequence data requires computational skills that may go beyond the standard repertoire of a current molecular biologist/ecologist. We have developed Cascabel, a scalable, flexible, and easy-to-use amplicon sequence data analysis pipeline, which uses Snakemake and a combination of existing and newly developed solutions for its computational steps. Cascabel takes the raw data as input and delivers a table of operational taxonomic units (OTUs) or Amplicon Sequence Variants (ASVs) in BIOM and text format, together with representative sequences. Cascabel is highly versatile software that allows users to customize several steps of the pipeline, such as selecting from a set of OTU clustering methods or performing ASV analysis. In addition, we designed Cascabel to run in any Linux/Unix computing environment, from desktop computers to computing servers, making use of parallel processing where possible. The analyses and results are fully reproducible and documented in an HTML and optional PDF report. Cascabel is freely available on GitHub: https://github.com/AlejandroAb/CASCABEL.
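
To make the "table of OTUs or ASVs in text format" concrete, the following is a minimal Python sketch of how per-read assignments could be tallied into a sample-by-OTU count table and written as tab-separated text. It is an illustration only, not Cascabel's code: the function names, the sample and OTU identifiers, and the "#OTU ID" header label are assumptions made for this example.

```python
from collections import Counter


def build_otu_table(assignments):
    """Tally (sample_id, otu_id) pairs, one pair per read, into a count table.

    Hypothetical helper for illustration; not part of Cascabel.
    """
    counts = Counter(assignments)
    samples = sorted({sample for sample, _ in counts})
    otus = sorted({otu for _, otu in counts})
    header = ["#OTU ID"] + samples  # header label is an assumption
    # Counter returns 0 for (sample, otu) pairs that were never observed.
    rows = [[otu] + [counts[(sample, otu)] for sample in samples] for otu in otus]
    return header, rows


def to_tsv(header, rows):
    """Render the table as tab-separated text, one OTU per line."""
    lines = ["\t".join(header)]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)


# Made-up example reads: two samples, two OTUs.
reads = [("S1", "OTU_1"), ("S1", "OTU_1"), ("S2", "OTU_1"), ("S2", "OTU_2")]
header, rows = build_otu_table(reads)
print(to_tsv(header, rows))
```

Each row is one OTU (or ASV) and each column one sample, which is the general shape of the text-format table; the BIOM format carries the same counts with additional metadata.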

Highlights

High-throughput sequencing of an omnipresent marker gene, such as the gene coding for the small subunit of the ribosomal RNA (16S for prokaryotes or 18S for eukaryotes), is a cost-efficient means for community profiling that is affordable for nearly every lab.
We provide example config files with default parameters for double- and single-barcoded paired-end reads, for both operational taxonomic unit (OTU) and Amplicon Sequence Variant (ASV) analysis, on the GitHub page of Cascabel.
Some rules apply to both routes, others only to one of them. This is indicated in the header of the rule in the config file by either “BOTH_WF,” “OTU_WF,” or “ASV_WF.”
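
The workflow tags in the last highlight can be read as a simple selection rule: a rule is relevant to a route if it is tagged for that route or for both. The Python sketch below illustrates that logic under stated assumptions. Only the three tag strings come from the text; the rule names and the dictionary layout are hypothetical and do not reflect Cascabel's actual config file format.

```python
# Hypothetical rule names mapped to the workflow tags named in the text.
RULE_TAGS = {
    "quality_control": "BOTH_WF",
    "cluster_otus": "OTU_WF",
    "infer_asvs": "ASV_WF",
    "assign_taxonomy": "BOTH_WF",
}


def rules_for_route(rule_tags, route):
    """Return the rule names that apply to the given route ("OTU" or "ASV")."""
    if route not in ("OTU", "ASV"):
        raise ValueError(f"unknown route: {route!r}")
    wanted = {"BOTH_WF", f"{route}_WF"}
    # Dicts keep insertion order, so rules come back in config order.
    return [name for name, tag in rule_tags.items() if tag in wanted]


print(rules_for_route(RULE_TAGS, "OTU"))
print(rules_for_route(RULE_TAGS, "ASV"))
```

In practice this means a user following the ASV route can skip any config section whose header reads "OTU_WF", and vice versa.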

Summary

Introduction

High-throughput sequencing of an omnipresent marker gene, such as the gene coding for the small subunit of the ribosomal RNA (16S for prokaryotes or 18S for eukaryotes), is a cost-efficient means for community profiling that is affordable for nearly every lab. [...] the sequencing costs per sample tremendously, and generating massive amounts of data. Amplicon sequencing can be used to investigate active microbial communities based on ribosomal RNA abundance instead of the rRNA gene locus (Massana et al., 2015; Forster et al., 2016). A short fragment of 100–600 nucleotides of the marker gene is amplified by PCR from the DNA extract, or from cDNA generated from the rRNA extract of the community, and sequenced by high-throughput sequencing. Sequences which are not similar enough to any sequence in the database are excluded from downstream analyses.
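
The exclusion step in the last sentence amounts to a threshold filter on each sequence's best database match. The following Python sketch shows the idea, assuming that a best-hit identity (a fraction between 0 and 1, or None when no hit was found) has already been computed for each sequence. The function name, the input layout, and the 0.9 cutoff are assumptions for illustration, not values taken from the paper.

```python
def filter_by_identity(best_hits, min_identity=0.9):
    """Split sequences into kept and excluded by best-hit identity.

    best_hits maps sequence ID -> identity of its best database hit,
    or None if no hit was found. The 0.9 default is a placeholder.
    """
    kept = {}
    excluded = []
    for seq_id, identity in best_hits.items():
        if identity is not None and identity >= min_identity:
            kept[seq_id] = identity
        else:
            excluded.append(seq_id)
    return kept, excluded


# Made-up identities for three sequences.
hits = {"seq1": 0.99, "seq2": 0.72, "seq3": None}
kept, excluded = filter_by_identity(hits)
print(kept)
print(excluded)
```

Returning the excluded IDs alongside the kept ones makes it easy to report how many sequences were lost at this step, which matters for documenting an analysis.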

Journal: Frontiers in Genetics
Publication Date: Nov 20, 2020
Citations: 25
License type: CC BY 4.0
