Swarm v3: towards tera-scale amplicon clustering.

Frédéric Mahé, Micah Dunthorn, Torbjørn Rognes, Alexandros Stamatakis, Colomban De Vargas, Christopher Quince, Lucas Czech, Inanc Birol

doi:10.1093/bioinformatics/btab493

Abstract

Motivation: Previously we presented swarm, an open-source amplicon clustering programme that produces fine-scale molecular operational taxonomic units (OTUs) that are free of arbitrary global clustering thresholds. Here, we present swarm v3 to address issues of contemporary datasets that are growing towards tera-byte sizes.

Results: When compared with previous swarm versions, swarm v3 has modernized C++ source code, reduced memory footprint by up to 50%, optimized CPU-usage and multithreading (more than 7 times faster with default parameters), and it has been extensively tested for its robustness and logic.

Availability and implementation: Source code and binaries are available at https://github.com/torognes/swarm.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

In emerging planetary biology, large-scale amplicon sequencing datasets are used to unravel global ecological and evolutionary patterns within and across biomes and biota (de Vargas et al, 2015; Mahé et al, 2017; Giner et al, 2020).
Motivation: Previously we presented swarm, an open-source amplicon clustering program that produces fine-scale molecular operational taxonomic units (OTUs) that are free of arbitrary global clustering thresholds.
Availability: Source code and binaries are available at https://github.com/torognes/swarm.
Contact: frederic.mahe@cirad.fr
Supplementary information: Supplementary data are available at Bioinformatics online.

Summary

Introduction

Large-scale amplicon sequencing datasets are used to unravel global ecological and evolutionary patterns within and across biomes and biota (de Vargas et al, 2015; Mahé et al, 2017; Giner et al, 2020). A critical bioinformatics step in the handling of these massive metabarcoding datasets is to cluster the sequencing reads into operational taxonomic units (OTUs). Swarm v1 (Mahé et al, 2014) was introduced as a novel approach to cluster amplicons into OTUs, inspired by previous single-linkage methods such as DOTUR (Schloss & Handelsman, 2005). The key underlying idea of swarm was to use a local, iterative, single-linkage clustering process to group closely related sequences (by default with one difference in their nucleotide sequences, i.e. d = 1). The code could only be executed on GNU/Linux and macOS on x86-64 CPUs. Although swarm v2 was multithreaded and fast, its time and memory requirements could become a limiting factor on very large current and future datasets, especially as amplicon sequences become longer.
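
The local, iterative, single-linkage idea with d = 1 can be illustrated with a short Python sketch. This is not swarm's actual implementation: the function names `microvariants` and `cluster_d1` are hypothetical, seeds are simply taken in input order (an assumption made for brevity), and abundance information is ignored. The sketch only shows how a cluster grows by repeatedly linking sequences that differ by one substitution, insertion or deletion.

```python
from collections import deque

ALPHABET = "ACGT"


def microvariants(seq):
    """Yield the sequences one edit away from seq (duplicates may occur)."""
    for i in range(len(seq)):
        # Substitute position i with every other base.
        for base in ALPHABET:
            if base != seq[i]:
                yield seq[:i] + base + seq[i + 1:]
        # Delete position i.
        yield seq[:i] + seq[i + 1:]
    # Insert every base at every possible position.
    for i in range(len(seq) + 1):
        for base in ALPHABET:
            yield seq[:i] + base + seq[i:]


def cluster_d1(sequences):
    """Group unique sequences by iterative single-linkage with d = 1.

    Assumption: seeds are picked in input order; this is an illustrative
    choice, not a statement about how swarm orders its seeds.
    """
    unassigned = set(sequences)
    clusters = []
    for seed in sequences:
        if seed not in unassigned:
            continue  # already absorbed by an earlier cluster
        unassigned.discard(seed)
        cluster = [seed]
        queue = deque([seed])
        # Breadth-first growth: each new member is searched for its own
        # one-difference neighbours, so links are local and can chain.
        while queue:
            current = queue.popleft()
            for variant in microvariants(current):
                if variant in unassigned:
                    unassigned.discard(variant)
                    cluster.append(variant)
                    queue.append(variant)
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "ACGAA", "TTTT", "TTT", "GGGG"]
    for members in cluster_d1(reads):
        print(members)
```

In this toy input, ACGAA is two differences away from ACGT but still joins the same cluster through the intermediate ACGA, which is what makes the linkage local rather than governed by a global threshold around the seed.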

Journal: Bioinformatics	Publication Date: Jul 9, 2021
Citations: 47	License type: CC BY 4.0
