Swarm v2: highly-scalable and high-resolution amplicon clustering

Frédéric Mahé, Torbjørn Rognes, Christopher Quince, Colomban De Vargas, Micah Dunthorn

doi:10.7717/peerj.1420

Abstract

Previously we presented Swarm v1, a novel and open source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked with an initial phase that used iterative single-linkage with a local clustering threshold (d), followed by a phase that used the internal abundance structures of clusters to break chained OTUs. Here we present Swarm v2, which has two important novel features: (1) a new algorithm for d = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and (2) the new fastidious option that reduces under-grouping by grafting low abundant OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with d = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks.
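The clustering idea described in the abstract can be illustrated with a minimal Python sketch. It is not Swarm's actual code: the function names (`dereplicate`, `microvariants`, `cluster_d1`) are hypothetical, and the abundance rule used to stop chains (a sequence may only recruit neighbours that are no more abundant than itself) is a simplifying assumption standing in for the integrated breaking phase. The sketch shows why d = 1 can be handled without all-against-all comparisons: each sequence enumerates its own one-difference variants and looks them up in a hash set, so the work per sequence depends on its length rather than on the size of the dataset.

```python
from collections import Counter, deque

ALPHABET = "ACGT"


def dereplicate(reads):
    """Merge identical reads (d = 0) into a {sequence: abundance} mapping."""
    return Counter(reads)


def microvariants(seq):
    """Return every sequence exactly one substitution, insertion, or deletion from seq."""
    variants = set()
    for i in range(len(seq)):
        variants.add(seq[:i] + seq[i + 1:])              # deletion
        for base in ALPHABET:
            variants.add(seq[:i] + base + seq[i + 1:])   # substitution
    for i in range(len(seq) + 1):
        for base in ALPHABET:
            variants.add(seq[:i] + base + seq[i:])       # insertion
    variants.discard(seq)                                # drop the unchanged sequence
    return variants


def cluster_d1(abundances):
    """Iterative single-linkage clustering with a local threshold d = 1.

    Assumption (illustrative only): a subseed recruits a neighbour only if the
    neighbour is not more abundant than the subseed, which stops chains from
    climbing into a second abundance peak.
    """
    unassigned = set(abundances)
    clusters = []
    # Seeds are visited by decreasing abundance, ties broken alphabetically,
    # so the result does not depend on input order.
    for seed in sorted(abundances, key=lambda s: (-abundances[s], s)):
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        cluster = [seed]
        queue = deque([seed])
        while queue:
            subseed = queue.popleft()
            # Hash lookups replace pairwise comparisons against the whole dataset.
            neighbours = sorted(v for v in microvariants(subseed) if v in unassigned)
            for variant in neighbours:
                if abundances[variant] <= abundances[subseed]:
                    unassigned.remove(variant)
                    cluster.append(variant)
                    queue.append(variant)
        clusters.append(cluster)
    return clusters


reads = ["ACGT"] * 5 + ["ACGA"] * 2 + ["ACGAA"] + ["TTTT"] * 3
print(cluster_d1(dereplicate(reads)))
# → [['ACGT', 'ACGA', 'ACGAA'], ['TTTT']]
```

In this toy input, `ACGAA` is two differences from the seed `ACGT` but is still recruited through the intermediate `ACGA`, which is the single-linkage behaviour the abstract refers to.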

Highlights

Traditional de novo amplicon clustering methods that can handle large high-throughput sequencing datasets (e.g., Edgar, 2010; Ghodsi, Liu & Pop, 2011; Fu et al., 2012) suffer from two fundamental problems.
We previously introduced the open source Swarm v1 program, which implemented an initial clustering phase written in C++ and a breaking phase written in Python (Mahé et al., 2014).
There can be under-grouping of closely related amplicons, leading to small operational taxonomic units (OTUs) surrounding a larger OTU. To address this problem in Swarm v2, we introduced a new step, called the fastidious option, to graft low abundant OTUs onto more abundant ones by postulating a linking amplicon (Fig. 1C).
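The grafting step can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not Swarm's implementation: the function name `graft_small_otus` and the `boundary` parameter (the total abundance below which an OTU counts as small) are hypothetical, and the linking amplicon is assumed to be an unobserved sequence one difference away from both an amplicon in the large OTU and an amplicon in the small OTU.

```python
ALPHABET = "ACGT"


def microvariants(seq):
    """Return every sequence exactly one substitution, insertion, or deletion from seq."""
    variants = set()
    for i in range(len(seq)):
        variants.add(seq[:i] + seq[i + 1:])              # deletion
        for base in ALPHABET:
            variants.add(seq[:i] + base + seq[i + 1:])   # substitution
    for i in range(len(seq) + 1):
        for base in ALPHABET:
            variants.add(seq[:i] + base + seq[i:])       # insertion
    variants.discard(seq)
    return variants


def graft_small_otus(otus, abundances, boundary=3):
    """Graft small OTUs onto large ones through a postulated linking amplicon.

    otus: list of OTUs, each a list of sequences.
    abundances: {sequence: abundance}.
    boundary: hypothetical cutoff; OTUs with total abundance below it are "small".
    """
    mass = [sum(abundances[s] for s in otu) for otu in otus]
    large = [i for i, m in enumerate(mass) if m >= boundary]
    small = [i for i, m in enumerate(mass) if m < boundary]

    # Index every hypothetical linking amplicon surrounding the large OTUs.
    # Large OTUs are visited by decreasing mass so the heaviest one claims a link first.
    links = {}
    for i in sorted(large, key=lambda i: -mass[i]):
        for seq in otus[i]:
            for variant in microvariants(seq):
                links.setdefault(variant, i)

    merged = {i: list(otus[i]) for i in large}
    ungrafted = []
    for j in small:
        # A small OTU is grafted if any of its one-difference variants is a known link.
        candidates = {links[v] for seq in otus[j] for v in microvariants(seq) if v in links}
        if candidates:
            target = max(candidates, key=lambda i: (mass[i], -i))
            merged[target].extend(otus[j])
        else:
            ungrafted.append(list(otus[j]))
    return [merged[i] for i in large] + ungrafted


otus = [["ACGT", "ACGA"], ["AGGG"], ["TTTTTT"]]
abundances = {"ACGT": 5, "ACGA": 2, "AGGG": 1, "TTTTTT": 1}
print(graft_small_otus(otus, abundances))
# → [['ACGT', 'ACGA', 'AGGG'], ['TTTTTT']]
```

Here the singleton `AGGG` is two substitutions away from `ACGT`; the unobserved sequence `ACGG` is one difference from each and plays the role of the linking amplicon, while `TTTTTT` has no such link and stays separate.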

Summary

Introduction

Traditional de novo amplicon clustering methods that can handle large high-throughput sequencing datasets (e.g., Edgar, 2010; Ghodsi, Liu & Pop, 2011; Fu et al., 2012) suffer from two fundamental problems. [...] Comm., 2015) found that in comparison to other clustering methods, Swarm v1 tended to produce relatively more low abundant OTUs, e.g., singletons and doubletons. Swarm v1 and other current de novo algorithms could not cluster today’s largest high-throughput sequencing datasets within a reasonable amount of time (Rideout et al., 2014).

Results

Conclusion

Journal: PeerJ
Publication Date: Dec 10, 2015
Citations: 470
License type: CC BY 4.0

Similar Papers

Swarm v3: towards tera-scale amplicon clustering.
Frédéric Mahé ... Colomban De Vargas
Bioinformatics | VOL. 38 | 09 Jul 2021

Sequence clustering threshold has little effect on the recovery of microbial community structure.
Synnøve Smebye Botnen ... Håvard Kauserud
Molecular Ecology Resources | VOL. 18 | 04 May 2018

Nitrifier and denitrifier molecular operational taxonomic unit compositions from sites of a freshwater estuary of Chesapeake Bay
Caroline S Fortunato ... Karen L Bushaw-Newton
Canadian Journal of Microbiology | VOL. 55 | 01 Mar 2009

Swarm: robust and fast clustering method for amplicon-based studies.
Frédéric Mahé ... Torbjørn Rognes
PeerJ | VOL. 2 | 25 Sep 2014
