Fast and sensitive taxonomic assignment to metagenomic contigs.

M Mirdita, M Steinegger, F Breitwieser, J Söding, E Levy Karin

doi:10.1093/bioinformatics/btab184

Open Access

https://doi.org/10.1093/bioinformatics/btab184

Abstract

Summary: MMseqs2 taxonomy is a new tool to assign taxonomic labels to metagenomic contigs. It extracts all possible protein fragments from each contig, quickly retains those that can contribute to taxonomic annotation, assigns them robust labels and determines the contig’s taxonomic identity by weighted voting. Its fragment extraction step is suitable for the analysis of all domains of life. MMseqs2 taxonomy is 2–18× faster than state-of-the-art tools and also contains new modules for creating and manipulating taxonomic reference databases as well as reporting and visualizing taxonomic assignments.

Availability and implementation: MMseqs2 taxonomy is part of the MMseqs2 free open-source software package available for Linux, macOS and Windows at https://mmseqs.com.

Supplementary information: Supplementary data are available at Bioinformatics online.
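The weighted-voting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the MMseqs2 implementation: the function name `weighted_vote`, the `majority` threshold, the lineage tuples and the weights are all hypothetical, and the abstract does not state how the weights are derived. The sketch assumes each labeled fragment carries a root-to-leaf lineage and a positive weight, and it picks the most specific taxon supported by at least the given fraction of the total weight.

```python
from collections import defaultdict


def weighted_vote(fragments, majority=0.5):
    """Pick the deepest lineage prefix supported by >= `majority` of the total weight.

    fragments: list of (lineage, weight) pairs, where lineage is a root-to-leaf
    tuple of taxon names. Names and weights are illustrative assumptions.
    """
    total = sum(weight for _, weight in fragments)
    if total <= 0:
        return None

    # Every fragment supports its own label and all of its ancestors.
    support = defaultdict(float)
    for lineage, weight in fragments:
        for depth in range(1, len(lineage) + 1):
            support[lineage[:depth]] += weight

    winners = [prefix for prefix, s in support.items() if s / total >= majority]
    if not winners:
        return None
    # Prefer the most specific prefix; break ties by the larger support.
    return max(winners, key=lambda prefix: (len(prefix), support[prefix]))


fragments = [
    (("Bacteria", "Proteobacteria", "Escherichia"), 30.0),
    (("Bacteria", "Proteobacteria", "Salmonella"), 25.0),
    (("Bacteria", "Firmicutes"), 10.0),
]
print(weighted_vote(fragments))  # ('Bacteria', 'Proteobacteria')
```

In this toy input neither genus reaches half of the total weight, so the vote falls back to the phylum both share, which is the behavior one would expect from a conservative contig-level label.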

Highlights

Metagenomic studies shine a light on previously unstudied parts of the tree of life
Despite its advantage over existing methods, CAT has limitations: (1) Prodigal was designed for prokaryotes and not eukaryotes [13]; (2) Prodigal runs single-threaded, limiting applicability to metagenomics; (3) CAT’s r parameter sets a score cut-off below each open reading frame’s (ORF’s) top hit, and only hits scoring above that cut-off are included in the ORF’s lowest common ancestor (LCA) computation
All 57 SAR RefSeq assemblies and their taxonomic labels were downloaded from NCBI in August 2020
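The role of a cut-off like CAT’s r parameter, as described in the highlights above, can be sketched as follows. This is an illustration under stated assumptions, not CAT’s code: it assumes r is expressed as a percentage of the top hit’s score, and the helper names `hits_within_range` and `lowest_common_ancestor`, the lineages and the scores are all hypothetical.

```python
def hits_within_range(hits, r=10.0):
    """Keep hits whose score is within r percent of the top hit's score.

    hits: list of (lineage, score) pairs. Treating r as a percentage of the
    top score is an assumption made for this sketch.
    """
    if not hits:
        return []
    top = max(score for _, score in hits)
    cutoff = top * (1 - r / 100)
    return [(lineage, score) for lineage, score in hits if score >= cutoff]


def lowest_common_ancestor(lineages):
    """Return the longest prefix shared by all root-to-leaf lineages."""
    common = []
    for taxa in zip(*lineages):
        if all(taxon == taxa[0] for taxon in taxa):
            common.append(taxa[0])
        else:
            break
    return tuple(common)


hits = [
    (("Bacteria", "Proteobacteria", "Escherichia"), 200.0),
    (("Bacteria", "Proteobacteria", "Salmonella"), 185.0),
    (("Bacteria", "Firmicutes", "Bacillus"), 120.0),
]
for r in (10.0, 50.0):
    kept = hits_within_range(hits, r)
    print(r, lowest_common_ancestor([lineage for lineage, _ in kept]))
```

A wider range admits more distant hits and pushes the LCA toward the root, which is why the value has to be tuned by the user.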

Summary

INTRODUCTION

Metagenomic studies shine a light on previously unstudied parts of the tree of life. However, unraveling taxonomic composition accurately and quickly remains a challenge. CAT [12] is a tool for taxonomic annotation of contigs based on protein homologies to a reference database. It combines Prodigal [7] for predicting open reading frames (ORFs), DIAMOND [3] to search with the translated ORFs, and logic to aggregate individual ORF annotations. We present MMseqs2 taxonomy, a novel protein-search-based tool for taxonomy assignment to contigs. It overcomes CAT’s limitations by extracting all possible protein fragments, covering the coding repertoire of all domains of life. The hits for the a2bLCA computation are determined automatically, removing the need to tune an equivalent of CAT’s r parameter. It outperforms CAT on bacterial and eukaryotic data sets.
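As a rough illustration of what "extracting all possible protein fragments" can mean, the sketch below translates a contig in all six reading frames and splits each translation at stop codons. It is not the MMseqs2 implementation: the function names, the `min_length` parameter and its default are assumptions made for illustration, and it uses only the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def protein_fragments(contig, min_length=30):
    """Return (strand, frame, fragment) for every stop-free stretch in six frames.

    min_length (in residues) is a hypothetical filter for this sketch.
    """
    contig = contig.upper()
    fragments = []
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for frame in range(3):
            for piece in translate(seq[frame:]).split("*"):
                if len(piece) >= min_length:
                    fragments.append((strand, frame, piece))
    return fragments


for fragment in protein_fragments("ATGGCCAAATAAGGG", min_length=3):
    print(fragment)
```

Because no gene model is involved, the same procedure applies to prokaryotic and eukaryotic contigs alike; the cost is many spurious fragments, which is why a fast filtering step before the search matters.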

Journal: Bioinformatics
Publication Date: Mar 18, 2021
Citations: 149
License type: CC BY 4.0

Similar Papers

EnsembleTax: an R package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences.
Dylan Catlett ... Connie Liang
PeerJ | VOL. 9
26 Jul 2021

Matching Species Names Across Biodiversity Databases: Sources, tools, pitfalls and best practices for taxonomic harmonization
Matthias Grenié ... Alban Sagouis
Biodiversity Information Science and Standards | VOL. 5
17 Sep 2021

Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods.
J Dröge ... A C McHardy
Bioinformatics | VOL. 31
10 Nov 2014

FOSSIL
Saed Alrabaee ... Paria Shirani
ACM Transactions on Privacy and Security | VOL. 21
31 Jan 2018
