MetaCache: context-aware classification of metagenomic reads using minhashing.

André Müller, Andreas Hildebrandt, Christian Hundt, Bertil Schmidt, Thomas Hankeln

doi:10.1093/bioinformatics/btx520

Abstract

Metagenomic shotgun sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, corresponding software tools suffer from either long runtimes, large memory requirements or low accuracy. We introduce MetaCache, a novel software for read classification using the big data technique minhashing. Our approach performs context-aware classification of reads by computing representative subsamples of k-mers within both probed reads and locally constrained regions of the reference genomes. As a result, MetaCache consumes significantly less memory than the state-of-the-art read classifiers Kraken and CLARK while achieving highly competitive sensitivity and precision at comparable speed. For example, using NCBI RefSeq draft and completed genomes with a total length of around 140 billion bases as reference, MetaCache's database consumes only 62 GB of memory, while both Kraken and CLARK fail to construct their respective databases on a workstation with 512 GB RAM. Our experimental results further show that classification accuracy continuously improves when increasing the amount of utilized reference genome data. MetaCache is open source software written in C++ and can be downloaded at http://github.com/muellan/metacache. Contact: bertil.schmidt@uni-mainz.de. Supplementary data are available at Bioinformatics online.
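To make the idea of "representative subsamples of k-mers" concrete, the following is a minimal Python sketch of minhash-based read classification. It is an illustration under stated assumptions, not MetaCache's actual C++ implementation: the function names (`kmer_hash`, `sketch`, `build_index`, `classify`), the choice of BLAKE2b as hash function, the window overlap, and the toy parameter values are all hypothetical. Each reference genome is cut into windows, each window is reduced to the `s` smallest k-mer hash values (its sketch), and a read is assigned to the genome whose windows share the most sketch values with the read's own sketch.

```python
import hashlib
from collections import Counter, defaultdict


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (hash function is an assumption)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k, s):
    """Return the s smallest distinct k-mer hash values of seq, in sorted order."""
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:s]


def build_index(references, k, s, window):
    """Map every sketch value to the (genome, window number) pairs that contain it."""
    index = defaultdict(set)
    stride = window - k + 1  # neighbouring windows overlap by k - 1 bases
    for name, genome in references.items():
        starts = range(0, max(len(genome) - k + 1, 1), stride)
        for number, start in enumerate(starts):
            for feature in sketch(genome[start:start + window], k, s):
                index[feature].add((name, number))
    return index


def classify(read, index, k, s):
    """Return the genome with the most sketch hits for the read, or None."""
    hits = Counter()
    for feature in sketch(read, k, s):
        for name, _number in index.get(feature, ()):
            hits[name] += 1
    return hits.most_common(1)[0][0] if hits else None


# Toy usage with made-up sequences and deliberately tiny parameters.
toy_references = {
    "genome_1": "ACCACAAACCCACACCAAACACCCAACCACACAAACCACC",
    "genome_2": "GTTGTGGGTTTGTGTTGGGTGTTTGGTTGTGTGGGTTGTT",
}
toy_index = build_index(toy_references, k=4, s=4, window=16)
toy_label = classify("ACCACAAACCCACACC", toy_index, k=4, s=4)
```

Because only `s` values are stored per window instead of every k-mer, the index grows with the number of windows rather than with the number of distinct k-mers, which is the intuition behind the memory savings described above. A real classifier would additionally map hits to taxa and resolve ambiguous matches; those steps are omitted here.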

Journal: Bioinformatics
Publication Date: Aug 17, 2017
Citations: 50

