MetaProb: accurate metagenomic reads binning based on probabilistic sequence signatures.

Samuele Girotto, Cinzia Pizzi, Matteo Comin

doi:10.1093/bioinformatics/btw466

Abstract

Sequencing technologies allow the sequencing of microbial communities directly from the environment without prior culturing. Taxonomic analysis of microbial communities, a process referred to as binning, is one of the most challenging tasks when analyzing metagenomic reads data. The major problems are the lack of taxonomically related genomes in existing reference databases, the uneven abundance ratio of species and the limitations due to short read lengths and sequencing errors. MetaProb is a novel assembly-assisted tool for unsupervised metagenomic binning. The novelty of MetaProb derives from solving a few important problems: how to divide reads into groups of independent reads, so that k-mer frequencies are not overestimated; how to convert k-mer counts into probabilistic sequence signatures, that will correct for variable distribution of k-mers, and for unbalanced groups of reads, in order to produce better estimates of the underlying genome statistic; how to estimate the number of species in a dataset. We show that MetaProb is more accurate and efficient than other state-of-the-art tools in binning both short reads datasets (F-measure 0.87) and long reads datasets (F-measure 0.97) for various abundance ratios. Also, the estimation of the number of species is more accurate than MetaCluster. On a real human stool dataset MetaProb identifies the most predominant species, in line with previous human gut studies.

Availability and implementation: https://bitbucket.org/samu661/metaprob

Contact: cinzia.pizzi@dei.unipd.it or comin@dei.unipd.it

Supplementary information: Supplementary data are available at Bioinformatics online.
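To make the idea of turning k-mer counts into a standardized signature concrete, here is a minimal Python sketch. It is not MetaProb's implementation and the abstract does not give the formula: the function names (`kmer_counts`, `standardized_signature`) are hypothetical, and the standardization used here (a z-score of each count against its expectation under an independent-nucleotide model with a binomial variance) is an assumption chosen only to illustrate how a signature can correct for uneven k-mer distributions and for groups of different sizes.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def kmer_counts(reads, k):
    """Count k-mers over ACGT in a group of reads, skipping ambiguous bases."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if all(c in ALPHABET for c in kmer):
                counts[kmer] += 1
    return counts


def standardized_signature(reads, k):
    """Illustrative signature: z-score of each k-mer count.

    Assumption (not taken from the paper): nucleotides are independent,
    so the probability of a k-mer is the product of its base frequencies,
    and the count variance is approximated by the binomial N * p * (1 - p).
    """
    counts = kmer_counts(reads, k)
    total = sum(counts.values())
    base = Counter(c for read in reads for c in read if c in ALPHABET)
    n_bases = sum(base.values())
    if total == 0 or n_bases == 0:
        return {}

    signature = {}
    for kmer in map("".join, product(ALPHABET, repeat=k)):
        p = 1.0
        for c in kmer:
            p *= base[c] / n_bases
        expected = total * p
        variance = total * p * (1.0 - p)
        if variance > 0:
            signature[kmer] = (counts[kmer] - expected) / sqrt(variance)
        else:
            signature[kmer] = 0.0
    return signature


if __name__ == "__main__":
    group = ["ACGTACGTAC", "CGTACGTTAC"]  # toy reads, made up for the example
    sig = standardized_signature(group, k=2)
    for kmer in sorted(sig, key=sig.get, reverse=True)[:3]:
        print(kmer, round(sig[kmer], 2))
```

Because each value is expressed relative to its own expectation and spread, groups containing different numbers of reads yield signatures on a comparable scale, which is the property the abstract attributes to probabilistic sequence signatures.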


Journal: Bioinformatics
Publication Date: Aug 29, 2016
Citations: 66
License type: CC BY-NC 4.0


