MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample

Yi Wang, Francis Y.L. Chin, S.M. Yiu, Henry C.M. Leung

doi:10.1093/bioinformatics/bts397

Journal: Bioinformatics
Publication Date: Sep 3, 2012
Citations: 120
License type: CC BY 3.0

Affiliation: University of Hong Kong

Abstract

Motivation: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples, even those containing high-abundance species, that also contain many extremely low-abundance species. Both situations are common in real metagenomic datasets, so binning methods that can handle them are desirable.

Results: We proposed a two-round binning method (MetaCluster 5.0) that aims to identify both low-abundance and high-abundance species in the presence of a large amount of noise caused by many extremely low-abundance species. In summary, MetaCluster 5.0 uses a filtering strategy to remove noise from the extremely low-abundance species, and it separates reads of high-abundance species from those of low-abundance species in two different rounds. To overcome the low coverage of low-abundance species, multiple values of w are used to group reads with overlapping w-mers: reads from high-abundance species are first grouped with high confidence using a large w, and binning then expands to low-abundance species using a relaxed (shorter) w. Compared with the recent tools TOSS and MetaCluster 4.0, MetaCluster 5.0 finds more species (especially those with low abundance, say 6× to 10×) and achieves better sensitivity and specificity while using less memory and running time.

Availability: http://i.cs.hku.hk/~alse/MetaCluster/

Contact: chin@cs.hku.hk
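The two-round grouping idea in the Results paragraph can be illustrated with a toy sketch. It is not the MetaCluster 5.0 implementation, and any later steps of the real tool are not modelled. The rare-w-mer filter (`min_count`), the group-size cutoff that decides which round-1 groups are treated as high-abundance (`min_group_size`), and all function names are hypothetical stand-ins chosen for illustration. Reads that share a w-mer are merged with a union-find structure, first with a large w over all reads, then with a shorter w over the reads left unassigned.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest used to merge reads into groups."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def wmers(read, w):
    """All length-w substrings (w-mers) of a read, in order."""
    return [read[i:i + w] for i in range(len(read) - w + 1)]


def group_by_shared_wmers(reads, indices, w, min_count):
    """Group the reads listed in `indices` that share a w-mer.

    ASSUMPTION: w-mers occurring fewer than `min_count` times among these
    reads are ignored, as a hypothetical stand-in for a noise filter.
    """
    counts = defaultdict(int)
    for i in indices:
        for wmer in wmers(reads[i], w):
            counts[wmer] += 1

    uf = UnionFind(len(reads))
    first_owner = {}  # w-mer -> first read index that contained it
    for i in indices:
        for wmer in wmers(reads[i], w):
            if counts[wmer] < min_count:
                continue  # rare w-mer: treated as noise, never links reads
            if wmer in first_owner:
                uf.union(first_owner[wmer], i)
            else:
                first_owner[wmer] = i

    groups = defaultdict(list)
    for i in indices:
        groups[uf.find(i)].append(i)
    return list(groups.values())


def two_round_binning(reads, large_w, small_w, min_count, min_group_size):
    """Hypothetical two-round grouping: strict large w first, relaxed small w second."""
    all_idx = list(range(len(reads)))
    # Round 1: large w; big groups are ASSUMED to come from high-abundance species.
    round1 = group_by_shared_wmers(reads, all_idx, large_w, min_count)
    confident = [g for g in round1 if len(g) >= min_group_size]
    assigned = {i for g in confident for i in g}
    # Round 2: leftover reads are regrouped with a shorter (relaxed) w.
    remaining = [i for i in all_idx if i not in assigned]
    relaxed = group_by_shared_wmers(reads, remaining, small_w, min_count)
    return confident, relaxed


reads = [
    "ACGTTGCA",  # 0
    "GTTGCATG",  # 1: shares GTTGCA with read 0
    "TGCATGCC",  # 2: shares TGCATG with read 1
    "CCCCAAAA",  # 3
    "CAAAAGGG",  # 4: shares only the 5-mer CAAAA with read 3
]
confident, relaxed = two_round_binning(reads, large_w=6, small_w=4,
                                       min_count=2, min_group_size=3)
print(confident)  # [[0, 1, 2]]
print(relaxed)    # [[3, 4]]
```

Reads 3 and 4 overlap by only five bases, so they stay apart at w = 6 and are joined only in the relaxed round. In miniature, this shows why a shorter w helps recover low-coverage species, at the price of more chance links between unrelated reads, which is why the strict round runs first.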

Highlights

MetaCluster 4.0, AbundanceBin and TOSS are the latest unsupervised binning tools for next-generation sequencing (NGS) reads. (The TOSS software was obtained through private communication with the authors of the article.)
As MetaCluster 4.0 outperforms AbundanceBin and TOSS in many situations (Wang et al, 2012), we mainly compare the performance of MetaCluster 5.0 with that of MetaCluster 4.0.

Introduction

Metagenomics is the study of genomes of multiple species from environmental samples, such as soil, sea water and the human gut. An important step in metagenomic analysis is grouping reads from similar species together, which is known as binning. Supervised methods (Brady and Salzberg, 2009; McHardy et al, 2006) align reads to known genomes and group reads aligned to similar genomes together. Instead of aligning reads to known genomes directly, some semi-supervised methods use taxonomic markers [e.g. recA, rpoB and 16S rRNA (Cole et al, 2005)] to classify reads into different groups. Since only a small part (

Methods

Results

Conclusion