Abstract

Background: Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. Understanding the roles of these uncultivated populations has broad application in ecology, evolution, biotechnology and medicine. Accurate binning of assembled metagenomic sequences is an essential step in recovering the genomes and understanding microbial functions.

Results: We have developed a binning algorithm, MaxBin, which uses an expectation-maximization algorithm to automate the binning of metagenomic scaffolds once the sequencing reads have been assembled. Binning of simulated metagenomic datasets demonstrated that MaxBin had high levels of accuracy in binning microbial genomes. MaxBin was used to recover genomes from metagenomic data obtained through the Human Microbiome Project, which demonstrated its ability to recover genomes from real metagenomic datasets with variable sequencing coverages. Application of MaxBin to metagenomes obtained from microbial consortia adapted to grow on cellulose allowed genomic analysis of new, uncultivated, cellulolytic bacterial populations, including an abundant myxobacterial population distantly related to Sorangium cellulosum that possesses a much smaller genome (5 Mb versus 13 to 14 Mb) but has a more extensive set of genes for biomass deconstruction. For the cellulolytic consortia, the MaxBin results were compared to binning using emergent self-organizing maps (ESOMs) and differential coverage binning, demonstrating that MaxBin performed comparably to these methods but had distinct advantages in automation, resolution of related genomes and sensitivity.

Conclusions: The automatic binning software that we developed successfully classifies assembled sequences in metagenomic datasets into recovered individual genomes. The isolation of dozens of species in cellulolytic microbial consortia, including a novel species of myxobacteria that has the smallest genome among all sequenced aerobic myxobacteria, was easily achieved using the binning software. This work demonstrates that the processes required for recovering genomes from assembled metagenomic datasets can be readily automated, an important advance in understanding the metabolic potential of microbes in natural environments. MaxBin is available at https://sourceforge.net/projects/maxbin/.

Highlights

  • Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems

  • MaxBin is designed as automated metagenomic binning software that bins assembled metagenomic scaffolds, once the sequencing reads have been assembled, with minimal human intervention

  • MaxBin was initially tested by binning several simulated metagenomic datasets produced by MetaSim [22] to evaluate its effectiveness

Summary

Introduction

Recovering individual genomes from metagenomic datasets allows access to uncultivated microbial populations that may have important roles in natural and engineered ecosystems. A key step in genome recovery from metagenomic sequence data is the classification of sequences assembled from metagenomic reads into discrete units, referred to as bins. These bins represent composite genomes of individual populations that comprise the microbial community. A number of approaches have been developed to bin assembled sequences from metagenomic data [2,4,5,6,7,8,9]. Among these techniques, one of the most widely used is emergent self-organizing maps (ESOMs), which have been used to bin assembled sequences by tetranucleotide frequencies [2] and read coverage levels (time series binning) [9]. A related approach to time series ESOM binning is differential coverage binning, which uses plots of differential read coverages of assembled sequences to distinguish individual genomic bins. Individual bins are tested for completeness (is it a complete genome?) and distinctiveness (does the bin contain only one genome?) using single-copy marker genes.
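
As a rough illustration of the completeness and distinctiveness test, the sketch below scores one bin from the single-copy marker genes detected in it. It is a minimal example under stated assumptions, not MaxBin's own procedure: the function name `assess_bin`, the four-gene marker set and the hit list are all hypothetical, and a real marker set would be much larger.

```python
from collections import Counter


def assess_bin(marker_hits, marker_set):
    """Score one bin from its single-copy marker gene hits.

    marker_hits: list of marker gene names detected in the bin,
                 one entry per hit (hypothetical input format).
    marker_set:  set of marker genes expected once per genome.
    Returns (completeness, duplication), both as fractions of marker_set.
    """
    # Count hits per marker, ignoring names outside the marker set.
    counts = Counter(hit for hit in marker_hits if hit in marker_set)
    # Completeness: fraction of markers seen at least once.
    present = sum(1 for marker in marker_set if counts[marker] >= 1)
    # Duplication: fraction of markers seen more than once, which
    # suggests that the bin mixes sequences from more than one genome.
    duplicated = sum(1 for marker in marker_set if counts[marker] > 1)
    return present / len(marker_set), duplicated / len(marker_set)


# Hypothetical marker set and hits, for illustration only.
markers = {"rpoB", "recA", "gyrB", "rplB"}
hits = ["rpoB", "recA", "recA", "gyrB"]

completeness, duplication = assess_bin(hits, markers)
print(f"completeness={completeness:.2f} duplication={duplication:.2f}")
```

In this toy case three of the four markers are present and one appears twice, so the bin would be read as incomplete and possibly mixed. Where to set thresholds on either value is a judgment call that the text above does not specify.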
