Abstract

Shotgun metagenomics of microbial communities reveals information about strains of relevance for applications in medicine, biotechnology and ecology. Recovering their genomes is a crucial but very challenging step because of the complexity of the underlying biological system and technical factors. Microbial communities are heterogeneous, often containing hundreds of genomes from different species or strains, all at varying abundances and with different degrees of similarity to each other and to reference data. We present a versatile probabilistic model for genome recovery and analysis, which aggregates three types of information that are commonly used for genome recovery from metagenomes. As potential applications, we showcase metagenome contig classification, genome sample enrichment and genome bin comparisons. The open-source implementation MGLEX is available via the Python Package Index and on GitHub and can be embedded into metagenome analysis workflows and programs.

Highlights

  • Maximum likelihood classification: We evaluated the performance of the model when assigning each contig to the genome with the highest likelihood, a procedure called maximum likelihood (ML) classification
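
The ML classification rule in the last highlight can be illustrated with a minimal sketch. It assumes that the log-likelihood of every contig under every genome model has already been computed. The function name `ml_classify` and the numbers are hypothetical illustrations and are not part of the MGLEX API.

```python
def ml_classify(log_likelihoods):
    """Assign each contig to the genome with the highest log-likelihood.

    log_likelihoods: list of rows, one row per contig, one value per genome.
    Returns a list with the index of the best-scoring genome for each contig.
    Ties are resolved in favour of the lowest genome index.
    """
    assignments = []
    for row in log_likelihoods:
        best_genome = max(range(len(row)), key=lambda j: row[j])
        assignments.append(best_genome)
    return assignments


# Hypothetical values: rows are contigs, columns are genome models.
log_likelihoods = [
    [-120.5, -98.2, -130.0],
    [-75.1, -80.4, -79.9],
    [-210.0, -205.3, -190.7],
]
assignments = ml_classify(log_likelihoods)
print(assignments)
```

Working with log-likelihoods instead of raw likelihoods avoids numerical underflow for long contigs and does not change which genome scores highest, because the logarithm is monotonic.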



Introduction

Shotgun sequencing of DNA extracted from a microbial community recovers genomic data from different community members while bypassing the need to obtain pure isolate cultures. It enables novel insights into ecosystems, especially for those genomes which are inaccessible by cultivation techniques and isolate sequencing. Assembled sequences, called contigs, that originate from the same genome are grouped together in a process known as metagenome binning (Dröge & McHardy, 2012), for which many programs have been developed. Some are trained on reference sequences, using contig k-mer frequencies or sequence similarities as sources of information (McHardy et al., 2007; Dröge, Gregor & McHardy, 2014; Wood & Salzberg, 2014; Gregor et al., 2016), and can be adapted to specific ecosystems. Others cluster the contigs into genome bins, using contig k-mer frequencies and read coverage (Chatterji et al., 2008; Kislyuk et al., 2009; Wu et al., 2014; Nielsen et al., 2014; Imelfort et al., 2014; Alneberg et al., 2014; Kang et al., 2015; Lu et al., 2016).
