Abstract

Background
Sequence-binning techniques enable the recovery of an increasing number of genomes from complex microbial metagenomes and typically require prior metagenome assembly, incurring the computational cost and drawbacks of the latter, e.g., biases against low-abundance genomes and inability to conveniently assemble multi-terabyte datasets.

Results
We present here a scalable pre-assembly binning scheme (i.e., operating on unassembled short reads) that enables latent genome recovery by leveraging sparse dictionary learning and elastic-net regularization, and we use it to recover hundreds of metagenome-assembled genomes, including very low-abundance genomes, from a joint analysis of microbiomes from the LifeLines DEEP population cohort (n = 1,135, >10^10 reads).

Conclusion
We showed that sparse coding techniques can be leveraged to carry out read-level binning at large scale and that, despite lower genome reconstruction yields compared to assembly-based approaches, bin-first strategies can complement the more widely used assembly-first protocols by targeting distinct genome segregation profiles. Read enrichment levels spanning 6 orders of magnitude in relative abundance were observed, indicating that the method has the power to recover genomes consistently segregating at low levels.
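The abstract names sparse coding with elastic-net regularization as the core of the bin-first scheme. As a rough illustration only, the sketch below shows the sparse-coding step in isolation: a cross-sample abundance profile y (for example, of a group of reads or k-mers) is expressed as a sparse, elastic-net-penalised combination of fixed dictionary atoms, each standing for a hypothetical latent genome's abundance profile. It solves 0.5·‖y − Σ a_j·d_j‖² + λ1‖a‖₁ + 0.5·λ2‖a‖² by coordinate descent with soft-thresholding. The dictionary is assumed to be given (the dictionary-learning half is omitted), and the function names, penalty values and the "largest positive coefficient wins" assignment rule are assumptions for illustration, not the authors' implementation.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by L1 (lasso / elastic-net) solvers."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_code(y, atoms, lam1=0.1, lam2=0.1, n_iter=200):
    """Sparse code of profile y over a FIXED dictionary via coordinate descent.

    Minimises 0.5*||y - sum_j a_j*atoms[j]||^2 + lam1*||a||_1 + 0.5*lam2*||a||^2.
    y: abundances across samples (hypothetical input); atoms: list of equal-length lists.
    """
    a = [0.0] * len(atoms)
    r = list(y)  # residual y - D a; equals y because a starts at zero
    norms = [sum(v * v for v in atom) for atom in atoms]
    for _ in range(n_iter):
        for j, atom in enumerate(atoms):
            if norms[j] == 0.0:
                continue  # an all-zero atom can never explain anything
            # Correlation of atom j with the partial residual (atom j's contribution added back).
            rho = sum(ai * (ri + a[j] * ai) for ai, ri in zip(atom, r))
            new = soft_threshold(rho, lam1) / (norms[j] + lam2)
            delta = new - a[j]
            if delta != 0.0:
                r = [ri - delta * ai for ri, ai in zip(r, atom)]
                a[j] = new
    return a


def assign_bin(y, atoms, **kwargs):
    """Hypothetical rule: bin y with the atom of largest positive coefficient, else None."""
    code = elastic_net_code(y, atoms, **kwargs)
    best = max(range(len(code)), key=lambda j: code[j])
    return best if code[best] > 0.0 else None
```

For example, with two latent profiles over four samples, `atoms = [[1, 1, 0, 0], [0, 0, 1, 1]]`, the profile `[2, 2, 0, 0]` gets a non-zero coefficient only on the first atom and is assigned to bin 0, while an all-zero profile is left unassigned. The L1 term drives irrelevant coefficients to exactly zero, and the L2 term keeps the solution stable when atoms are correlated, which is the usual motivation for elastic-net over a pure lasso penalty.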

Highlights

  • Sequence-binning techniques enable the recovery of an increasing number of genomes from complex microbial metagenomes and typically require prior metagenome assembly, incurring the computational cost and drawbacks of the latter, e.g., biases against low-abundance genomes and inability to conveniently assemble multi-terabyte datasets

  • Abundance covariance-based binning has the power to identify biologically meaningful associations between metagenomic sequences that could go unnoticed by analyses based on sequence overlap or nucleotide signatures

  • The large number of incomplete but otherwise uncontaminated partitions/bins in the LifeLines DEEP analysis partly reflects the widespread occurrence of this type of variation in natural habitats
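To make the abundance-covariance idea in the highlights concrete, the sketch below links sequences whose abundance profiles across samples are strongly correlated, regardless of sequence overlap or nucleotide composition, and merges linked sequences into groups. Pearson correlation, the 0.9 threshold, and union-find grouping are illustrative choices assumed here; they are not the paper's algorithm, which according to the abstract relies on sparse dictionary learning.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def covariance_bins(profiles, threshold=0.9):
    """Group sequences whose cross-sample abundance profiles correlate >= threshold.

    profiles: dict mapping a (hypothetical) sequence name to its abundances,
    listed in the same sample order for every sequence.
    """
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

With `{"s1": [1, 10, 2, 8], "s2": [2, 20, 4, 16], "s3": [5, 1, 6, 0], "s4": [10, 2, 12, 0]}`, `s1`/`s2` and `s3`/`s4` co-vary perfectly across the four samples and end up in two separate groups, even though nothing about their sequences was compared. The all-pairs loop is quadratic and only meant to show the principle; it would not scale to datasets of the size described above.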


Summary

Introduction

Sequence-binning techniques enable the recovery of an increasing number of genomes from complex microbial metagenomes and typically require prior metagenome assembly, incurring the computational cost and drawbacks of the latter, e.g., biases against low-abundance genomes and inability to conveniently assemble multi-terabyte datasets. Several factors, including sequencing errors, strain-level polymorphism, repeat elements, and uneven coverage, combine to yield fragmented metagenome assemblies, which require post-processing to cluster (bin) the assembled fragments into meaningful biological entities, ideally strain-resolved genomes. The vast majority of metagenome-assembled genomes (MAGs) have been produced by post-assembly binning approaches, i.e., approaches operating on contigs assembled on a sample-by-sample basis. Although highly successful, such methods are “inherently biased towards the most abundant or…
