Abstract

We applied theoretical and simulation-based approaches to characterize how microbial community structure influences the amount of sequencing effort required to reconstruct metagenomes that are assembled from short-read sequences. First, a coupon collector equation was proposed as an analytical model for predicting sequencing effort as a function of microbial community structure. Characterization was performed by varying community structure properties such as richness, evenness, and genome size. Simulations demonstrated that while community richness and evenness influenced the sequencing effort required to sequence a community metagenome to exhaustion, the effort necessary to sequence an individual genome to a target fraction of exhaustion depended only on the relative abundance of the genome and its genome size. A second analysis evaluated the quantity, completion, and contamination of metagenome-assembled genomes (MAGs) as a function of sequencing effort on four preexisting sequence read data sets from different environments. These data sets were subsampled to various degrees of completeness to simulate the effect of sequencing effort on MAG retrieval. Modeling suggested that sequencing efforts beyond what is typical in published experiments (1 to 10 Gbp) would generate diminishing returns in terms of MAG binning. A software tool, Genome Relative Abundance to Sequencing Effort (GRASE), was created to help investigators further explore this relationship. Reevaluation of the relationship between sequencing effort and binning success in the context of genome relative abundance, as opposed to base pairs, provides a constraint on sequencing experiments based on the relative abundance of microbes in an environment rather than arbitrary levels of sequencing effort.

IMPORTANCE Short-read sequencing with Illumina sequencing technology provides an accurate, high-throughput method for characterizing the metabolic potential of microbial communities. Short-read sequences can be assembled and binned into metagenome-assembled genomes, thus shedding light on the function of microbial ecosystems that are important for health, agriculture, and Earth system processes. The work presented here provides an analytical framework for selecting sequencing effort as a function of genome relative abundance. As such, projects that aim to create metagenome-assembled genomes can select sequencing effort using the rarest target genome as the constraining threshold. We hope that the results presented here, as well as GRASE, will be valuable to researchers planning sequencing experiments.
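
The claim that the effort for an individual genome depends only on its relative abundance and genome size can be illustrated with the textbook coupon collector expectation. The sketch below is not the paper's equation and is not GRASE; it assumes, for illustration only, that a genome is split into non-overlapping read-length fragments, that every read from the genome lands on a uniformly random fragment, and that `read_fraction` is the share of all reads that come from the target genome. The function name and all parameter names are hypothetical.

```python
import math


def expected_reads_for_genome(genome_size_bp, read_length_bp,
                              read_fraction, target_fraction):
    """Expected total reads until `target_fraction` of a genome's
    fragments have been sampled at least once.

    Assumptions (not from the source): non-overlapping fragments of
    one read length, uniform sampling within the genome, and
    `read_fraction` = share of all reads drawn from this genome.
    """
    if not 0 < read_fraction <= 1:
        raise ValueError("read_fraction must be in (0, 1]")
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")

    n = math.ceil(genome_size_bp / read_length_bp)  # number of "coupons"
    k = math.ceil(target_fraction * n)              # coupons to collect

    # Classic coupon collector: once j distinct fragments are seen, the
    # next new one takes n / (n - j) draws from this genome on average.
    draws_within_genome = sum(n / (n - j) for j in range(k))

    # Only a share `read_fraction` of all reads hit this genome, so the
    # total effort scales by its inverse.
    return draws_within_genome / read_fraction


# Hypothetical example: a 5 Mbp genome receiving 1% of the reads,
# 150 bp reads, sequenced to 90% of its fragments.
print(expected_reads_for_genome(5_000_000, 150, 0.01, 0.90))
```

Under these assumptions the result has no term for community richness or evenness: halving `read_fraction` doubles the expected effort whatever the rest of the community looks like.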

Highlights

  • We applied theoretical and simulation-based approaches to characterize how microbial community structure influences the amount of sequencing effort required to reconstruct metagenomes that are assembled from short-read sequences

  • The recent creation of ~8,000 metagenome-assembled genomes (MAGs) from largely uncultured organisms across the tree of life [2], the spatial characterization of microbial metabolisms and ecology across Earth’s oceans [3], and the characterization of the potential impact that fermentation-based microbial metabolisms have on biogeochemical cycling in subsurface sediment environments [4] provide a few examples of how MAGs have helped to elucidate the relationships between microbial ecology, microbial metabolisms, and biogeochemistry

  • Using equation 6, we calculated the number of sequence reads required to sequence four hypothetical microbial communities to exhaustion: a perfectly even, a moderately uneven, a highly uneven, and a lognormally distributed community (Fig. 2a to d)
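
The effect of evenness on sequencing a whole community to exhaustion can be sketched with a small Monte Carlo simulation. This is a toy illustration, not equation 6 and not the paper's simulation: the community sizes, read fractions, fragment counts, and function names below are all hypothetical, and each genome is assumed to consist of the same number of equally likely fragments.

```python
import random


def reads_to_exhaustion(read_fractions, fragments_per_genome, rng):
    """Draw reads until every fragment of every genome has been seen.

    `read_fractions[i]` is the assumed share of reads from genome i.
    """
    genomes = range(len(read_fractions))
    seen = [set() for _ in genomes]
    remaining = len(read_fractions) * fragments_per_genome
    reads = 0
    while remaining > 0:
        # Pick the source genome by its read share, then a fragment
        # uniformly at random within that genome.
        g = rng.choices(genomes, weights=read_fractions)[0]
        f = rng.randrange(fragments_per_genome)
        reads += 1
        if f not in seen[g]:
            seen[g].add(f)
            remaining -= 1
    return reads


def mean_reads(read_fractions, fragments_per_genome=20, trials=50, seed=0):
    """Average reads to exhaustion over repeated simulated runs."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = sum(reads_to_exhaustion(read_fractions, fragments_per_genome, rng)
                for _ in range(trials))
    return total / trials


# Two hypothetical three-genome communities with equal genome sizes.
even = mean_reads([1 / 3, 1 / 3, 1 / 3])
uneven = mean_reads([0.90, 0.09, 0.01])
print("perfectly even:", even)
print("highly uneven: ", uneven)
```

In this toy setup the uneven community needs far more reads because the rarest genome dominates the time to exhaustion, in line with the idea of using the rarest target genome as the constraint.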


Summary

Introduction

We applied theoretical and simulation-based approaches to characterize how microbial community structure influences the amount of sequencing effort required to reconstruct metagenomes that are assembled from short-read sequences. A second analysis evaluated the quantity, completion, and contamination of metagenome-assembled genomes (MAGs) as a function of sequencing effort on four preexisting sequence read data sets from different environments. The reconstruction of high-accuracy short-read sequences into MAGs is a powerful approach to characterize microbial metabolisms within complex communities [1]. There is little information to guide how much sequencing is appropriate to achieve scientific goals in such experiments [5]. This gap in knowledge is partly attributed to the unknown structure of target microbial communities. Without a priori knowledge of the microbial community structure, practical application of these methods to estimate MAG retrieval as a function of sequencing effort is hindered.
