Abstract

Background
In environmental sequencing projects, a mix of DNA from a whole microbial community is fragmented and sequenced, one possible goal being to reconstruct partial or complete genomes of members of the community. In communities with high species diversity, a significant proportion of the sequences do not overlap any other fragment in the sample. This problem arises not only when many species are relatively evenly distributed, but also when the community in a particular environment is routinely dominated by the same few species. In the former case, no genomes may be assembled at all; in the latter, the few dominant species will always be sequenced at high coverage, at the expense of coverage of the more numerous low-abundance species.

Methods and Results
Here we show that, with the same global sequencing effort, separating the species into two or more sub-communities prior to sequencing can yield a much higher proportion of sequences that can be assembled. We first use the Lander-Waterman model to show that, under the hypothesis of a uniform species distribution, splitting the community is always a wise choice if the expected percentage of singleton sequences is higher than 25%. We then construct simulated microbial communities to show that the results hold for highly non-uniform distributions. We also show that, for the distributions considered in the experiments, the relative diversity of the two sub-communities can be estimated quite accurately.

Conclusion
Given that several methods exist to split microbial communities based on physical properties such as size, density, surface biochemistry, or optical properties, we strongly suggest that groups involved in environmental sequencing who expect high diversity consider splitting their communities in order to maximize the information content of their sequencing effort.
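The Lander-Waterman comparison can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not the authors' derivation. It uses the standard Lander-Waterman result that a read overlaps no other read with probability e^(-2cσ), where c = NL/G is the coverage (N reads of length L on a genome of length G) and σ = 1 - θ for a minimum detectable overlap of θL. It further assumes, hypothetically, S equally abundant species with genomes of equal length, a split that places a fraction u of the species in one sub-community and the rest in the other, and a sequencing effort divided equally between the two sub-communities. The function names are illustrative.

```python
import math


def lw_singleton_fraction(coverage, sigma=1.0):
    """Lander-Waterman expected fraction of reads that overlap no other read.

    coverage = N * L / G; sigma = 1 - theta, where theta is the minimum
    detectable overlap expressed as a fraction of the read length L.
    """
    return math.exp(-2.0 * coverage * sigma)


def split_singleton_fraction(coverage, u, sigma=1.0):
    """Singleton fraction after a hypothetical split of S equally abundant
    species into sub-communities of u*S and (1-u)*S species, each sequenced
    with half of the original reads (an illustrative assumption).
    """
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    # Half the reads spread over a fraction u of the species: coverage c / (2u).
    c_a = coverage / (2.0 * u)
    c_b = coverage / (2.0 * (1.0 - u))
    # Each sub-community contributes half of all reads, so average the two.
    return 0.5 * (lw_singleton_fraction(c_a, sigma) + lw_singleton_fraction(c_b, sigma))


if __name__ == "__main__":
    # Coverage at which the unsplit singleton fraction is 1/4 (sigma = 1).
    print(lw_singleton_fraction(math.log(2.0)))  # ≈ 0.25
    for cov in (0.3, 1.5):
        unsplit = lw_singleton_fraction(cov)
        best = min(split_singleton_fraction(cov, k / 100) for k in range(1, 100))
        print(f"c={cov}: unsplit {unsplit:.3f}, best split {best:.3f}")
```

Under these assumptions, letting u approach 0 drives the split singleton fraction toward e^(-cσ)/2, which is smaller than the unsplit value e^(-2cσ) exactly when e^(-2cσ) > 1/4. That matches the 25% figure quoted above, although this sketch only checks particular splits numerically and does not prove that every split is beneficial.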

Highlights

  • Whole genome shotgun sequencing is a standard approach for quickly achieving a high degree of genome coverage for individual organisms

  • Random sequencing of clones from a metagenomic library often results in a low proportion of overlapping fragments

  • Much of the power of genomics derives from understanding genes in their genomic context, and the failure to assemble individual sequence reads into longer stretches represents a significant loss of genomic information that was originally present in the sample


Summary

Introduction

Whole genome shotgun sequencing is a standard approach for quickly achieving a high degree of genome coverage for individual organisms. This procedure is being applied to environmental sequencing projects in an approach commonly referred to as metagenomics or microbial community genomics [1]. For this application, a community shotgun library is prepared from DNA that has been extracted from a natural assemblage of microorganisms rather than from an individual isolate. In communities with high species diversity, a significant proportion of the sequences do not overlap any other fragment in the sample. This problem arises both when many species are relatively evenly distributed and when the community in a particular environment is routinely dominated by the same few species. Given that several methods exist to split microbial communities based on physical properties such as size, density, surface biochemistry, or optical properties, we strongly suggest that groups involved in environmental sequencing who expect high diversity consider splitting their communities in order to maximize the information content of their sequencing effort.
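The way a few dominant species absorb most of the reads while rare species stay unassembled can be reproduced with a small Monte Carlo simulation. This is a hypothetical sketch, not the simulation used in the paper: it assumes circular genomes of equal length, reads assigned to species in proportion to abundance, uniformly random start positions, and that any overlap counts (θ = 0). The function names and the parameter values in the `__main__` block are illustrative.

```python
import math
import random


def simulate_singletons(abundances, n_reads, genome_len, read_len, seed=0):
    """Simulate one shotgun run and return per-species (reads, singletons).

    Illustrative assumptions: every species has a circular genome of length
    genome_len; each read is assigned to a species with probability
    proportional to its abundance and starts at a uniformly random position;
    two reads overlap when their circular start distance is below read_len.
    """
    rng = random.Random(seed)
    picks = rng.choices(range(len(abundances)), weights=abundances, k=n_reads)
    starts = [[] for _ in abundances]
    for sp in picks:
        starts[sp].append(rng.randrange(genome_len))

    stats = []
    for pos in starts:
        pos.sort()
        m = len(pos)
        if m <= 1:
            # No reads, or a lone read that cannot overlap anything.
            stats.append((m, m))
            continue
        singles = 0
        for i in range(m):
            # In sorted circular order, the closest read is a direct neighbour.
            gap_next = (pos[(i + 1) % m] - pos[i]) % genome_len
            gap_prev = (pos[i] - pos[i - 1]) % genome_len
            if gap_next >= read_len and gap_prev >= read_len:
                singles += 1
        stats.append((m, singles))
    return stats


def singleton_fraction(stats, species=None):
    """Fraction of reads that are singletons, optionally over a subset of species."""
    idx = list(range(len(stats)) if species is None else species)
    reads = sum(stats[s][0] for s in idx)
    singles = sum(stats[s][1] for s in idx)
    return singles / reads if reads else 0.0


if __name__ == "__main__":
    G, L = 100_000, 1_000  # hypothetical genome and read lengths
    even = simulate_singletons([1.0] * 100, 5_000, G, L)
    skewed = simulate_singletons([891.0] + [1.0] * 99, 5_000, G, L)
    print("Lander-Waterman expectation (c = 0.5):", round(math.exp(-1.0), 3))
    print("even community, all reads:", round(singleton_fraction(even), 3))
    print("skewed community, rare species:", round(singleton_fraction(skewed, range(1, 100)), 3))
```

With these illustrative parameters, each species in the even community has coverage 0.5, so roughly e^(-1) ≈ 37% of its reads should be singletons. In the skewed community the dominant species is covered about 45-fold, while each of the 99 rare species receives only about five reads, most of which should overlap nothing.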
