Abstract

Background: Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost and benchmarking the techniques so that potential sources of bias can be better characterized.

Methods: We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower cost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol that incorporates three barcodes into each sample, with the possibility of adding a fourth index. It also includes heterogeneity spacers to overcome the low-complexity issues faced when sequencing amplicons on Illumina platforms.

Results: The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost.

Conclusions: Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date. Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable-length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias.
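The two design ideas named in the abstract, combinatorial indexing and heterogeneity spacers, can be illustrated with a small sketch. It is not the authors' primer design: the function names, the primer sequence, the spacer sequences, and the sample count of 1000 are all hypothetical, and the oligo count assumes a balanced design in which every index position carries the same number of barcodes.

```python
def balanced_oligo_count(n_samples: int, n_index_positions: int) -> int:
    """Indexed oligos needed when every index position has the same number of barcodes.

    A combinatorial design with k barcodes at each of p positions can label
    k**p samples while requiring only k*p distinct indexed oligos.
    """
    per_position = 1
    while per_position ** n_index_positions < n_samples:
        per_position += 1
    return per_position * n_index_positions


def distinct_bases_per_cycle(reads: list[str], n_cycles: int) -> list[int]:
    """Count how many different bases the sequencer would see at each cycle."""
    return [len({read[i] for read in reads}) for i in range(n_cycles)]


# Hypothetical values, chosen only for illustration (not taken from the paper).
PRIMER = "GTGCCAGCAGCCGCGGTAA"      # stand-in for a conserved amplicon start
SPACERS = ["", "A", "CT", "GAC"]    # variable-length heterogeneity spacers

# Without spacers every library starts in the same phase.
unspaced = [PRIMER for _ in SPACERS]
# With spacers each library is shifted by 0 to 3 bases.
spaced = [spacer + PRIMER for spacer in SPACERS]

print(balanced_oligo_count(1000, 2))          # two index positions
print(balanced_oligo_count(1000, 3))          # three index positions
print(distinct_bases_per_cycle(unspaced, 8))  # one base per cycle: low complexity
print(distinct_bases_per_cycle(spaced, 8))    # mixed bases per cycle
```

Under these assumptions, labelling 1000 samples takes 64 indexed oligos with two index positions and 30 with three. The spacer half of the sketch shows why shifting the read phase raises per-cycle base diversity, which is the property that a PhiX spike-in otherwise supplies.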

Highlights

  • Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine

  • To address issues related to accuracy, cost, and throughput, we have designed an improved Illumina-compatible library preparation protocol for 16S ribosomal RNA (rRNA) gene amplicon sequencing, combining features of several popular existing sequencing strategies

  • Bias caused by sample indexing in both the PCR1 (Dataset 1, Additional file 4: Table S3) and PCR2 (Dataset 2, Additional file 5: Table S4) reactions, as well as the effects of the amount of DNA template used for polymerase chain reaction (PCR) and of PCR cycle number (Dataset 3, Additional file 6: Table S5), were evaluated by sequencing the mock community 160 times in a single MiSeq run

Introduction

Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Sequencing large numbers of relatively short DNA fragments is now commonplace, and microbiologists have adapted these technologies to characterize communities of microbes either by targeted sequencing of conserved regions containing phylogenetically informative polymorphisms (e.g., 16S or 18S rRNA gene sequencing) or by sequencing a subset of the randomly sheared DNA molecules in a sample (so-called shotgun metagenomics). Both approaches present unique challenges for identifying and interpreting biologically meaningful information, and the high cost of deep sequencing currently limits the full exploitation of shotgun metagenomics. Illumina’s HiSeq 2500 machine is capable of producing 300 million 250 bp paired reads per run using Rapid run mode v2 500 cycle reagents, at approximately one-third the cost per base compared to MiSeq.
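To put the quoted run output in per-sample terms, the arithmetic can be sketched as below. The 300 million read pairs per run comes from the text; the sample count of 3000, the `usable_fraction` parameter, and the assumption that reads are spread evenly across samples are illustrative only.

```python
def read_pairs_per_sample(total_read_pairs: int, n_samples: int,
                          usable_fraction: float = 1.0) -> int:
    """Average read pairs per sample, assuming reads are spread evenly.

    usable_fraction is a hypothetical allowance for reads lost to
    quality filtering or control spike-in; 1.0 means no loss.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    usable = int(total_read_pairs * usable_fraction)
    return usable // n_samples


HISEQ_RAPID_READ_PAIRS = 300_000_000  # per run, as quoted in the text

# Hypothetical multiplexing level: 3000 samples in one run.
print(read_pairs_per_sample(HISEQ_RAPID_READ_PAIRS, 3000))
```

In practice, per-sample yield varies with pooling accuracy and filtering, so the even split is an upper-level planning figure rather than a prediction.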

