Abstract

Deep sequencing of PCR amplicon libraries facilitates the detection of low-abundance populations in environmental DNA surveys of complex microbial communities. At the same time, deep sequencing can lead to overestimates of microbial diversity through the generation of low-frequency, error-prone reads. Even with sequencing error rates below 0.005 per nucleotide position, the common method of generating operational taxonomic units (OTUs) by multiple sequence alignment and complete-linkage clustering significantly increases the number of predicted OTUs and inflates richness estimates. We show that a 2% single-linkage preclustering methodology followed by an average-linkage clustering based on pairwise alignments more accurately predicts expected OTUs in both single and pooled template preparations of known taxonomic composition. This new clustering method can reduce the OTU richness in environmental samples by as much as 30–60% but does not reduce the fraction of OTUs in long-tailed rank abundance curves that defines the rare biosphere.
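The two-stage idea described in the abstract can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the mismatch-fraction `distance` is a stand-in for distances derived from pairwise alignments, the 3% OTU threshold and all function names are hypothetical, and the quadratic loops would not scale to real amplicon libraries.

```python
from itertools import combinations

def distance(a, b):
    """Toy distance (assumption): fraction of mismatched positions
    between two equal-length sequences, standing in for an
    alignment-based distance."""
    if len(a) != len(b):
        raise ValueError("toy distance expects equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def single_linkage_precluster(seqs, threshold=0.02):
    """Group sequences joined by a chain of links with distance <= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def average_linkage(clusters, seqs, threshold=0.03):
    """Merge clusters while the closest pair has a mean pairwise
    distance <= threshold (the 3% value is an assumption)."""
    clusters = [list(c) for c in clusters]

    def avg(c1, c2):
        total = sum(distance(seqs[i], seqs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, a, b = min(
            (avg(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters

# Hypothetical 50-nt reads: one template, two error-bearing variants,
# and one unrelated sequence.
base = "ACGT" * 12 + "AC"
reads = [base, "T" + base[1:], "TT" + base[2:], "T" * 50]

preclusters = single_linkage_precluster(reads, threshold=0.02)
otus = average_linkage(preclusters, reads, threshold=0.03)
print(len(otus), "OTUs:", otus)
```

In this toy input the third read differs from the template by 4% but from the second read by only 2%, so single-linkage preclustering chains all three into one group before the average-linkage step runs.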

Highlights

  • Parallel pyrosequencing of ribosomal RNA coding regions that evolve rapidly allows the detection of very low abundance populations in complex microbial communities

  • Quality-controlled sequence reads of 16S ribosomal RNA (rRNA) V6 hypervariable region amplicon libraries produced by the Roche Genome Sequencer 20 System (GS 20) have a per-base error rate of 0.25% (Huse et al, 2007), which is comparable to an average phred score better than 25 on contemporary capillary instruments (Ewing and Green, 1998; Ewing et al, 1998)

  • We sequenced approximately 200 000 high-quality reads from each of the genomic template E. coli and S. epidermidis libraries and from the mixed template Clone-43 library, and approximately 30 000 high-quality reads from the single template E. coli and S. epidermidis plasmid clone libraries


Introduction

Parallel pyrosequencing of rapidly evolving ribosomal RNA (rRNA) coding regions allows the detection of very low abundance populations in complex microbial communities. Quality-controlled sequence reads of 16S rRNA V6 hypervariable region amplicon libraries produced by the Roche Genome Sequencer 20 System (GS 20) have a per-base error rate of 0.25% (Huse et al, 2007), which is comparable to an average phred score better than 25 on contemporary capillary instruments (Ewing and Green, 1998; Ewing et al, 1998). For targets such as the short V6 rRNA hypervariable region, which initially revealed the existence of the rare biosphere, 1 inaccurate nucleotide per 400 positions means that 13% of the reads contain at least 1 error. Even with very low sequencing error rates, the very large data sets produced by massively parallel sequencing will inevitably contain a fraction of reads with multiple errors, which can lead to overestimates of diversity.
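The arithmetic behind these figures can be checked with a short calculation. The sketch below assumes that errors are independent and equally likely at every position (a binomial model) and that a V6 read is about 56 nucleotides long; the text does not state the read length, so `READ_LENGTH` is a hypothetical value, as are the function and constant names.

```python
from math import comb

def prob_at_least(k, read_length, per_base_error):
    """P(at least k errors in one read) under a binomial model,
    i.e. assuming independent errors with the same rate at each base."""
    p_fewer = sum(
        comb(read_length, i)
        * per_base_error**i
        * (1 - per_base_error) ** (read_length - i)
        for i in range(k)
    )
    return 1 - p_fewer

ERROR_RATE = 1 / 400   # 0.25% per base, as cited in the text
READ_LENGTH = 56       # assumed V6 read length in nucleotides (hypothetical)
N_READS = 200_000      # approximate size of one genomic template library

p_one = prob_at_least(1, READ_LENGTH, ERROR_RATE)
p_two = prob_at_least(2, READ_LENGTH, ERROR_RATE)

print(f"reads with >=1 error:  {p_one:.1%}")
print(f"reads with >=2 errors: {p_two:.2%} "
      f"(about {p_two * N_READS:.0f} of {N_READS} reads)")
```

Under these assumptions roughly 13% of reads carry at least one error, and a little under 1% carry two or more, which in a library of this size still amounts to well over a thousand reads.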
