Abstract

Background: High-throughput bacterial 16S rRNA gene sequencing followed by clustering of short sequences into operational taxonomic units (OTUs) is widely used for microbiome profiling. However, clustering of short 16S rRNA gene reads into biologically meaningful OTUs is challenging, in part because nucleotide variation along the 16S rRNA gene is only partially captured by short reads. The recent emergence of long-read platforms, such as single-molecule real-time (SMRT) sequencing from Pacific Biosciences, offers the potential for improved taxonomic and phylogenetic profiling. Here, we evaluate the performance of long- and short-read 16S rRNA gene sequencing using simulated and experimental data, followed by OTU inference using computational pipelines based on heuristic and complete-linkage hierarchical clustering.

Results: In simulated data, long-read sequencing was shown to improve OTU quality and decrease variance. We then profiled 40 human gut microbiome samples using a combination of Illumina MiSeq and Blautia-specific SMRT sequencing, further supporting the notion that long reads can identify additional OTUs. We implemented a complete-linkage hierarchical clustering strategy using a flexible computational pipeline, tailored specifically for PacBio circular consensus sequencing (CCS) data that outperforms heuristic methods in most settings: https://github.com/oscar-franzen/oclust/.

Conclusion: Our data demonstrate that long reads can improve OTU inference; however, the choice of clustering algorithm and associated clustering thresholds has significant impact on performance.

Electronic supplementary material: The online version of this article (doi:10.1186/s40168-015-0105-6) contains supplementary material, which is available to authorized users.

Highlights

  • High-throughput bacterial 16S rRNA gene sequencing followed by clustering of short sequences into operational taxonomic units (OTUs) is widely used for microbiome profiling

  • The observed improvement in clustering outcome depended on the clustering method: pairwise sequence comparisons followed by complete-linkage hierarchical clustering (HC) outperformed DNACLUST (Additional file 3), cd-hit, usearch, and oclust with multiple sequence alignments (MSA) at all read lengths

  • With Pacific Biosciences (PacBio) circular consensus sequencing (CCS) reads of length 450 bp, cd-hit produces clusters that are much worse than those of the best-performing program
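To make the complete-linkage idea concrete, the following is a minimal sketch in Python and is not the oclust implementation. It assumes toy reads of equal length and uses the fraction of mismatching positions as the distance, whereas a real pipeline would derive distances from pairwise alignments. The function names `hamming_fraction` and `complete_linkage_otus` and the example reads are hypothetical.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of mismatching positions (toy distance; assumes equal length)."""
    if len(a) != len(b):
        raise ValueError("toy distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage_otus(seqs, threshold):
    """Agglomerative clustering with complete linkage.

    Two clusters are merged only if the LARGEST pairwise distance between
    their members is <= threshold, so every pair of reads inside an OTU
    stays within the threshold.
    """
    clusters = [[i] for i in range(len(seqs))]
    dist = {
        (i, j): hamming_fraction(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        d, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break  # no remaining pair can be merged within the threshold
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical 20 bp reads: two pairs that differ by one base within a pair
# and by three or four bases between pairs.
reads = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",
    "TTTTACGTACGTACGTACGT",
    "TTTTACGTACGTACGTACGA",
]
otus = complete_linkage_otus(reads, threshold=0.15)
```

In this toy input the closest reads from the two pairs are 0.15 apart, so single linkage at that threshold would fuse everything into one OTU. Complete linkage keeps two OTUs because the farthest members are 0.20 apart, which illustrates why the choice of algorithm and threshold changes the result.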

Summary

Introduction

Body habitat-associated bacteria have received immense attention because of their relevance to human health and well-being [1, 2]. Until recently, these bacteria were largely studied with culture-dependent methods [3]. High-throughput DNA sequencing has bypassed the need to culture bacteria for assessing microbial diversity, enabling large-scale microbiome studies such as the Human Microbiome Project [4].
