Abstract
Background
Currently, bacterial 16S rRNA gene analyses are based on sequencing of individual variable regions of the 16S rRNA gene (Kozich et al., Appl Environ Microbiol 79:5112–5120, 2013). This short-read approach can introduce biases. Thus, full-length bacterial 16S rRNA gene sequencing is needed to reduce biases. A new alternative for full-length bacterial 16S rRNA gene sequencing is offered by PacBio single molecule, real-time (SMRT) technology. The aim of our study was to validate PacBio P6 sequencing chemistry using three approaches: 1) sequencing the full-length bacterial 16S rRNA gene from a single bacterial species, Staphylococcus aureus, to analyze error modes and to optimize the bioinformatics pipeline; 2) sequencing the full-length bacterial 16S rRNA gene from a pool of 50 different bacterial colonies from human stool samples to compare with full-length bacterial 16S rRNA capillary sequences; and 3) sequencing the full-length bacterial 16S rRNA genes from 11 vaginal microbiome samples and comparing them with the in silico selected bacterial 16S rRNA V1V2 gene region and with bacterial 16S rRNA V1V2 gene regions sequenced using the Illumina MiSeq.
Results
Our optimized bioinformatics pipeline for PacBio sequence analysis achieved an error rate of 0.007% on the Staphylococcus aureus full-length 16S rRNA gene. Capillary sequencing of the full-length bacterial 16S rRNA gene from the pool of 50 colonies from stool identified 40 bacterial species, of which up to 80% could be identified by PacBio full-length bacterial 16S rRNA gene sequencing. Analysis of the human vaginal microbiome using the bacterial 16S rRNA V1V2 gene region on MiSeq generated 129 operational taxonomic units (OTUs), from which 70 species could be identified. For the PacBio, 36,000 sequences from over 58,000 raw reads could be assigned to a barcode, and the in silico selected bacterial 16S rRNA V1V2 gene region generated 154 OTUs grouped into 63 species, of which 62% were shared with the MiSeq dataset. The PacBio full-length bacterial 16S rRNA gene datasets generated 261 OTUs, which were grouped into 52 species, of which 54% were shared with the MiSeq dataset. The alpha diversity index indicated a higher diversity in the MiSeq dataset.
Conclusion
The PacBio sequencing error rate is now in the same range as that of the previously widely used Roche 454 sequencing platform and the current MiSeq platform. Species-level microbiome analysis revealed some inconsistencies between the full-length bacterial 16S rRNA gene capillary sequencing and PacBio sequencing.
Electronic supplementary material
The online version of this article (doi:10.1186/s12866-016-0891-4) contains supplementary material, which is available to authorized users.
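The "shared with the MiSeq dataset" percentages above are overlap figures between species lists. As a minimal sketch of how such a figure can be computed, assuming the denominator is the species count of the first dataset (the abstract does not state this explicitly), the function name `shared_fraction` and the toy species lists below are hypothetical and are not data from the study:

```python
def shared_fraction(species_a, species_b):
    """Return the fraction of species in species_a that also occur in species_b."""
    a, b = set(species_a), set(species_b)
    if not a:
        raise ValueError("species_a is empty")
    return len(a & b) / len(a)


# Hypothetical toy species lists for illustration only (not study data).
pacbio = [
    "Lactobacillus crispatus",
    "Lactobacillus iners",
    "Gardnerella vaginalis",
    "Atopobium vaginae",
]
miseq = [
    "Lactobacillus crispatus",
    "Lactobacillus iners",
    "Prevotella bivia",
]

fraction = shared_fraction(pacbio, miseq)
print(f"{fraction:.0%} of the first dataset's species are shared")
```

Swapping the argument order changes the denominator, so the direction of the comparison should be stated whenever such a percentage is reported.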
Highlights
Bacterial 16S rRNA gene analyses are based on sequencing of individual variable regions of the 16S rRNA gene (Kozich et al., Appl Environ Microbiol 79:5112–5120, 2013). This short-read approach can introduce biases.
In this study we: 1) evaluated the accuracy of the latest Pacific Biosciences (PacBio) P6 chemistry using deoxyribonucleic acid (DNA) from a single bacterial species (Staphylococcus aureus); 2) compared PacBio sequencing with capillary sequencing using a pool of 50 colonies isolated from human stool samples; and 3) compared PacBio sequencing and Illumina MiSeq sequencing using human vaginal microbiome samples.
This study aimed to validate the value of full-length bacterial 16S rRNA gene sequencing on the PacBio RS II platform and to compare it with capillary sequencing and short-read Illumina MiSeq sequencing for bacterial species classification.
Summary
Bacterial 16S rRNA gene analyses are based on sequencing of individual variable regions of the 16S rRNA gene (Kozich et al., Appl Environ Microbiol 79:5112–5120, 2013). The short-read approach used by second-generation sequencing introduces biases that depend on which variable regions are used, and it cannot provide effective resolution below the bacterial genus level, which limits microbial ecology studies [5, 8]. A large proportion of the bacterial 16S rRNA gene records in the GenBank database labeled as environmental samples are unclassified, which is in part due to low read accuracy, potential chimeric sequences produced during PCR amplification, and the low resolution of short amplicons. High-throughput full-length bacterial 16S rRNA gene sequencing methodologies with reduced biases are therefore needed.