Reproducible Emu-Based Workflow for High-Fidelity Soil and Plant Microbiome Profiling on HPC Clusters

Abstract

Accurate profiling of soil and root-associated bacterial communities is essential for understanding ecosystem functions and improving sustainable agricultural practices. Here, a comprehensive, modular workflow is presented for the analysis of full-length 16S rRNA gene amplicons generated with Oxford Nanopore long-read sequencing. The protocol integrates four standardized steps: (i) quality assessment and filtering of raw reads with NanoPlot and NanoFilt, (ii) removal of plant organelle contamination using a curated Viridiplantae Kraken2 database, (iii) species-level taxonomic assignment with Emu, and (iv) downstream ecological analyses, including rarefaction, diversity metrics, and functional inference. Leveraging high-performance computing resources, the workflow enables parallel processing of large datasets, rigorous contamination control, and reproducible execution across environments. The pipeline’s efficiency is demonstrated on full-length 16S rRNA gene datasets from yellow pea rhizosphere and root samples, with high post-filter read retention and high-resolution community profiles. Automated SLURM scripts and detailed documentation are provided in a public GitHub repository (https://github.com/henrimdias/emu-microbiome-HPC; release v1.0.2, emu-pipeline-revised) and archived on Zenodo (DOI: 10.5281/zenodo.17764933).

Key features
  • Implement rigorous quality control (QC) of raw 16S rRNA Nanopore reads and sequencing controls.
  • Remove plant organelle contamination with a curated Kraken2 database.
  • Perform high-resolution taxonomic assignment of full-length 16S rRNA reads using Emu.
  • Integrate downstream statistical analyses, including rarefaction, PERMANOVA, and DESeq2 differential abundance.
  • Conduct scalable microbiome diversity and functional analyses with FAPROTAX.
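As a rough illustration of how the first three steps of the workflow (QC and filtering, plant-read removal, Emu classification) might be chained for one sample, the following Python sketch builds the command lines without running them. It is not taken from the authors' repository: the function and class names, the directory layout, the quality and length thresholds, and the choice to keep Kraken2's unclassified reads as the "non-plant" fraction are all assumptions made for this example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional
import subprocess


@dataclass
class Step:
    """One external command plus optional stdin/stdout redirection."""
    cmd: List[str]
    stdin: Optional[str] = None
    stdout: Optional[str] = None


def build_steps(sample: str, reads_fastq: str, plant_db: str, emu_db: str,
                out_dir: str, threads: int = 8, min_q: int = 10,
                min_len: int = 1000, max_len: int = 2000) -> List[Step]:
    """Build per-sample commands; thresholds and paths are hypothetical."""
    out = Path(out_dir) / sample
    filtered = str(out / f"{sample}.filtered.fastq")
    no_plant = str(out / f"{sample}.noplant.fastq")
    return [
        # (i) QC report on the raw reads
        Step(["NanoPlot", "--fastq", reads_fastq,
              "-o", str(out / "nanoplot"), "--threads", str(threads)]),
        # (i) quality/length filter; NanoFilt reads uncompressed FASTQ on
        # stdin and writes the kept reads to stdout
        Step(["NanoFilt", "-q", str(min_q), "-l", str(min_len),
              "--maxlength", str(max_len)],
             stdin=reads_fastq, stdout=filtered),
        # (ii) classify against the plant database; reads that stay
        # unclassified are assumed here to be the non-plant fraction
        Step(["kraken2", "--db", plant_db, "--threads", str(threads),
              "--report", str(out / "kraken2.report"),
              "--output", str(out / "kraken2.out"),
              "--unclassified-out", no_plant, filtered]),
        # (iii) species-level abundance estimation
        Step(["emu", "abundance", no_plant, "--db", emu_db,
              "--threads", str(threads), "--output-dir", str(out / "emu")]),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run the steps in order, stopping at the first failure."""
    for step in steps:
        if step.stdout:
            Path(step.stdout).parent.mkdir(parents=True, exist_ok=True)
        fin = open(step.stdin) if step.stdin else None
        fout = open(step.stdout, "w") if step.stdout else None
        try:
            subprocess.run(step.cmd, stdin=fin, stdout=fout, check=True)
        finally:
            for handle in (fin, fout):
                if handle:
                    handle.close()
```

On a SLURM cluster, one such per-sample call could be wrapped in an array job so that samples are processed in parallel; the downstream ecological analyses (step iv) are left out of this sketch.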

Similar Papers
  • Research Article
  • Cite Count Icon 190
  • 10.1186/s12866-016-0891-4
Evaluation of PacBio sequencing for full-length bacterial 16S rRNA gene classification
  • Nov 14, 2016
  • BMC Microbiology
  • Josef Wagner + 5 more

Background: Currently, bacterial 16S rRNA gene analyses are based on sequencing of individual variable regions of the 16S rRNA gene (Kozich et al., Appl Environ Microbiol 79:5112–5120, 2013). This short-read approach can introduce biases. Thus, full-length bacterial 16S rRNA gene sequencing is needed to reduce biases. A new alternative for full-length bacterial 16S rRNA gene sequencing is offered by PacBio single molecule, real-time (SMRT) technology. The aim of our study was to validate PacBio P6 sequencing chemistry using three approaches: 1) sequencing the full-length bacterial 16S rRNA gene from a single bacterial species, Staphylococcus aureus, to analyze error modes and to optimize the bioinformatics pipeline; 2) sequencing the full-length bacterial 16S rRNA gene from a pool of 50 different bacterial colonies from human stool samples to compare with full-length bacterial 16S rRNA capillary sequence; and 3) sequencing the full-length bacterial 16S rRNA genes from 11 vaginal microbiome samples and comparing them with the in silico selected bacterial 16S rRNA V1V2 gene region and with bacterial 16S rRNA V1V2 gene regions sequenced using the Illumina MiSeq. Results: Our optimized bioinformatics pipeline for PacBio sequence analysis was able to achieve an error rate of 0.007% on the Staphylococcus aureus full-length 16S rRNA gene. Capillary sequencing of the full-length bacterial 16S rRNA gene from the pool of 50 colonies from stool identified 40 bacterial species, of which up to 80% could be identified by PacBio full-length bacterial 16S rRNA gene sequencing. Analysis of the human vaginal microbiome using the bacterial 16S rRNA V1V2 gene region on MiSeq generated 129 operational taxonomic units (OTUs), from which 70 species could be identified. For the PacBio, 36,000 sequences from over 58,000 raw reads could be assigned to a barcode, and the in silico selected bacterial 16S rRNA V1V2 gene region generated 154 OTUs grouped into 63 species, of which 62% were shared with the MiSeq dataset. The PacBio full-length bacterial 16S rRNA gene datasets generated 261 OTUs, which were grouped into 52 species, of which 54% were shared with the MiSeq dataset. Alpha diversity index reported a higher diversity in the MiSeq dataset. Conclusion: The PacBio sequencing error rate is now in the same range as that of the previously widely used Roche 454 sequencing platform and the current MiSeq platform. Species-level microbiome analysis revealed some inconsistencies between the full-length bacterial 16S rRNA gene capillary sequencing and PacBio sequencing. Electronic supplementary material: The online version of this article (doi:10.1186/s12866-016-0891-4) contains supplementary material, which is available to authorized users.

  • Research Article
  • Cite Count Icon 260
  • 10.1186/s12866-021-02094-5
Full-length 16S rRNA gene amplicon analysis of human gut microbiota using MinION™ nanopore sequencing confers species-level resolution
  • Jan 26, 2021
  • BMC Microbiology
  • Yoshiyuki Matsuo + 13 more

Background: Species-level genetic characterization of complex bacterial communities has important clinical applications in both diagnosis and treatment. Amplicon sequencing of the 16S ribosomal RNA (rRNA) gene has proven to be a powerful strategy for the taxonomic classification of bacteria. This study aims to improve the method for full-length 16S rRNA gene analysis using the nanopore long-read sequencer MinION™. We compared it to the conventional short-read sequencing method in both a mock bacterial community and human fecal samples. Results: We modified our existing protocol for full-length 16S rRNA gene amplicon sequencing by MinION™. A new strategy for library construction with an optimized primer set overcame PCR-associated bias and enabled taxonomic classification across a broad range of bacterial species. We compared the performance of full-length and short-read 16S rRNA gene amplicon sequencing for the characterization of human gut microbiota with a complex bacterial composition. The relative abundance of dominant bacterial genera was highly similar between full-length and short-read sequencing. At the species level, MinION™ long-read sequencing had better resolution for discriminating between members of particular taxa such as Bifidobacterium, allowing an accurate representation of the sample bacterial composition. Conclusions: Our present microbiome study, comparing the discriminatory power of full-length and short-read sequencing, clearly illustrated the analytical advantage of sequencing the full-length 16S rRNA gene.

  • Research Article
  • Cite Count Icon 9
  • 10.1007/s00284-015-0898-3
New Primers Targeting Full-Length Ciliate 18S rRNA Genes and Evaluation of Dietary Effect on Rumen Ciliate Diversity in Dairy Cows.
  • Aug 30, 2015
  • Current Microbiology
  • Jun Zhang + 5 more

Analysis of the full-length 18S rRNA gene sequences of rumen ciliates is more reliable for taxonomical classification and diversity assessment than the analysis of partial hypervariable regions only. The objective of this study was to develop new oligonucleotide primers targeting the full-length 18S rRNA genes of rumen ciliates, and to evaluate the effect of different sources of dietary fiber (corn stover or a mixture of alfalfa hay and corn silage) and protein (mixed rapeseed, cottonseed, and/or soybean meals) on rumen ciliate diversity in dairy cows. Primers were designed based on a total of 137 previously reported ciliate 18S rRNA gene sequences. The 3'-terminal sequences of the newly designed primers, P.1747r_2, P.324f, and P.1651r, demonstrated >99% base coverage. Primer pair D (P.324f and P.1747r_2) was selected for the cloning and sequencing of ciliate 18S rRNA genes because it produced a 1423-bp amplicon and did not amplify the sequences of other eukaryotic species, such as yeast. The optimal species-level cutoff value for distinguishing between the operational taxonomic units of different ciliate species was 0.015. The phylogenetic analysis of full-length ciliate 18S rRNA gene sequences showed that distinct ciliate profiles were induced by the different sources of dietary fiber and protein. Dasytricha and Entodinium were the predominant genera in the ruminal fluid of dairy cattle, and Dasytricha was significantly more abundant in cows fed with corn stover than in cows fed with alfalfa hay and corn silage.

  • Research Article
  • Cite Count Icon 7
  • 10.1186/s13213-024-01767-6
Full-length 16S rRNA gene sequencing combined with adequate database selection improves the description of Arctic marine prokaryotic communities
  • Aug 10, 2024
  • Annals of Microbiology
  • Francisco Pascoal + 4 more

Background: High-throughput sequencing of the full-length 16S rRNA gene has improved the taxonomic classification of prokaryotes found in natural environments. However, sequencing of shorter regions from the same gene, like the V4-V5 region, can provide more cost-effective high throughput. It is unclear which approach best describes prokaryotic communities from underexplored environments. In this study, we hypothesize that high-throughput full-length 16S rRNA gene sequencing combined with adequate taxonomic databases improves the taxonomic description of prokaryotic communities from underexplored environments in comparison with high-throughput sequencing of a short region of the 16S rRNA gene. Results: To test our hypothesis, we compared taxonomic profiles of seawater samples from the Arctic Ocean using full-length and V4-V5 16S rRNA gene sequencing in combination with either the Genome Taxonomy Database (GTDB) or the Silva taxonomy database. Our results show that all combinations of sequencing strategies and taxonomic databases present similar results at higher taxonomic levels. However, at lower taxonomic levels, namely family, genus, and most notably species level, the full-length approach led to higher proportions of Amplicon Sequence Variants (ASVs) assigned to formally valid taxa. Hence, the best taxonomic description was obtained by the full-length and GTDB combination, which in some cases allowed for the identification of intraspecific diversity of ASVs. Conclusions: We conclude that coupling high-throughput full-length 16S rRNA gene sequencing with GTDB improves the description of microbiome profiling at lower taxonomic ranks. The improvements reported here provide more context for scientists to discuss microbial community dynamics within a solid taxonomic framework in environments like the Arctic Ocean with still underrepresented microbiome sequences in public databases.

  • Research Article
  • Cite Count Icon 54
  • 10.1186/s40168-018-0463-y
Near full-length 16S rRNA gene next-generation sequencing revealed Asaia as a common midgut bacterium of wild and domesticated Queensland fruit fly larvae
  • May 5, 2018
  • Microbiome
  • Ania T Deutscher + 5 more

Background: Gut microbiota affects tephritid (Diptera: Tephritidae) fruit fly development, physiology, behavior, and thus the quality of flies mass-reared for the sterile insect technique (SIT), a target-specific, sustainable, environmentally benign form of pest management. The Queensland fruit fly, Bactrocera tryoni (Tephritidae), is a significant horticultural pest in Australia and can be managed with SIT. Little is known about the impacts that laboratory-adaptation (domestication) and mass-rearing have on the tephritid larval gut microbiome. Read lengths of previous fruit fly next-generation sequencing (NGS) studies have limited the resolution of microbiome studies, and the diversity within populations is often overlooked. In this study, we used a new near full-length (> 1300 nt) 16S rRNA gene amplicon NGS approach to characterize gut bacterial communities of individual B. tryoni larvae from two field populations (developing in peaches) and three domesticated populations (mass- or laboratory-reared on artificial diets). Results: Near full-length 16S rRNA gene sequences were obtained for 56 B. tryoni larvae. OTU clustering at 99% similarity revealed that gut bacterial diversity was low and significantly lower in domesticated larvae. Bacteria commonly associated with fruit (Acetobacteraceae, Enterobacteriaceae, and Leuconostocaceae) were detected in wild larvae, but were largely absent from domesticated larvae. However, Asaia, an acetic acid bacterium not frequently detected within adult tephritid species, was detected in larvae of both wild and domesticated populations (55 out of 56 larval gut samples). Larvae from the same single peach shared a similar gut bacterial profile, whereas larvae from different peaches collected from the same tree had different gut bacterial profiles. Clustering of the Asaia near full-length sequences at 100% similarity showed that the wild flies from different locations had different Asaia strains. Conclusions: Variation in the gut bacterial communities of B. tryoni larvae depends on diet, domestication, and horizontal acquisition. Bacterial variation in wild larvae suggests that more than one bacterial species can perform the same functional role; however, Asaia could be an important gut bacterium in larvae and warrants further study. A greater understanding of the functions of the bacteria detected in larvae could lead to increased fly quality and performance as part of the SIT.

  • Research Article
  • Cite Count Icon 349
  • 10.1128/aem.01282-13
Intragenomic Heterogeneity of 16S rRNA Genes Causes Overestimation of Prokaryotic Diversity
  • Jul 19, 2013
  • Applied and Environmental Microbiology
  • Dong-Lei Sun + 3 more

Ever since Carl Woese introduced the use of 16S rRNA genes for determining the phylogenetic relationships of prokaryotes, this method has been regarded as the "gold standard" in both microbial phylogeny and ecology studies. However, intragenomic heterogeneity within 16S rRNA genes has been reported in many investigations and is believed to bias the estimation of prokaryotic diversity. In the current study, 2,013 completely sequenced genomes of bacteria and archaea were analyzed and intragenomic heterogeneity was found in 952 genomes (585 species), with 87.5% of the divergence detected being below the 1% level. In particular, some extremophiles (thermophiles and halophiles) were found to harbor highly divergent 16S rRNA genes. Overestimation caused by 16S rRNA gene intragenomic heterogeneity was evaluated at different levels using the full-length and partial 16S rRNA genes usually chosen as targets for pyrosequencing. The result indicates that, at the unique level, full-length 16S rRNA genes can produce an overestimation of as much as 123.7%, while at the 3% level, an overestimation of 12.9% for the V6 region may be introduced. Further analysis showed that intragenomic heterogeneity tends to concentrate in specific positions, with the V1 and V6 regions suffering the most intragenomic heterogeneity and the V4 and V5 regions suffering the least intragenomic heterogeneity in bacteria. This is the most up-to-date overview of the diversity of 16S rRNA genes within prokaryotic genomes. It not only provides general guidance on how much overestimation can be introduced when applying 16S rRNA gene-based methods, due to its intragenomic heterogeneity, but also recommends that, for bacteria, this overestimation be minimized using primers targeting the V4 and V5 regions.

  • Research Article
  • Cite Count Icon 87
  • 10.12688/f1000research.16817.2
Microbiota profiling with long amplicons using Nanopore sequencing: full-length 16S rRNA gene and the 16S-ITS-23S of the rrn operon.
  • Aug 1, 2019
  • F1000Research
  • Anna Cuscó + 4 more

Background: Profiling the microbiome of low-biomass samples is challenging for metagenomics since these samples are prone to contain DNA from other sources (e.g. host or environment). The usual approach is sequencing short regions of the 16S rRNA gene, which fails to assign taxonomy to genus and species level. To achieve an increased taxonomic resolution, we aim to develop long-amplicon PCR-based approaches using Nanopore sequencing. We assessed two different genetic markers: the full-length 16S rRNA (~1,500 bp) and the 16S-ITS-23S region from the rrn operon (4,300 bp). Methods: We sequenced a clinical isolate of Staphylococcus pseudintermedius, two mock communities and two pools of low-biomass samples (dog skin). Nanopore sequencing was performed on MinION™ using the 1D PCR barcoding kit. Sequences were pre-processed, and data were analyzed using EPI2ME or Minimap2 with rrn database. Consensus sequences of the 16S-ITS-23S genetic marker were obtained using canu. Results: The full-length 16S rRNA and the 16S-ITS-23S region of the rrn operon were used to retrieve the microbiota composition of the samples at the genus and species level. For the Staphylococcus pseudintermedius isolate, the amplicons were assigned to the correct bacterial species in ~98% of the cases with the 16S-ITS-23S genetic marker, and in ~68%, with the 16S rRNA gene when using EPI2ME. Using mock communities, we found that the full-length 16S rRNA gene represented better the abundances of a microbial community; whereas, 16S-ITS-23S obtained better resolution at the species level. Finally, we characterized low-biomass skin microbiota samples and detected species with an environmental origin. Conclusions: Both full-length 16S rRNA and the 16S-ITS-23S of the rrn operon retrieved the microbiota composition of simple and complex microbial communities, even from the low-biomass samples such as dog skin. For an increased resolution at the species level, targeting the 16S-ITS-23S of the rrn operon would be the best choice.

  • Research Article
  • Cite Count Icon 32
  • 10.5256/f1000research.18384.r40373
Microbiota profiling with long amplicons using Nanopore sequencing: full-length 16S rRNA gene and the 16S-ITS-23S of the rrn operon
  • Nov 19, 2018
  • F1000Research
  • Alfonso Benítez-Páez

Background: Profiling the microbiome of low-biomass samples is challenging for metagenomics since these samples are prone to contain DNA from other sources (e.g. host or environment). The usual approach is sequencing short regions of the 16S rRNA gene, which fails to assign taxonomy to genus and species level. To achieve an increased taxonomic resolution, we aim to develop long-amplicon PCR-based approaches using Nanopore sequencing. We assessed two different genetic markers: the full-length 16S rRNA (~1,500 bp) and the 16S-ITS-23S region from the rrn operon (4,300 bp). Methods: We sequenced a clinical isolate of Staphylococcus pseudintermedius, two mock communities and two pools of low-biomass samples (dog skin). Nanopore sequencing was performed on MinION™ using the 1D PCR barcoding kit. Sequences were pre-processed, and data were analyzed using EPI2ME or Minimap2 with rrn database. Consensus sequences of the 16S-ITS-23S genetic marker were obtained using canu. Results: The full-length 16S rRNA and the 16S-ITS-23S region of the rrn operon were used to retrieve the microbiota composition of the samples at the genus and species level. For the Staphylococcus pseudintermedius isolate, the amplicons were assigned to the correct bacterial species in ~98% of the cases with the 16S-ITS-23S genetic marker, and in ~68%, with the 16S rRNA gene when using EPI2ME. Using mock communities, we found that the full-length 16S rRNA gene represented better the abundances of a microbial community; whereas, 16S-ITS-23S obtained better resolution at the species level. Finally, we characterized low-biomass skin microbiota samples and detected species with an environmental origin. Conclusions: Both full-length 16S rRNA and the 16S-ITS-23S of the rrn operon retrieved the microbiota composition of simple and complex microbial communities, even from the low-biomass samples such as dog skin. For an increased resolution at the species level, targeting the 16S-ITS-23S of the rrn operon would be the best choice.

  • Research Article
  • Cite Count Icon 49
  • 10.3389/fcimb.2021.678522
Impact of Bead-Beating Intensity on the Genus- and Species-Level Characterization of the Gut Microbiome Using Amplicon and Complete 16S rRNA Gene Sequencing.
  • Oct 1, 2021
  • Frontiers in Cellular and Infection Microbiology
  • Bo Zhang + 6 more

Bead-beating within a DNA extraction protocol is critical for complete microbial cell lysis and accurate assessment of the abundance and composition of the microbiome. While the impact of bead-beating on the recovery of OTUs at the phylum and class level has been studied, its influence on species-level microbiome recovery is not clear. Recent advances in sequencing technology have allowed species-level resolution of the microbiome using full-length 16S rRNA gene sequencing instead of smaller amplicons that only capture a few hypervariable regions of the gene. We sequenced the V3-V4 hypervariable region as well as the full-length 16S rRNA gene in mouse and human stool samples and discovered major clusters of gut bacteria that exhibit different levels of sensitivity to bead-beating treatment. Full-length 16S rRNA gene sequencing unraveled vast species diversity in the mouse and human gut microbiome and enabled characterization of several unclassified OTUs in amplicon data. Many species of major gut commensals such as Bacteroides, Lactobacillus, Blautia, Clostridium, Escherichia, Roseburia, Helicobacter, and Ruminococcus were identified. Interestingly, V3-V4 amplicon data classified about 50% of Ruminococcus reads as Ruminococcus gnavus, which showed maximum abundance in a sample beaten for 9 min. However, the remaining 50% of reads could not be assigned to any species. Full-length 16S rRNA gene sequencing data showed that the majority of the unclassified reads were Ruminococcus albus, which, unlike R. gnavus, showed maximum recovery in the unbeaten sample. Furthermore, we found that Blautia hominis and Streptococcus parasanguinis differed in their sensitivity to bead-beating treatment from the rest of the species in these genera. Thus, the present study demonstrates species-level variations in sensitivity to bead-beating treatment that could only be resolved with full-length 16S rRNA sequencing. This study identifies species of common gut commensals and potential pathogens that require minimum (0-1 min) or extensive (4-9 min) bead-beating for their maximal recovery.

  • Research Article
  • Cite Count Icon 249
  • 10.1038/s41592-022-01520-4
Emu: species-level microbial community profiling of full-length 16S rRNA Oxford Nanopore sequencing data.
  • Jun 30, 2022
  • Nature Methods
  • Kristen D Curry + 13 more

16S ribosomal RNA-based analysis is the established standard for elucidating the composition of microbial communities. While short-read 16S rRNA analyses are largely confined to genus-level resolution at best, given that only a portion of the gene is sequenced, full-length 16S rRNA gene amplicon sequences have the potential to provide species-level accuracy. However, existing taxonomic identification algorithms are not optimized for the increased read length and error rate often observed in long-read data. Here we present Emu, an approach that uses an expectation-maximization algorithm to generate taxonomic abundance profiles from full-length 16S rRNA reads. Results produced from simulated datasets and mock communities show that Emu is capable of accurate microbial community profiling while obtaining fewer false positives and false negatives than alternative methods. Additionally, we illustrate a real-world application of Emu by comparing clinical sample composition estimates generated by an established whole-genome shotgun sequencing workflow with those returned by full-length 16S rRNA gene sequences processed with Emu.
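The expectation-maximization idea mentioned in this abstract can be illustrated with a toy example. The sketch below is not Emu's implementation; it assumes a precomputed table of per-read likelihoods (a hypothetical input, where `likelihoods[r][t]` stands for the probability of read r given taxon t) and alternates between assigning each read fractionally to taxa and re-estimating the relative abundances from those assignments.

```python
from typing import List


def em_abundance(likelihoods: List[List[float]],
                 n_iter: int = 100, tol: float = 1e-9) -> List[float]:
    """Estimate relative abundances from per-read taxon likelihoods.

    Toy EM for a mixture model; names and inputs are illustrative only.
    """
    n_taxa = len(likelihoods[0])
    abundance = [1.0 / n_taxa] * n_taxa  # start from a uniform profile
    for _ in range(n_iter):
        totals = [0.0] * n_taxa
        # E-step: split each read across taxa in proportion to
        # (current abundance) x (likelihood of the read under that taxon)
        for row in likelihoods:
            weighted = [a * p for a, p in zip(abundance, row)]
            norm = sum(weighted)
            if norm == 0.0:
                continue  # read is incompatible with every taxon; skip it
            for t, w in enumerate(weighted):
                totals[t] += w / norm
        total = sum(totals)
        if total == 0.0:
            raise ValueError("no read could be assigned to any taxon")
        # M-step: new abundance = share of fractional read assignments
        updated = [x / total for x in totals]
        change = max(abs(u - a) for u, a in zip(updated, abundance))
        abundance = updated
        if change < tol:
            break
    return abundance
```

With unambiguous reads, for example two reads that match only the first taxon and one that matches only the second, the estimate settles at two thirds and one third; ambiguous reads are progressively pulled toward taxa that the other reads already support.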

  • Research Article
  • Cite Count Icon 71
  • 10.12688/f1000research.16817.1
Microbiota profiling with long amplicons using Nanopore sequencing: full-length 16S rRNA gene and whole rrn operon.
  • Nov 6, 2018
  • F1000Research
  • Anna Cuscó + 4 more

Background: Profiling the microbiome of low-biomass samples is challenging for metagenomics since these samples often contain DNA from other sources, such as the host or the environment. The usual approach is sequencing specific hypervariable regions of the 16S rRNA gene, which fails to assign taxonomy to genus and species level. Here, we aim to assess long-amplicon PCR-based approaches for assigning taxonomy at the genus and species level. We use Nanopore sequencing with two different markers: full-length 16S rRNA (~1,500 bp) and the whole rrn operon (16S rRNA-ITS-23S rRNA; 4,500 bp). Methods: We sequenced a clinical isolate of Staphylococcus pseudintermedius, two mock communities (HM-783D, Bei Resources; D6306, ZymoBIOMICS™) and two pools of low-biomass samples (dog skin from either the chin or dorsal back), using the MinION™ sequencer 1D PCR barcoding kit. Sequences were pre-processed, and data were analyzed using the WIMP workflow on EPI2ME or Minimap2 software with rrn database. Results: The full-length 16S rRNA and the rrn operon were used to retrieve the microbiota composition at the genus and species level from the bacterial isolate, mock communities and complex skin samples. For the Staphylococcus pseudintermedius isolate, when using EPI2ME, the amplicons were assigned to the correct bacterial species in ~98% of the cases with the rrn operon marker, and in ~68% of the cases with the 16S rRNA gene. In both skin microbiota samples, we detected many species with an environmental origin. In chin, we found different Pseudomonas species in high abundance, whereas in dorsal skin there were more taxa with lower abundances. Conclusions: Both full-length 16S rRNA and the rrn operon retrieved the microbiota composition of simple and complex microbial communities, even from the low-biomass samples such as dog skin. For an increased resolution at the species level, using the rrn operon would be the best choice.

  • Research Article
  • Cite Count Icon 149
  • 10.7717/peerj.494
RIM-DB: a taxonomic framework for community structure analysis of methanogenic archaea from the rumen and other intestinal environments.
  • Aug 5, 2014
  • PeerJ
  • Henning Seedorf + 3 more

Methane is formed by methanogenic archaea in the rumen as one of the end products of feed fermentation in the ruminant digestive tract. To develop strategies to mitigate anthropogenic methane emissions due to ruminant farming, and to understand rumen microbial differences in animal feed conversion efficiency, it is essential that methanogens can be identified and taxonomically classified with high accuracy. Currently available taxonomic frameworks offer only limited resolution beyond the genus level for taxonomic assignments of sequence data stemming from high throughput sequencing technologies. Therefore, we have developed a QIIME-compatible database (DB) designed for species-level taxonomic assignment of 16S rRNA gene amplicon data targeting methanogenic archaea from the rumen, and from animal and human intestinal tracts. Called RIM-DB (Rumen and Intestinal Methanogen-DB), it contains a set of 2,379 almost full-length chimera-checked 16S rRNA gene sequences, including 20 previously unpublished sequences from isolates from three different orders. The taxonomy encompasses the recently-proposed seventh order of methanogens, the Methanomassiliicoccales, and allows differentiation between defined groups within this order. Sequence reads from rumen contents from a range of ruminant-diet combinations were taxonomically assigned using RIM-DB, Greengenes and SILVA. This comparison clearly showed that taxonomic assignments with RIM-DB resulted in the most detailed assignment, and only RIM-DB taxonomic assignments allowed methanogens to be distinguished taxonomically at the species level. RIM-DB complements the use of comprehensive databases such as Greengenes and SILVA for community structure analysis of methanogens from the rumen and other intestinal environments, and allows identification of target species for methane mitigation strategies.

  • Research Article
  • Cite Count Icon 23
  • 10.1016/j.ygeno.2021.06.001
Taxonomic profiling of Symbiodiniaceae and bacterial communities associated with Indo-Pacific corals in the Gulf of Thailand using PacBio sequencing of full-length ITS and 16S rRNA genes
  • Jun 3, 2021
  • Genomics
  • Wirulda Pootakham + 12 more

  • Research Article
  • Cite Count Icon 149
  • 10.1186/s40168-018-0569-2
Species-level bacterial community profiling of the healthy sinonasal microbiome using Pacific Biosciences sequencing of full-length 16S rRNA genes
  • Oct 23, 2018
  • Microbiome
  • Joshua P Earl + 13 more

Background: Pan-bacterial 16S rRNA microbiome surveys performed with massively parallel DNA sequencing technologies have transformed community microbiological studies. Current 16S profiling methods, however, fail to provide sufficient taxonomic resolution and accuracy to adequately perform species-level associative studies for specific conditions. This is due to the amplification and sequencing of only short 16S rRNA gene regions, typically providing for only family- or genus-level taxonomy. Moreover, sequencing errors often inflate the number of taxa present. Pacific Biosciences’ (PacBio’s) long-read technology in particular suffers from high error rates per base. Herein, we present a microbiome analysis pipeline that takes advantage of PacBio circular consensus sequencing (CCS) technology to sequence and error correct full-length bacterial 16S rRNA genes, which provides high-fidelity species-level microbiome data. Results: Analysis of a mock community with 20 bacterial species demonstrated 100% specificity and sensitivity with regard to taxonomic classification. Examination of a 250-plus species mock community demonstrated correct species-level classification of > 90% of taxa, and relative abundances were accurately captured. The majority of the remaining taxa were demonstrated to be multiply, incorrectly, or incompletely classified. Using this methodology, we examined the microgeographic variation present among the microbiomes of six sinonasal sites, by both swab and biopsy, from the anterior nasal cavity to the sphenoid sinus from 12 subjects undergoing trans-sphenoidal hypophysectomy. We found greater variation among subjects than among sites within a subject, although significant within-individual differences were also observed. Propionibacterium acnes (recently renamed Cutibacterium acnes) was the predominant species throughout, but was found at distinct relative abundances by site. Conclusions: Our microbial composition analysis pipeline for single-molecule real-time 16S rRNA gene sequencing (MCSMRT, https://github.com/jpearl01/mcsmrt) overcomes deficits of standard marker gene-based microbiome analyses by using CCS of entire 16S rRNA genes to provide increased taxonomic and phylogenetic resolution. Extensions of this approach to other marker genes could help refine taxonomic assignments of microbial species and improve reference databases, as well as strengthen the specificity of associations between microbial communities and dysbiotic states.

  • Research Article
  • Cite Count Icon 83
  • 10.1186/s40168-020-00841-w
Construction of habitat-specific training sets to achieve species-level assignment in 16S rRNA gene datasets
  • May 15, 2020
  • Microbiome
  • Isabel F Escapa + 6 more

Background: The low cost of 16S rRNA gene sequencing facilitates population-scale molecular epidemiological studies. Existing computational algorithms can resolve 16S rRNA gene sequences into high-resolution amplicon sequence variants (ASVs), which represent consistent labels comparable across studies. Assigning these ASVs to species-level taxonomy strengthens the ecological and/or clinical relevance of 16S rRNA gene-based microbiota studies and further facilitates data comparison across studies. Results: To achieve this, we developed a broadly applicable method for constructing high-resolution training sets based on the phylogenic relationships among microbes found in a habitat of interest. When used with the naïve Bayesian Ribosomal Database Project (RDP) Classifier, this training set achieved species/supraspecies-level taxonomic assignment of 16S rRNA gene-derived ASVs. The key steps for generating such a training set are (1) constructing an accurate and comprehensive phylogenetic-based, habitat-specific database; (2) compiling multiple 16S rRNA gene sequences to represent the natural sequence variability of each taxon in the database; (3) trimming the training set to match the sequenced regions, if necessary; and (4) placing species sharing closely related sequences into a training-set-specific supraspecies taxonomic level to preserve subgenus-level resolution. As proof of principle, we developed a V1–V3 region training set for the bacterial microbiota of the human aerodigestive tract using the full-length 16S rRNA gene reference sequences compiled in our expanded Human Oral Microbiome Database (eHOMD). We also overcame technical limitations to successfully use Illumina sequences for the 16S rRNA gene V1–V3 region, the most informative segment for classifying bacteria native to the human aerodigestive tract. Finally, we generated a full-length eHOMD 16S rRNA gene training set, which we used in conjunction with an independent PacBio single molecule, real-time (SMRT)-sequenced sinonasal dataset to validate the representation of species in our training set. This also established the effectiveness of a full-length training set for assigning taxonomy of long-read 16S rRNA gene datasets. Conclusion: Here, we present a systematic approach for constructing a phylogeny-based, high-resolution, habitat-specific training set that permits species/supraspecies-level taxonomic assignment to short- and long-read 16S rRNA gene-derived ASVs. This advancement enhances the ecological and/or clinical relevance of 16S rRNA gene-based microbiota studies.
