Abstract
High-throughput 16S rRNA gene amplicon sequencing is an essential method for studying the diversity and dynamics of microbial communities. However, the method is currently hampered by two problems: public 16S rRNA gene reference databases lack high-identity reference sequences for many environmental microbes, and there is no systematic and comprehensive taxonomy for the uncultured majority. Here, using Danish wastewater treatment systems and anaerobic digesters as an example, we demonstrate how high-throughput synthetic long-read sequencing can be applied to create ecosystem-specific reference databases resolved at the level of full-length 16S rRNA gene amplicon sequence variants (FL-ASVs). These databases include high-identity references (>98.7% identity) for nearly all abundant bacteria (>0.01% relative abundance). In addition, we introduce a novel sequence identity-based approach for automated taxonomy assignment (AutoTax). AutoTax uses the SILVA taxonomy as a backbone and provides a complete seven-rank taxonomy for all reference sequences, with stable placeholder names for unclassified taxa. The FL-ASVs are well suited for evaluating the taxonomic resolution and bias of primers commonly used for amplicon sequencing, which allows researchers to choose the primers best suited to their ecosystem. Compared with the commonly used universal reference databases, reference databases processed with AutoTax greatly improve the classification of short-read 16S rRNA gene ASVs at the genus and species levels. Importantly, the placeholder names provide a way to explore unclassified environmental taxa at different taxonomic ranks. In combination with in situ analyses, this can help uncover the ecological roles of these taxa.
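The coverage criterion in the abstract is a high-identity reference (>98.7% identity) for each bacterium above 0.01% relative abundance. Checking it reduces to a simple count. The Python sketch below assumes that each short-read ASV already has a relative abundance (in percent) and the percent identity of its best reference hit, obtained from an external search. The `AsvHit` type, the `high_identity_coverage` function and the example values are hypothetical. They are not part of the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class AsvHit:
    """A short-read ASV with its abundance and best reference match (hypothetical type)."""
    asv_id: str
    rel_abundance: float  # percent of reads in the sample
    identity: float       # percent identity to the closest reference sequence


def high_identity_coverage(hits, min_abundance=0.01, min_identity=98.7):
    """Fraction of ASVs above `min_abundance` % whose best hit exceeds `min_identity` %."""
    abundant = [h for h in hits if h.rel_abundance > min_abundance]
    if not abundant:
        return 0.0
    covered = sum(1 for h in abundant if h.identity > min_identity)
    return covered / len(abundant)


# Made-up example values, not data from the study.
hits = [
    AsvHit("ASV1", 5.2, 100.0),
    AsvHit("ASV2", 0.5, 97.9),
    AsvHit("ASV3", 0.005, 90.0),  # below the abundance cut-off, so it is ignored
    AsvHit("ASV4", 0.02, 99.1),
]
print(round(high_identity_coverage(hits), 2))
```

With these made-up values, three ASVs exceed 0.01% abundance and two of them have a reference above 98.7% identity, so the function returns about 0.67.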
Highlights
Microbial communities underpin key biochemical transformations in natural and engineered ecosystems.
The automated taxonomy assignment (AutoTax) script uses the usearch -usearch_global command to identify the closest relative of each full-length 16S rRNA gene amplicon sequence variant (FL-ASV) in the SILVA database. It then obtains the taxonomy of this reference sequence and discards information at taxonomic ranks not supported by the sequence identity, based on the taxonomic rank thresholds proposed by Yarza et al. (12).
To obtain 16S rRNA gene reference sequences for Danish wastewater treatment plants (WWTPs) and anaerobic digesters (ADs), we sampled biomass from 22 typical WWTPs. We also sampled 16 ADs that treat waste activated sludge at Danish wastewater treatment facilities (Table S2).
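The rank-truncation step in the AutoTax highlight can be sketched as follows. The sketch assumes that the closest SILVA relative and its percent identity have already been obtained, for example from the usearch -usearch_global search, and parsed into a dict of rank names. The threshold values are the median values commonly quoted from Yarza et al. Like the function and variable names, they are assumptions and not code taken from AutoTax.

```python
from __future__ import annotations

# Minimum percent identity needed to transfer each rank from the closest
# reference. ASSUMPTION: median thresholds as commonly quoted from Yarza et al.;
# this sketch always keeps the domain rank.
RANK_THRESHOLDS = [
    ("domain", 0.0),
    ("phylum", 75.0),
    ("class", 78.5),
    ("order", 82.0),
    ("family", 86.5),
    ("genus", 94.5),
    ("species", 98.7),
]


def truncate_taxonomy(reference_taxonomy: dict[str, str], identity: float) -> dict[str, str]:
    """Return the ranks of the reference taxonomy supported by `identity` (percent)."""
    kept: dict[str, str] = {}
    for rank, threshold in RANK_THRESHOLDS:
        # Stop at the first rank that the identity does not support,
        # or that the reference itself does not name.
        if identity < threshold or rank not in reference_taxonomy:
            break
        kept[rank] = reference_taxonomy[rank]
    return kept


# Illustrative SILVA-style lineage for a Nitrospira reference (no species rank).
nitrospira = {
    "domain": "Bacteria",
    "phylum": "Nitrospirae",
    "class": "Nitrospira",
    "order": "Nitrospirales",
    "family": "Nitrospiraceae",
    "genus": "Nitrospira",
}

print(truncate_taxonomy(nitrospira, 95.0))  # domain through genus
print(truncate_taxonomy(nitrospira, 90.0))  # domain through family
```

The sketch checks ranks from domain downwards and stops at the first unsupported rank, so the result is always a contiguous prefix of the reference taxonomy. It does not show how AutoTax fills the dropped ranks with placeholder names.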
Summary
Microbial communities underpin key biochemical transformations in natural and engineered ecosystems. Examples of independent ecosystem-specific databases include the human intestinal tract 16S taxonomic database (HITdb) (20), the human oral microbiome database (HOMD) (21), the freshwater-specific FreshTrain database (22, 23), the honey bee gut microbiota database (24), and the rumen and intestinal methanogen database (25). Such databases have been shown to improve classification rates for amplicons. However, they generally contain a relatively small number of sequences and carry an inherent risk of over- or misclassification if the sequence being classified is not represented in the database. Supporting the value of our approach, we mapped short-read amplicon data against both the FL-ASV database and the much larger public universal reference databases. With the FL-ASV database, a substantially higher proportion of sequences matched high-identity references and received genus- and species-level classifications.