Abstract

16S rRNA gene amplicon sequencing is a popular approach for studying microbiomes. However, some basic concepts have still not been investigated comprehensively. We studied the occurrence of spurious sequences using defined microbial communities, based on data either from the literature or generated in three sequencing facilities, and analyzed via both operational taxonomic unit (OTU) and amplicon sequence variant (ASV) approaches. OTU clustering and singleton removal, a commonly used approach, delivered approximately 50% (mock communities) to 80% (gnotobiotic mice) spurious taxa. The fraction of spurious taxa was generally lower based on ASV analysis, but varied depending on the gene region targeted and the barcoding system used. A relative abundance of 0.25% was found to be an effective threshold below which the analysis of spurious taxa can be prevented to a large extent in both OTU- and ASV-based analysis approaches. Using this cutoff improved the reproducibility of analysis: across six human fecal samples sequenced in seven runs, variation in richness estimates was reduced by 38% compared with singleton filtering. Beta-diversity analysis of human fecal communities was markedly affected by both the filtering strategy and the type of phylogenetic distances used for comparison, highlighting the importance of carefully analyzing data before drawing conclusions on microbiome changes. In summary, handling of artifact sequences during bioinformatic processing of 16S rRNA gene amplicon data requires careful attention to avoid the generation of misleading findings. We propose the concept of effective richness to facilitate the comparison of alpha-diversity across studies.
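The two filtering strategies compared in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names and the read counts are hypothetical, and it assumes that "effective richness" means the number of taxa left in a sample after applying the 0.25% relative-abundance cutoff, which the abstract does not spell out.

```python
from typing import Dict

# Hypothetical read counts for one sample (taxon ID -> number of reads).
SAMPLE: Dict[str, int] = {
    "OTU_1": 6000,
    "OTU_2": 3000,
    "OTU_3": 980,
    "OTU_4": 12,
    "OTU_5": 7,
    "OTU_6": 1,
}


def relative_abundances(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts to fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def filter_singletons(counts: Dict[str, int]) -> Dict[str, int]:
    """Drop taxa supported by a single read (singleton removal)."""
    return {taxon: n for taxon, n in counts.items() if n > 1}


def filter_by_relative_abundance(
    counts: Dict[str, int], cutoff: float = 0.0025
) -> Dict[str, int]:
    """Keep only taxa at or above the relative-abundance cutoff (0.25% by default)."""
    rel = relative_abundances(counts)
    return {taxon: n for taxon, n in counts.items() if rel[taxon] >= cutoff}


def richness(counts: Dict[str, int]) -> int:
    """Number of taxa with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


if __name__ == "__main__":
    print("richness after singleton removal:", richness(filter_singletons(SAMPLE)))
    print("richness after 0.25% cutoff:", richness(filter_by_relative_abundance(SAMPLE)))
```

In this toy sample, singleton removal discards only the one-read taxon, whereas the relative-abundance cutoff also removes the two low-count taxa that fall below 0.25% of the 10,000 reads.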

Highlights

  • Since the late 2000s, high-throughput sequencing of 16S rRNA gene amplicons has become the most popular method for rapid analysis of the diversity and composition of complex microbial communities [1]

  • One common approach that has been used for decades [2] and is included in many freely available processing pipelines [3, 4] consists of building clusters of sequences representing single microbial entities, known as operational taxonomic units (OTUs), at a defined level of sequence identity determined by the user [5]

  • Other strategies, such as exact/amplicon sequence variant (ASV) analysis [6], are available, but do not replace the relevance of OTU-based approaches, as both can be applied in a synergistic manner and generate complementary readouts



Introduction

Since the late 2000s, high-throughput sequencing of 16S rRNA gene amplicons has become the most popular method for rapid analysis of the diversity and composition of complex microbial communities [1]. Despite its popularity and usefulness, the method is prone to technical artifacts at various levels of the workflow, from sample processing to data analysis. For the latter, one common approach that has been used for decades [2] and is included in many freely available processing pipelines [3, 4] consists of building clusters of sequences representing single microbial entities, known as operational taxonomic units (OTUs), at a defined level of sequence identity determined by the user (usually >97%, used as a proxy for species-level diversity) [5]. This approach is extremely sensitive to several factors, such as the number of samples in the dataset and the sequencing depth achieved, and its suitability for removing spurious taxa has not been rigorously evaluated.
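The idea of grouping sequences into OTUs at a user-defined identity level can be sketched as a simple greedy, centroid-based clustering. This is an illustration only and not the algorithm of any specific pipeline cited above: the function names are hypothetical, identity is computed as the fraction of matching positions between equal-length sequences (real tools use pairwise alignment), and sequences are processed in input order.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences.

    Simplifying assumption: no alignment is performed, so both sequences
    must be non-empty and of the same length.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs: List[str], threshold: float = 0.97) -> Dict[str, List[str]]:
    """Assign each sequence to the first centroid it matches at or above
    the threshold; otherwise it becomes the centroid of a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for seq in seqs:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            # No existing centroid is similar enough: start a new cluster.
            clusters[seq] = [seq]
    return clusters


if __name__ == "__main__":
    base = "ACGT" * 25               # 100-nt toy sequence
    near = "TT" + base[2:]           # 2 mismatches -> 98% identity to base
    far = "TTTTT" + base[5:]         # 4 mismatches -> 96% identity to base
    for centroid, members in greedy_cluster([base, near, far]).items():
        print(centroid[:10] + "...", len(members))
```

Because each sequence joins the first matching centroid, the result depends on input order and on which sequences are present, which is one way to see why cluster-based richness can shift with dataset size and sequencing depth.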

