Abstract

There are currently no established criteria for selecting appropriate bioinformatics tools and reference databases for the analysis of 16S rRNA amplicon data from the human oral microbiome. Our study aims to determine the influence of multiple tools and reference databases on α-diversity measurements and β-diversity comparisons in analyses of the human oral microbiome. We compared the taxonomic classifications obtained with Greengenes, the Human Oral Microbiome Database (HOMD), National Center for Biotechnology Information (NCBI) 16S, SILVA, and the Ribosomal Database Project (RDP) using Quantitative Insights Into Microbial Ecology (QIIME) and the Divisive Amplicon Denoising Algorithm (DADA2). Fifteen phyla were present in all of the analyses, four phyla were exclusive to certain databases, and the number of genera identified differed between databases. Common genera found in the oral microbiome, such as Veillonella, Rothia, and Prevotella, are annotated by all databases; however, less common genera, such as Bulleidia and Paludibacter, are annotated only by large databases, such as Greengenes. Our results indicate that using different reference databases in 16S rRNA amplicon data analysis could lead to different taxonomic compositions, especially at the genus level. A variety of databases are available, but there are no defined criteria for data curation and validation of annotations. This can affect the accuracy and reproducibility of results and makes it difficult to compare data across studies.

Highlights

  • With decreasing costs and improvements in the speed and throughput of DNA-sequencing techniques, analyses using marker genes (e.g., 16S rRNA or 18S rRNA) have become one of the most common methods for studying microbial communities [1]

  • Using Quantitative Insights Into Microbial Ecology (QIIME), 3,125,624 sequences remained after merging and demultiplexing. These sequences were clustered into the following numbers of Operational Taxonomic Units (OTUs) per database: Greengenes, 16,018; Human Oral Microbiome Database (HOMD), 14,291; National Center for Biotechnology Information (NCBI), 16,028; SILVA, 16,078; and Ribosomal Database Project (RDP), 16,426

  • Using DADA2 with each of the databases, 2,750,305 sequences remained after quality control, denoising, and merging; these were resolved into 9,264 Amplicon Sequence Variants (ASVs)


Summary

Introduction

With decreasing costs and improvements in the speed and throughput of DNA-sequencing techniques, analyses using marker genes (e.g., 16S rRNA or 18S rRNA) have become one of the most common methods for studying microbial communities [1]. Despite the wide use of 16S rRNA sequencing that these advances have enabled, errors and biases are introduced at different steps of the experimental stage, from DNA extraction to sequencing, including amplification bias [6] and chimeras [7]. Further biases arise during computational analysis, for example from the Operational Taxonomic Unit (OTU) generation strategy, reference taxonomic sets, clustering algorithms, and specific software implementation [8,9]. These methodological differences could have dramatic effects on the accuracy of taxonomic classification and on α- and β-diversity estimation in 16S sequencing.
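
To make the two diversity measures concrete, the following is a minimal sketch, not the pipeline used in the study. It assumes the Shannon index as the α-diversity measure and Bray-Curtis dissimilarity as the β-diversity measure; the study may use other metrics. The genus counts, the names `db_a` and `db_b`, and the "Unassigned" category are hypothetical, chosen only to show how annotating the same reads against two reference databases can shift the resulting composition.

```python
import math

def shannon(counts):
    """Shannon index (natural log) of one sample's taxon counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

# Hypothetical genus-level counts for the SAME 100 reads, annotated
# against two reference databases (illustrative values, not study data).
genera = ["Veillonella", "Rothia", "Prevotella", "Bulleidia", "Unassigned"]
db_a = [40, 30, 20, 10, 0]   # assumed larger database: Bulleidia is annotated
db_b = [40, 30, 20, 0, 10]   # assumed smaller database: those reads stay unassigned

print("Shannon, database A:", round(shannon(db_a), 4))
print("Shannon, database B:", round(shannon(db_b), 4))
print("Bray-Curtis, A vs B:", bray_curtis(db_a, db_b))
```

In this toy case the two Shannon values are identical, because the index depends only on the abundance distribution and not on taxon labels, while the Bray-Curtis dissimilarity is nonzero because the labels differ. With real data, a change of reference database can shift both measures.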
