Ultrafast and accurate 16S rRNA microbial community analysis using Kraken 2

Jennifer Lu, Steven L Salzberg

doi:10.1186/s40168-020-00900-2

Open Access

https://doi.org/10.1186/s40168-020-00900-2

Journal: Microbiome
Publication Date: Aug 28, 2020
Citations: 154
License type: open-access

Affiliation: Johns Hopkins University

Abstract

Background: For decades, 16S ribosomal RNA sequencing has been the primary means for identifying the bacterial species present in a sample with unknown composition. One of the most widely used tools for this purpose today is the QIIME (Quantitative Insights Into Microbial Ecology) package. Recent results have shown that the newest release, QIIME 2, has higher accuracy than QIIME, MAPseq, and mothur when classifying bacterial genera from simulated human gut, ocean, and soil metagenomes, although QIIME 2 also proved to be the most computationally expensive. Kraken, first released in 2014, has been shown to provide exceptionally fast and accurate classification for shotgun metagenomics sequencing projects. Bracken, released in 2016, then provided users with the ability to accurately estimate species or genus relative abundances using Kraken classification results. Kraken 2, which matches the accuracy and speed of Kraken 1, now supports 16S rRNA databases, allowing for direct comparisons to QIIME and similar systems.

Methods: For a comprehensive assessment of each tool, we compare the computational resources and speed of QIIME 2’s q2-feature-classifier, Kraken 2, and Bracken in generating the three main 16S rRNA databases: Greengenes, SILVA, and RDP. To evaluate accuracy, we ran each tool on the same simulated 16S rRNA reads from human gut, ocean, and soil metagenomes that were previously used to compare QIIME, MAPseq, mothur, and QIIME 2. We assessed accuracy using the final genus-level read counts assigned by each tool. Finally, as Kraken 2 is the only tool providing per-read taxonomic assignments, we evaluate the sensitivity and precision of Kraken 2’s per-read classifications.

Results: For both the Greengenes and SILVA databases, Kraken 2 and Bracken are up to 100 times faster at database generation. For classification, using the same data as previous studies, Kraken 2 and Bracken are up to 300 times faster, use 100x less RAM, and generate results that are more accurate at 16S rRNA profiling than QIIME 2’s q2-feature-classifier.

Conclusion: Kraken 2 and Bracken provide a very fast, efficient, and accurate solution for 16S rRNA metataxonomic data analysis.
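As an illustration of the per-read evaluation mentioned in the abstract, the sketch below computes sensitivity and precision for genus-level read assignments. The definitions used here are assumptions, not taken from the paper: sensitivity is the fraction of all reads assigned to the correct genus, and precision is the fraction of classified reads assigned to the correct genus. The function name `per_read_metrics` and the toy reads are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def per_read_metrics(
    truth: Sequence[str],
    predicted: Sequence[Optional[str]],
) -> Tuple[float, float]:
    """Return (sensitivity, precision) for genus-level per-read calls.

    Assumed definitions (not taken from the paper):
      sensitivity = correct reads / all reads
      precision   = correct reads / reads that received a genus call
    A prediction of None means the read was left unclassified.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    total = len(truth)
    # Reads that received any genus call at all.
    classified = sum(1 for p in predicted if p is not None)
    # Reads whose call matches the known genus of origin.
    correct = sum(
        1 for t, p in zip(truth, predicted) if p is not None and p == t
    )
    sensitivity = correct / total if total else 0.0
    precision = correct / classified if classified else 0.0
    return sensitivity, precision


# Hypothetical toy data: four simulated reads with a known genus of origin.
truth = ["Bacteroides", "Bacteroides", "Prevotella", "Escherichia"]
predicted = ["Bacteroides", None, "Prevotella", "Shigella"]
sensitivity, precision = per_read_metrics(truth, predicted)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

Under these assumed definitions, an unclassified read lowers sensitivity but not precision, which is why the two numbers can diverge when a classifier leaves many reads unassigned.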

Highlights

Since the 1970s, sequencing of the 16S ribosomal RNA gene has been used for analyzing and identifying bacterial communities [1, 2].
For Kraken and Bracken, we used three 16S rRNA databases: Greengenes, SILVA, and RDP, while for Quantitative Insights into Microbial Ecology (QIIME), we only evaluated Greengenes and SILVA.
Conclusion: Although each of the 16S rRNA databases represents a large number of bacterial organisms, the accuracy of metataxonomic classifiers varied substantially among them.

Summary

Introduction

Since the 1970s, sequencing of the 16S ribosomal RNA gene has been used for analyzing and identifying bacterial communities [1, 2]. This technology targets the 16S rRNA gene, which has regions that are both highly conserved and highly variable (hypervariable) among bacterial species. The highly conserved regions allow for the design of “universal” PCR primers to target and amplify the 16S rRNA sequence, while the hypervariable regions allow for discrimination among different bacterial clades. These properties allow 16S rRNA sequencing experiments to capture most of the bacteria in a microbial community, whose sequences can then be compared to large 16S rRNA databases to determine their identities. Kraken 2, which matches the accuracy and speed of Kraken 1, supports 16S rRNA databases, allowing for direct comparisons to QIIME and similar systems.

Methods

Results

Discussion

Conclusion