A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy

Xiang Gao, Kashi Revanna, Qunfeng Dong, Huaiying Lin

doi:10.1186/s12859-017-1670-4

Abstract

Background: Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement.

Results: We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes.

Conclusions: Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA.
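The weighting idea described in the Results can be illustrated with a small Python sketch. It is not the BLCA implementation: the abstract does not give the likelihood model or the bootstrap procedure, so the sketch assumes a uniform prior and simply normalizes similarity scores into weights. The function names, the three-rank lineage, and the example hits are all hypothetical.

```python
from collections import defaultdict

# Shortened rank list for illustration; a real lineage has more ranks.
RANKS = ("phylum", "genus", "species")


def hit_weights(similarities):
    """Normalize similarity scores so they sum to 1.

    Assumption: a uniform prior over hits and a likelihood proportional to
    similarity. This is a stand-in, not the posterior used by BLCA.
    """
    total = sum(similarities)
    if total <= 0:
        raise ValueError("at least one hit must have positive similarity")
    return [s / total for s in similarities]


def weighted_support(hits):
    """Return, for each rank, the best-supported taxon and its summed weight.

    hits: list of (similarity, lineage) pairs, where lineage maps rank -> taxon.
    """
    weights = hit_weights([similarity for similarity, _ in hits])
    support = {}
    for rank in RANKS:
        tally = defaultdict(float)
        for weight, (_, lineage) in zip(weights, hits):
            tally[lineage[rank]] += weight
        best_taxon = max(tally, key=tally.get)
        support[rank] = (best_taxon, tally[best_taxon])
    return support


# Made-up database hits for one query sequence.
example_hits = [
    (0.99, {"phylum": "Firmicutes", "genus": "Lactobacillus",
            "species": "Lactobacillus crispatus"}),
    (0.98, {"phylum": "Firmicutes", "genus": "Lactobacillus",
            "species": "Lactobacillus crispatus"}),
    (0.97, {"phylum": "Firmicutes", "genus": "Lactobacillus",
            "species": "Lactobacillus gasseri"}),
]
example_support = weighted_support(example_hits)
```

In this toy case all hits agree at the genus level, so the genus receives the full weight, while the species-level support is split between two candidates. That mirrors why confidence is reported per rank, from species up to phylum. The bootstrap step that the abstract mentions is omitted here.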

Highlights

Species-level classification for 16S ribosomal RNA (rRNA) gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable
Despite the availability of taxonomic classification tools, species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers
The standard version of the widely used RDP Classifier only classifies 16S rRNA gene sequences from the phylum to the genus level, although it can be re-trained for species-level classification (Gao et al., BMC Bioinformatics (2017) 18:247)

Summary

Introduction

Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification or produce unreliable results. The standard version of the widely used RDP Classifier only classifies 16S rRNA gene sequences from the phylum to the genus level, although it can be re-trained for species-level classification. The other tools that can classify at the species level suffer from at least one of two major limitations: i) nucleotide k-mer frequency, a proxy for true sequence similarity, is used to measure similarity between query and database sequences; ii) solid probabilistic criteria for evaluating the confidence of taxonomic assignments are lacking, so there is no principled way to tell whether the best-matched database sequence is significantly better than the other database matches.
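To make the first limitation concrete, the following Python sketch compares a k-mer based similarity with per-position identity on two toy sequences that differ by a single substitution. It is only an illustration under stated assumptions: the sequences are invented, k = 8 is chosen for the example, the k-mer measure is a plain Jaccard index rather than the scoring used by any particular tool, and the identity function assumes the sequences are already aligned and of equal length (no alignment is performed).

```python
def kmers(seq, k):
    """Return the set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(a, b, k):
    """Jaccard index of the k-mer sets of two sequences (a similarity proxy)."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb)


def aligned_identity(a, b):
    """Fraction of matching positions; assumes a and b are pre-aligned."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


# Invented 20-nt sequences differing only at position 10 (0-based).
seq_a = "ACGTTGCAAGCTTAGGCCAT"
seq_b = "ACGTTGCAAGGTTAGGCCAT"

identity = aligned_identity(seq_a, seq_b)   # 19 of 20 positions match
proxy = kmer_jaccard(seq_a, seq_b, 8)       # only 5 of 21 distinct 8-mers shared
```

A single substitution leaves identity at 95% but disrupts every 8-mer that spans the changed position, so the k-mer proxy drops to roughly 24%. On short toy sequences the effect is exaggerated, but it shows why k-mer overlap and alignment-based similarity can rank database matches differently.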

Journal: BMC Bioinformatics	Publication Date: May 10, 2017
Citations: 160	License type: open-access
