IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences

Adithya Murali, Aniruddha Bhargava, Erik S Wright

doi:10.1186/s40168-018-0521-5

Open Access

https://doi.org/10.1186/s40168-018-0521-5

Abstract

Background: Microbiome studies often involve sequencing a marker gene to identify the microorganisms in samples of interest. Sequence classification is a critical component of this process, whereby sequences are assigned to a reference taxonomy containing known sequence representatives of many microbial groups. Previous studies have shown that existing classification programs often assign sequences to reference groups even if they belong to novel taxonomic groups that are absent from the reference taxonomy. This high rate of “over classification” is particularly detrimental in microbiome studies because reference taxonomies are far from comprehensive.

Results: Here, we introduce IDTAXA, a novel approach to taxonomic classification that employs principles from machine learning to reduce over classification errors. Using multiple reference taxonomies, we demonstrate that IDTAXA has higher accuracy than popular classifiers such as BLAST, MAPSeq, QIIME, SINTAX, SPINGO, and the RDP Classifier. Similarly, IDTAXA yields far fewer over classifications on Illumina mock microbial community data when the expected taxa are absent from the training set. Furthermore, IDTAXA offers many practical advantages over other classifiers, such as maintaining low error rates across varying input sequence lengths and withholding classifications from input sequences composed of random nucleotides or repeats.

Conclusions: IDTAXA’s classifications may lead to different conclusions in microbiome studies because of the substantially reduced number of taxa that are incorrectly identified through over classification. Although misclassification error is relatively minor, we believe that many remaining misclassifications are likely caused by errors in the reference taxonomy. We describe how IDTAXA is able to identify many putative mislabeling errors in reference taxonomies, enabling training sets to be automatically corrected by eliminating spurious sequences.

IDTAXA is part of the DECIPHER package for the R programming language, available through the Bioconductor repository or accessible online (http://DECIPHER.codes).

Highlights

Microbiome studies often involve sequencing a marker gene to identify the microorganisms in samples of interest
The IDTAXA algorithm exhibits lower over classification error rates
We focused on the basal taxonomic rank in each training set for benchmarking classification accuracy because the basal rank is the most difficult to predict
Here, we have shown that IDTAXA substantially reduces false positive classifications of test sequences falling outside the scope of a training set
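
The distinction between over classification and misclassification can be made concrete with a small tally. The sketch below is an illustration only, not the benchmarking code used in the paper: the function name `tally_errors`, the category labels, and the toy genus names are assumptions introduced here. It treats a prediction as an over classification when the sequence's true group is absent from the training set yet a group was still assigned, and as a misclassification when the true group is present but a different group was assigned.

```python
from collections import Counter

def tally_errors(records, known_groups):
    """Count outcome types for (true_group, predicted_group) pairs.

    predicted_group is None when the classifier withholds a classification.
    known_groups is the set of groups present in the training set.
    All names and labels here are hypothetical, chosen for illustration.
    """
    counts = Counter()
    for true_group, predicted in records:
        if true_group in known_groups:
            if predicted is None:
                counts["withheld_known"] += 1
            elif predicted == true_group:
                counts["correct"] += 1
            else:
                counts["misclassified"] += 1
        else:
            # The true group is novel: any assignment is an over classification.
            if predicted is None:
                counts["withheld_novel"] += 1
            else:
                counts["over_classified"] += 1
    return counts

# Toy data: two groups in the training set, one novel group ("Vibrio").
known = {"Escherichia", "Bacillus"}
records = [
    ("Escherichia", "Escherichia"),  # correct
    ("Bacillus", "Escherichia"),     # misclassified
    ("Vibrio", "Escherichia"),       # over classified
    ("Vibrio", None),                # classification withheld for a novel group
    ("Bacillus", None),              # classification withheld for a known group
]
counts = tally_errors(records, known)
novel_total = counts["over_classified"] + counts["withheld_novel"]
over_rate = counts["over_classified"] / novel_total
print(dict(counts), over_rate)
```

Dividing over classifications by the number of novel test sequences is one reasonable way to express a rate; the paper's own definitions should be consulted for the exact denominators it uses.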

Summary

Introduction

Microbiome studies frequently involve sequencing a taxonomic marker, such as the 16S ribosomal RNA (rRNA) gene or internal transcribed spacer (ITS), to identify the microorganisms that are present in a sample. Previous studies have shown that existing classification programs often assign sequences to reference groups even if they belong to novel taxonomic groups that are absent from the reference taxonomy. This high rate of “over classification” is detrimental in microbiome studies because reference taxonomies are far from comprehensive. Nearest neighbor methods are popular in part due to their simplicity and clearly defined basis for taxonomic assignment, but they frequently fail where taxonomic groups do not conform to standard distance cutoffs [6].
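
To illustrate why a single distance cutoff is fragile, the following minimal sketch implements a generic nearest neighbor classifier. It is not IDTAXA and not the method of any classifier named above: the k-mer Jaccard distance, the function names, the toy sequences, and the cutoff values are all assumptions chosen for illustration. A query that shares nothing with the references is still assigned to a group when the cutoff is permissive, which is exactly an over classification.

```python
def kmers(seq, k=4):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_distance(a, b, k=4):
    """1 minus the Jaccard similarity of two sequences' k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 1.0
    return 1.0 - len(ka & kb) / len(union)

def nearest_neighbor(query, references, cutoff, k=4):
    """Assign the label of the closest reference if it lies within the cutoff.

    references is a list of (label, sequence) pairs. Returns None when the
    closest reference is farther away than the cutoff.
    """
    best_label, best_dist = None, float("inf")
    for label, seq in references:
        d = jaccard_distance(query, seq, k)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= cutoff else None

# Hypothetical toy references and queries.
references = [("GroupA", "ACGTACGTACGT"), ("GroupB", "TTTTGGGGCCCC")]
query_known = "ACGTACGTACGT"   # identical to the GroupA reference
query_novel = "AGAGAGAGAGAG"   # shares no 4-mers with either reference

strict_known = nearest_neighbor(query_known, references, cutoff=0.5)
strict_novel = nearest_neighbor(query_novel, references, cutoff=0.5)
loose_novel = nearest_neighbor(query_novel, references, cutoff=1.0)
print(strict_known, strict_novel, loose_novel)
```

With the stricter cutoff the novel query is left unassigned, while the permissive cutoff forces it into the first reference group. Real taxonomic groups vary in how divergent their members are, so no single cutoff separates known from novel sequences for every group.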

Journal: Microbiome	Publication Date: Aug 9, 2018
Citations: 357	License type: open-access
