Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences.

Robert C Edgar

doi:10.7717/peerj.4652

Abstract

Prediction of taxonomy for marker gene sequences such as 16S ribosomal RNA (rRNA) is a fundamental task in microbiology. Most experimentally observed sequences are diverged from reference sequences of authoritatively named organisms, creating a challenge for prediction methods. I assessed the accuracy of several algorithms using cross-validation by identity, a new benchmark strategy which explicitly models the variation in distances between query sequences and the closest entry in a reference database. When the accuracy of genus predictions was averaged over a representative range of identities with the reference database (100%, 99%, 97%, 95% and 90%), all tested methods had ≤50% accuracy on the currently-popular V4 region of 16S rRNA. Accuracy was found to fall rapidly with identity; for example, better methods were found to have V4 genus prediction accuracy of ∼100% at 100% identity but ∼50% at 97% identity. The relationship between identity and taxonomy was quantified as the probability that a rank is the lowest shared by a pair of sequences with a given pair-wise identity. With the V4 region, 95% identity was found to be a twilight zone where taxonomy is highly ambiguous because the probabilities that the lowest shared rank between pairs of sequences is genus, family, order or class are approximately equal.
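
The lowest-shared-rank probability described above can be illustrated with a minimal sketch. This is not the author's implementation. The function names, the rank list, the placeholder lineages (`d1`, `p1`, ...) and the identity values are all hypothetical and chosen only for illustration. It assumes each sequence pair already has a pairwise identity (binned to a percentage) and a lineage for each member, and it estimates the probability as the fraction of pairs at that identity whose lowest shared rank is a given rank.

```python
from collections import Counter, defaultdict

# Rank names ordered from highest to lowest (an assumed, conventional ordering).
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def lowest_common_rank(tax_a, tax_b):
    """Return the lowest rank at which two lineages agree, or None.

    Each lineage is a list of taxon names ordered from domain downward.
    """
    lcr = None
    for rank, name_a, name_b in zip(RANKS, tax_a, tax_b):
        if name_a != name_b:
            break
        lcr = rank
    return lcr


def lcr_probabilities(pairs):
    """Estimate P(lowest common rank | identity) from annotated pairs.

    pairs: iterable of (identity_percent, lineage_a, lineage_b).
    Returns {identity: {rank: fraction of pairs with that lowest common rank}}.
    """
    counts = defaultdict(Counter)
    for identity, tax_a, tax_b in pairs:
        counts[identity][lowest_common_rank(tax_a, tax_b)] += 1
    return {
        identity: {rank: n / sum(c.values()) for rank, n in c.items()}
        for identity, c in counts.items()
    }


# Hypothetical placeholder lineages; not real organisms or real identities.
a = ["d1", "p1", "c1", "o1", "f1", "g1"]
b = ["d1", "p1", "c1", "o1", "f1", "g2"]  # same family as a, different genus
c = ["d1", "p1", "c1", "o2", "f2", "g3"]  # same class as a, different order

toy_pairs = [(97, a, a), (97, a, b), (95, a, b), (95, a, c)]
probs = lcr_probabilities(toy_pairs)
```

With these toy pairs, `probs[97]` splits evenly between genus and family, and `probs[95]` splits evenly between family and class. On real data the pairs would come from pairwise alignments of reference sequences with authoritative annotations.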

Highlights

Next-generation sequencing of tags such as the 16S ribosomal RNA gene and fungal internal transcribed spacer (ITS) region has revolutionized the study of microbial communities in environments ranging from the human body (Cho & Blaser, 2012; Pflughoeft & Versalovic, 2012) to oceans (Moran, 2015) and soils (Hartmann et al, 2014).
Many taxonomy prediction algorithms have been developed, including the RDP Naive Bayesian Classifier (NBC) (Wang et al, 2007), GAST (Huse et al, 2008), the lowest common ancestor (LCA) method in MEGAN (Mitra, Stark & Huson, 2011), 16Sclassifier (Chaudhary et al, 2015), SPINGO (Allard et al, 2015), Metaxa2 (Bengtsson-Palme et al, 2015), SINTAX (Edgar, 2016), PROTAX (Somervuo et al, 2016), microclass (Liland, Vinje & Snipen, 2017), and methods implemented by the mothur (Schloss et al, 2009), QIIME v1 (Caporaso et al, 2010) and QIIME v2 packages.
Lowest common rank (LCR) probabilities have the advantage of independence from clustering methods and cluster quality metrics, which give conflicting results for optimal threshold values (Edgar, 2018a).

Introduction

Next-generation sequencing of tags such as the 16S ribosomal RNA (rRNA) gene and fungal internal transcribed spacer (ITS) region has revolutionized the study of microbial communities in environments ranging from the human body (Cho & Blaser, 2012; Pflughoeft & Versalovic, 2012) to oceans (Moran, 2015) and soils (Hartmann et al, 2014). Many taxonomy prediction algorithms have been developed, including the RDP Naive Bayesian Classifier (NBC) (Wang et al, 2007), GAST (Huse et al, 2008), the lowest common ancestor (LCA) method in MEGAN (Mitra, Stark & Huson, 2011), 16Sclassifier (Chaudhary et al, 2015), SPINGO (Allard et al, 2015), Metaxa2 (Bengtsson-Palme et al, 2015), SINTAX (Edgar, 2016), PROTAX (Somervuo et al, 2016), microclass (Liland, Vinje & Snipen, 2017), and methods implemented by the mothur (Schloss et al, 2009), QIIME v1 (Caporaso et al, 2010) and QIIME v2 (https://qiime2.org) packages. Most taxonomies in the RDP database were predicted by the RDP NBC, while most taxonomies in Greengenes and SILVA were annotated by a combination of database-specific computational prediction methods and manual curation (McDonald et al, 2012; Yilmaz et al, 2014).

Journal: PeerJ
Publication Date: Apr 18, 2018
Citations: 226
License type: CC BY 4.0
