Two New Computational Methods for Universal DNA Barcoding: A Benchmark Using Barcode Sequences of Bacteria, Archaea, Animals, Fungi, and Land Plants

Akifumi S Tanabe, Hirokazu Toju

doi:10.1371/journal.pone.0076910

Open Access

Journal: PLoS ONE	Publication Date: Oct 18, 2013
Citations: 226	License type: CC BY 4.0

Affiliation: Kyoto University, Fisheries Research Agency

Abstract

Taxonomic identification of biological specimens based on DNA sequence information (a.k.a. DNA barcoding) is becoming increasingly common in biodiversity science. Although several methods have been proposed, many of them are not universally applicable because they require prerequisite phylogenetic or machine-learning analyses, demand huge computational resources, or lack a firm theoretical background. Here, we propose two new computational methods of DNA barcoding and present a benchmark for bacterial/archaeal 16S, animal COX1, fungal internal transcribed spacer, and three plant chloroplast (rbcL, matK, and trnH-psbA) barcode loci that can be used to compare the performance of existing and new methods. The benchmark was performed under two alternative situations: query sequences were available in the corresponding reference sequence databases in one, but were not available in the other. In the former situation, the commonly used “1-nearest-neighbor” (1-NN) method, which assigns to a query the taxonomic information of the most similar sequence in a reference database (i.e., the BLAST top-hit reference sequence), displayed the highest rate and highest precision of successful taxonomic identification. However, in the latter situation, the 1-NN method produced extremely high rates of misidentification for all the barcode loci examined. In contrast, one of our new methods, the query-centric auto-k-nearest-neighbor (QCauto) method, consistently produced low rates of misidentification for all the loci examined in both situations. These results indicate that the 1-NN method is most suitable if the reference sequences of all potentially observable species are available in databases; otherwise, the QCauto method returns the most reliable identification results. The benchmark results also indicated that the taxon coverage of reference sequences is far from complete for genus- or species-level identification in all the barcode loci examined. Therefore, we need to accelerate the registration of reference barcode sequences to apply high-throughput DNA barcoding to genus- or species-level identification in biodiversity research.
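The contrast between the two situations can be illustrated with a toy example. This is an assumption-laden sketch, not the authors' implementation: a hypothetical `identity` function (fraction of matching positions in pre-aligned, equal-length sequences) stands in for BLAST similarity, and a hypothetical `consensus_knn` returns the lowest common ancestor (LCA) of all references within a fixed similarity `margin` of the top hit. That consensus rule is a simplified stand-in chosen for illustration; it is not the published QCauto algorithm, which, as its name indicates, determines k automatically. The reference sequences and taxon names are invented.

```python
from typing import Dict, List, Tuple

# A taxon is a tuple of ranks, e.g. (genus, species). All data here is a toy example.
Taxon = Tuple[str, ...]
RefDB = Dict[str, Tuple[str, Taxon]]  # reference name -> (aligned sequence, taxon)


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes pre-aligned, equal-length sequences
    (a hypothetical stand-in for BLAST similarity)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def one_nn(query: str, refs: RefDB) -> Taxon:
    """1-NN: copy the full taxonomy of the single most similar reference (top hit)."""
    _, taxon = max(refs.values(), key=lambda ref: identity(query, ref[0]))
    return taxon


def lca(taxa: List[Taxon]) -> Taxon:
    """Lowest common ancestor: keep the leading ranks on which all taxa agree."""
    shared: List[str] = []
    for ranks in zip(*taxa):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return tuple(shared)


def consensus_knn(query: str, refs: RefDB, margin: float = 0.15) -> Taxon:
    """Hypothetical simplified stand-in, NOT the published QCauto: collect every
    reference within `margin` of the top similarity and return their LCA."""
    scores = [(identity(query, seq), taxon) for seq, taxon in refs.values()]
    top = max(score for score, _ in scores)
    return lca([taxon for score, taxon in scores if score >= top - margin])


refs: RefDB = {
    "r1": ("ACGTACGTAC", ("Genus1", "Genus1 sp_a")),
    "r2": ("ACGTACGTTT", ("Genus1", "Genus1 sp_b")),
    "r3": ("TTTTACGTAC", ("Genus2", "Genus2 sp_c")),
}
# Query from a Genus1 species that has no reference sequence in `refs`.
query = "ACGTACGTCT"
print(one_nn(query, refs))         # ('Genus1', 'Genus1 sp_b'): a wrong species name
print(consensus_knn(query, refs))  # ('Genus1',): stops at genus level
```

With the query's own species absent from the references, 1-NN still returns a species name (the wrong one), whereas the consensus backs off to the genus. This mirrors, in miniature, the conservative behavior the abstract attributes to QCauto, without claiming to reproduce its actual neighbor-selection rule.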

Highlights

Biodiversity surveys are important when formulating policies for the conservation of endangered species, assessing the environmental impacts of land development projects, and exploring novel bioproducts [1,2].
In the benchmark without leave-one-out cross-validation (no-LOOCV), the 1-NN method returned perfect identification results most frequently of all the methods tested, for all the barcode loci (Fig. 3).
Because the query sequence is not removed from the reference sequence database in no-LOOCV, this result suggests that the 1-NN method is the best DNA barcoding method if the barcode sequences of all potentially observable species are registered in a reference database.
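The difference between the two benchmark settings can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' benchmark code: the toy reference set, the `identity` similarity (fraction of matching positions in pre-aligned sequences, standing in for BLAST), and the `benchmark` helper are all hypothetical. In no-LOOCV each query stays in the database and therefore finds itself as the top hit; in LOOCV it is removed first.

```python
from typing import Dict, Tuple

Taxon = Tuple[str, ...]
RefDB = Dict[str, Tuple[str, Taxon]]  # reference name -> (aligned sequence, taxon)


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes pre-aligned, equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def one_nn(query: str, refs: RefDB) -> Taxon:
    """Return the taxonomy of the most similar reference sequence (top hit)."""
    _, taxon = max(refs.values(), key=lambda ref: identity(query, ref[0]))
    return taxon


def benchmark(refs: RefDB, leave_one_out: bool) -> float:
    """Share of references whose own taxon 1-NN recovers when each is used as a query."""
    correct = 0
    for name, (seq, taxon) in refs.items():
        # LOOCV removes the query from the database; no-LOOCV keeps it.
        db = {k: v for k, v in refs.items() if k != name} if leave_one_out else refs
        correct += one_nn(seq, db) == taxon
    return correct / len(refs)


refs: RefDB = {
    "r1": ("ACGTACGTAC", ("Genus1", "Genus1 sp_a")),
    "r2": ("ACGTACGTTT", ("Genus1", "Genus1 sp_b")),
    "r3": ("TTTTACGTAC", ("Genus2", "Genus2 sp_c")),
}
print(benchmark(refs, leave_one_out=False))  # 1.0: every query finds itself
print(benchmark(refs, leave_one_out=True))   # 0.0: no conspecific reference is left
```

Because every toy species is represented by a single sequence, removing the query leaves no conspecific reference, so 1-NN scores perfectly without LOOCV and misidentifies every query with it.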

Summary

Introduction

Biodiversity surveys are important when formulating policies for the conservation of endangered species, assessing the environmental impacts of land development projects, and exploring novel bioproducts [1,2]. Because extracellular DNA released into soil or water can be PCR-amplified and/or sequenced [13], DNA barcoding of such “environmental DNA” dissolved in water potentially enables ultrarapid surveys of aquatic macroorganisms in a lake [14,15]. Such recent technical developments and the declining cost of DNA sequencing have increased the opportunities to utilize DNA barcoding in ecological and evolutionary studies [13,16,17]. The development of a theoretically firm framework to “translate” raw DNA sequencing data into organismal taxonomic information is crucial (see Coissac et al. [18] and references therein).

Similar Papers

Correction: Two New Computational Methods for Universal DNA Barcoding: A Benchmark Using Barcode Sequences of Bacteria, Archaea, Animals, Fungi, and Land Plants.
Akifumi S Tanabe ... Hirokazu Toju
PLOS ONE | VOL. 11 | 24 Mar 2016

DNA Barcoding: a new tool with wide array of applications
Mahdi Arzanlou
Research in Molecular Medicine | VOL. 1 | 01 Sep 2013

Pragmatic Applications and Universality of DNA Barcoding for Substantial Organisms at Species Level: A Review to Explore a Way Forward.
Sarfraz Ahmed ... Mohammad Khursheed Alam
BioMed Research International | VOL. 2022 | 11 Jan 2022

Metabarcoding of zooplankton diversity within the Chukchi Borderland, Arctic Ocean: improved resolution from multi-gene markers and region-specific DNA databases
Jennifer M Questel ... Ksenia N Kosobokova
Marine Biodiversity | VOL. 51 | 09 Jan 2021

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Two New Computational Methods for Universal DNA Barcoding: A Benchmark Using Barcode Sequences of Bacteria, Archaea, Animals, Fungi, and Land Plants

Abstract

Highlights

Summary

Talk to us

Similar Papers

More From: PLoS ONE