Abstract

Recently diverged species are challenging to identify, yet they are frequently of special interest scientifically as well as from a regulatory perspective. DNA barcoding has proven instrumental in species identification, especially in insects and vertebrates, but it has been reported to be problematic in some cases for the identification of recently diverged species. These problems are mostly due to incomplete lineage sorting or simply the lack of a ‘barcode gap’, and are probably related to large effective population size and/or low mutation rate. Our objective was to compare six methods in their ability to correctly identify recently diverged species with DNA barcodes: neighbor joining and parsimony (both tree-based), nearest neighbor and BLAST (both similarity-based), and the diagnostic methods DNA-BAR and BLOG. We analyzed simulated data assuming three different effective population sizes, as well as three selected empirical data sets from published studies. Results show, as expected, that success rates are significantly lower for recently diverged species (∼75%) than for older species (∼97%) (P<0.00001). Similarity-based and diagnostic methods significantly outperform tree-based methods when applied to simulated DNA barcode data (P<0.00001). The diagnostic method BLOG had the highest correct query identification rate based on simulated (86.2%) as well as empirical data (93.1%), indicating that it is a consistently better method overall. Another advantage of BLOG is that it offers species-level information that can be used outside the realm of DNA barcoding, for instance in species description or molecular detection assays. Even though we can confirm that identification success based on DNA barcoding is generally high in our data, recently diverged species remain difficult to identify. Nevertheless, our results contribute to improved solutions for their accurate identification.
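To make the similarity-based category concrete, the following is a minimal sketch of nearest-neighbor identification on aligned barcodes. It is an illustration only, not the implementation evaluated in the study: the function names `p_distance` and `nearest_neighbor_id`, the use of the uncorrected p-distance, and the rule of returning no identification on a tie between species are all assumptions made for this example.

```python
def p_distance(a, b):
    """Uncorrected p-distance: proportion of differing sites between two
    aligned sequences, skipping sites with a gap ('-') or 'N' in either."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differing += 1
    # Assumption: with no comparable sites, treat the pair as maximally distant.
    return differing / compared if compared else 1.0


def nearest_neighbor_id(query, references):
    """Assign the query to the species of its closest reference barcode.

    references: list of (species_name, aligned_sequence) tuples.
    Returns (species_name, distance); species_name is None when the closest
    references belong to more than one species (ambiguous identification).
    """
    distances = [(p_distance(query, seq), species) for species, seq in references]
    best = min(d for d, _ in distances)
    closest_species = {species for d, species in distances if d == best}
    if len(closest_species) == 1:
        return closest_species.pop(), best
    return None, best


if __name__ == "__main__":
    # Hypothetical toy reference library, not data from the study.
    refs = [("sp_A", "ACGTACGT"), ("sp_A", "ACGTACGA"), ("sp_B", "TCGAACGT")]
    print(nearest_neighbor_id("ACGTACGG", refs))
```

The tie case is where recently diverged species tend to cause trouble: when two species share identical or equally distant barcodes, a similarity-based method of this kind cannot separate them.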

Highlights

  • Recently diverged species are frequently of special interest, for example in ecology, regulation or forensics [1,2,3], and their accurate identification is warranted

  • Even though recently diverged species pose a significant problem for effective DNA barcoding, sensitive similarity-based and diagnostic methods can significantly improve identification performance compared with the commonly used tree-based methods such as neighbor joining (NJ)

  • Few differences in overall results exist: where scores for tree-based NJ were suboptimal based on simulated data, they were comparable to at least some of the similarity-based and diagnostic methods when applied to the empirical data sets

Summary

Introduction

Recently diverged species are frequently of special interest, for example in ecology, regulation or forensics [1,2,3], and their accurate identification is warranted. Coalescent theory [40] predicts that the chance that gene sequences sampled from a species are monophyletic depends on the age of that species (measured in number of generations since speciation) and depends inversely on its effective population size (Ne) [40,41]. This is because species with large Ne are predicted to have larger within-species genetic variation [40,41,42]. Based on simulated DNA barcode data sets, Ross et al. [37] and Austerlitz et al. [35] found that species monophyly and identification success generally decreased with increasing coalescent depth.
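The dependence on species age and Ne can be illustrated with a small worked calculation. The sketch below assumes a standard neutral coalescent for a diploid population of constant size, in which the coalescence time of two sampled gene copies is exponentially distributed with a mean of 2Ne generations. It computes only the chance that two copies from one species coalesce after the speciation event, a simpler quantity than monophyly of a whole sample, and the function name `prob_pair_coalesced` and the parameter values are hypothetical, not taken from the study. For a maternally inherited mitochondrial marker the scaling would differ.

```python
import math


def prob_pair_coalesced(generations_since_speciation, ne):
    """Probability that two gene copies sampled from one species share a
    common ancestor younger than the speciation event.

    Assumes a constant-size diploid population under the standard neutral
    coalescent, where pairwise coalescence time is exponential with mean
    2 * ne generations.
    """
    if ne <= 0:
        raise ValueError("effective population size must be positive")
    return 1.0 - math.exp(-generations_since_speciation / (2.0 * ne))


if __name__ == "__main__":
    # Hypothetical values chosen for illustration only.
    for ne in (1_000, 10_000, 50_000):
        for t in (1_000, 10_000, 100_000):
            p = prob_pair_coalesced(t, ne)
            print(f"Ne={ne:>6}  t={t:>7} generations  P(coalesced)={p:.3f}")
```

Under these assumptions the probability rises with the number of generations since speciation and falls as Ne grows, which matches the qualitative prediction in the paragraph above: young species with large Ne are the least likely to have sorted their lineages.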
