Lack of Statistical Rigor in DNA Barcoding Likely Invalidates the Presence of a True Species' Barcode Gap

Jarrett D. Phillips,Robert H. Hanner,Daniel J. Gillis

doi:10.3389/fevo.2022.859099

Jarrett D. Phillips, Robert H. Hanner + Show 1 more

Open Access

https://doi.org/10.3389/fevo.2022.859099

Copy DOI

Journal: Frontiers in Ecology and Evolution	Publication Date: Apr 14, 2022
Citations: 10	License type: CC BY 4.0

Affiliation: University of Guelph

Abstract

DNA barcoding has been largely successful in satisfactorily exposing levels of standing genetic diversity for a wide range of taxonomic groups through the employment of only one or a few universal gene markers. However, sufficient coverage of geographically-broad intra-specific haplotype variation within genomic databases like the Barcode of Life Data Systems (BOLD) and GenBank remains relatively sparse. As reference sequence libraries continue to grow exponentially in size, there is now the need to identify novel ways of meaningfully analyzing vast amounts of available DNA barcode data. This is an important issue to address promptly for the routine tasks of specimen identification and species discovery, which have seen broad adoption in areas as diverse as regulatory forensics and resource conservation. Here, it is demonstrated that the interpretation of DNA barcoding data is lacking in statistical rigor. To highlight this, focus is set specifically on one key concept that has become a household name in the field: the DNA barcode gap. Arguments outlined herein specifically center on DNA barcoding in animal taxa and stem from three angles: (1) the improper allocation of specimen sampling effort necessary to capture adequate levels of within-species genetic variation, (2) failing to properly visualize intra-specific and interspecific genetic distances, and (3) the inconsistent, inappropriate use, or absence of statistical inferential procedures in DNA barcoding gap analyses. Furthermore, simple statistical solutions are outlined which can greatly propel the use of DNA barcoding as a tool to irrefutably match unknowns to knowns on the basis of the barcoding gap with a high degree of confidence. Proposed methods examined herein are illustrated through application to DNA barcode sequence data from Canadian Pacific fish species as a case study.

Full Text