Abstract

New genetic diagnostic approaches have greatly aided efforts to document global biodiversity and improve biosecurity. This is especially true for organismal groups in which species diversity has historically been underestimated because of sampling difficulties, a lack of clear morphological characteristics, and/or limited taxonomic expertise. Among these methods, DNA sequence barcoding (also known as “DNA barcoding”), and by extension meta‐barcoding for biological communities, has emerged as one of the most frequently used methods for DNA‐based species identification. Unfortunately, the use of DNA barcoding is limited by the availability of complete reference libraries (i.e., collections of DNA sequences from morphologically identified species) and by the fact that the vast majority of species have no sequences in reference databases. These limitations are especially acute in tropical locations that are rich in biodiversity yet remain poorly explored and poorly characterized at the DNA level by trained taxonomic specialists. To facilitate efforts to document biodiversity in regions lacking complete reference libraries, we developed a novel statistical approach that categorizes unidentified species as either likely native or likely nonnative based solely on measures of nucleotide diversity. We demonstrate the utility of this approach by categorizing a large sample of terrestrial insect and spider specimens (collected as part of the Moorea BioCode project) using a generalized linear mixed model (GLMM). Using a training data set of known endemic (n = 45) and known introduced (n = 102) species, we estimated the likely native/nonnative status of 4,663 specimens representing an estimated 1,288 species (412 identified species), including specimens that were unidentified and specimens whose endemic/introduced status was uncertain. Using this approach, we increased the number of categorized specimens by a factor of 4.4 (from 794 to 3,497) and the number of categorized species by a factor of 4.8 (from 147 to 707), with an accuracy much greater than chance (77.6%). The study identifies phylogenetic signatures of both native and nonnative species and suggests several practical applications for this approach, including monitoring biodiversity and facilitating biosecurity.
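The abstract refers to "measures of nucleotide diversity" without defining them here. As a rough illustration only, the sketch below computes one common such measure, the average pairwise proportion of differing sites (often written π), for a set of pre-aligned sequences. The choice of π, the function names, and the toy sequences are assumptions made for this example and are not taken from the study.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise p-distance over all pairs of sequences (an estimate of pi)."""
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment for a single putative species (not real data).
toy_species = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
pi = nucleotide_diversity(toy_species)
print(f"nucleotide diversity: {pi:.4f}")
```

In a workflow like the one the abstract describes, per-species values of this kind could serve as predictors in a classification model; the model specification itself is not reproduced here.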

Highlights

  • Because the native/nonnative status of a species is often difficult to determine in natural communities, we propose using a training data set of known introduced and known endemic species. We assume that introduced and endemic species lie at opposite ends of the native/nonnative spectrum, so that if measures of genetic diversity differ between native and nonnative species, they should differ even more between endemic and introduced species

Introduction

The genomics revolution is transforming the study of conservation biology, ecology, and evolution (Hudson 2008, Allendorf et al 2010). One widely used approach for collecting and analyzing sequence data is DNA sequence barcoding (also known as “DNA barcoding”; Hebert et al 2003, Savolainen et al 2005, Ratnasingham and Hebert 2007). This technique has been used in many contexts, including biodiversity inventories (Janzen et al 2005), cryptic species discovery (Hebert et al 2004a), species identification (Hebert et al 2003, 2004b, Kress et al 2005), species delimitation (Pons et al 2006), biomonitoring (Pilgrim et al 2011), biosecurity (Saunders 2009, Collins et al 2012, Dejean et al 2012, Porco et al 2013, Ashfaq and Hebert 2016, Thomas et al 2016), phylogenetic and population genetic studies (Hajibabaei et al 2007), and observations of within-species genetic diversity (e.g., Johnson et al 2002, Havill et al 2018). DNA barcoding has numerous well-documented limitations, including when the technique is used to reconstruct ancient evolutionary relationships, when non-specific amplification is not accounted for, and when fixed intra- and interspecific thresholds are used (Moritz and Cicero 2004, Thalmann et al 2004, DeSalle et al 2005, Meyer and Paulay 2005, Rubinoff et al 2006, Buhay 2009, Moulton et al 2010). Nevertheless, one of its core uses is the comparison of query sequences to reference DNA sequences to determine the percentage of sequence similarity.
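The comparison of a query sequence to reference sequences can be sketched as follows. This is a minimal illustration that assumes the sequences are already aligned to equal length; real barcode identification relies on alignment or search tools rather than a position-by-position comparison. The function names and the toy reference library are hypothetical.

```python
def percent_identity(query, reference):
    """Percentage of matching sites between two aligned sequences.

    Sites with a gap or ambiguous base in either sequence are ignored.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for q, r in zip(query.upper(), reference.upper()):
        if q in "ACGT" and r in "ACGT":
            compared += 1
            if q == r:
                matches += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return 100.0 * matches / compared


def best_reference(query, library):
    """Return (name, percent identity) of the most similar reference sequence."""
    if not library:
        raise ValueError("reference library is empty")
    scores = {name: percent_identity(query, seq) for name, seq in library.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]


# Hypothetical reference library of aligned barcode fragments (not real data).
toy_library = {
    "species_a": "ACGTACGTAC",
    "species_b": "ACGTTCGTAC",
}
name, score = best_reference("ACGTTCGTAC", toy_library)
print(f"best match: {name} ({score:.1f}% identity)")
```

When the reference library lacks the query's species, the best score will simply be low. That is the situation the text describes for poorly sampled regions.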
