Abstract
The accuracy of DNA barcoding depends on a comprehensive archived library of sequences reliably identified to species level by expert taxonomists. However, misidentifications are not infrequent, especially following large-scale DNA barcoding campaigns on diverse and taxonomically complex groups. In this study we used the species-rich flea beetle genus Longitarsus, which requires a high level of expertise for morphological species identification, as a case study to assess the accuracy of DNA barcoding following several optimisation procedures. We built a cox1 reference database of 1502 sequences representing 78 Longitarsus species, of which 117 sequences (32 species) were newly generated using a non-invasive DNA extraction method that preserves reference voucher specimens. Within this dataset we identified 69 taxonomic inconsistencies using barcoding gap analysis and tree topology methods. Threshold optimisation and a posteriori taxonomic revision based on newly generated reference sequences and metadata resolved 44 sequences with ambiguous or incorrect identifications and significantly improved DNA barcoding accuracy and identification efficacy. Unresolved taxonomic uncertainties, due to overlapping intra- and interspecific levels of divergence, mainly concern the Longitarsus pratensis species complex and the polyphyletic groups L. melanocephalus, L. nigrofasciatus and L. erro. Errors of this type indicate either poorly established taxonomy or biological processes that make mtDNA groups poorly predictive of species boundaries (e.g. recent speciation or interspecific hybridisation), thus providing directions for further integrative taxonomic and evolutionary studies.
Overall, this study underlines the importance of reference vouchers and high-quality metadata associated with sequences in reference databases and corroborates, once again, the key role of taxonomists at every step of the DNA barcoding pipeline in generating and maintaining a correct and functional reference library.
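The barcoding gap analysis mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes pre-aligned sequences of equal length, uses uncorrected p-distances, and the names `p_distance`, `barcoding_gap` and `species_without_gap`, as well as the toy sequences, are hypothetical. A species is flagged when its maximum intraspecific distance is at least as large as its minimum interspecific distance, which is the pattern expected when a reference sequence carries the wrong label.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance: share of differing sites among sites
    where both aligned sequences carry an unambiguous base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def barcoding_gap(records):
    """records: list of (species_label, aligned_sequence).
    Returns {species: (max_intraspecific, min_interspecific)};
    an entry is None when it cannot be computed (e.g. singletons)."""
    max_intra, min_inter = {}, {}
    for (sp_a, seq_a), (sp_b, seq_b) in combinations(records, 2):
        d = p_distance(seq_a, seq_b)
        if sp_a == sp_b:
            max_intra[sp_a] = max(max_intra.get(sp_a, 0.0), d)
        else:
            for sp in (sp_a, sp_b):
                min_inter[sp] = min(min_inter.get(sp, 1.0), d)
    species = {sp for sp, _ in records}
    return {sp: (max_intra.get(sp), min_inter.get(sp)) for sp in species}


def species_without_gap(gaps):
    """Species whose largest within-species distance reaches or exceeds
    the distance to the nearest sequence of another species."""
    return sorted(
        sp for sp, (intra, inter) in gaps.items()
        if intra is not None and inter is not None and intra >= inter
    )


# Toy data (hypothetical 10-bp "barcodes", not real cox1 sequences).
clean = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "TTGTACGGAC"),
    ("sp_B", "TTGTACGGAT"),
]
# Same library plus one sp_A haplotype wrongly labelled as sp_B.
with_error = clean + [("sp_B", "ACGTACGTAC")]

print(species_without_gap(barcoding_gap(clean)))
print(species_without_gap(barcoding_gap(with_error)))
```

In the toy library, a single mislabelled sequence removes the gap for both species involved, which is why flagged cases need to be checked against vouchers and metadata before the taxonomy itself is questioned.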
Highlights
DNA barcoding is a molecular method of specimen identification using a short segment of DNA from a specific standardized gene which is compared against a database of known sequences from morphologically identified specimens
Large-scale DNA barcoding studies have been performed on various groups of animals and have generated an enormous number of cytochrome oxidase I barcodes, which are usually stored in GenBank and the official Barcode of Life database (BOLD) [7,8,9]
In this study we focused on identifying errors that affect the accuracy of DNA barcoding, distinguishing between extrinsic errors, i.e. those related to the quality of the reference dataset, and intrinsic errors, i.e. those due to biological processes that generate a mismatch between mtDNA groups and species boundaries, making the barcoding tool unreliable in identifying specimens to the species level
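The comparison of a query barcode against a reference library, and the way an extrinsic error propagates into an identification, can be sketched as a nearest-match lookup with a distance threshold. This is a simplified illustration under stated assumptions, not the procedure used in the study: sequences are assumed to be aligned and free of gaps, the function names `p_distance` and `identify` are hypothetical, and the 5% threshold is an arbitrary value chosen for the toy data.

```python
def p_distance(a, b):
    """Uncorrected p-distance between two aligned, gap-free sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper())) / len(a)


def identify(query, reference, threshold=0.05):
    """Assign a query to the species of its closest reference sequence.

    reference: list of (species_label, aligned_sequence).
    Returns the species label, "ambiguous" when equally close references
    carry different labels, or "no match" when the closest reference is
    farther than the (illustrative) threshold.
    """
    best, labels = None, set()
    for species, seq in reference:
        d = p_distance(query, seq)
        if best is None or d < best:
            best, labels = d, {species}
        elif d == best:
            labels.add(species)
    if best is None or best > threshold:
        return "no match"
    if len(labels) > 1:
        return "ambiguous"
    return next(iter(labels))


# Toy reference library (hypothetical 20-bp sequences).
library = [
    ("sp_A", "ACGTACGTACGTACGTACGT"),
    ("sp_B", "TTGTACGGACGTACCTACGA"),
]
# Extrinsic error: an sp_A haplotype deposited under the name sp_B.
library_with_error = library + [("sp_B", "ACGTACGTACGTACGTACGT")]

query = "ACGTACGTACGTACGTACGT"
print(identify(query, library))
print(identify(query, library_with_error))
print(identify("GGGGGGGGGGGGGGGGGGGG", library))
```

The same query is resolved in the clean library and becomes ambiguous once a mislabelled reference is present, which mirrors the extrinsic errors described above. Intrinsic errors would produce the same symptom even in a perfectly curated library, because distinct species would genuinely share near-identical haplotypes.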
Summary
DNA barcoding is a molecular method of specimen identification using a short segment of DNA from a specific standardized gene, which is compared against a database of known sequences from morphologically identified specimens. Large-scale DNA barcoding studies have been performed on various groups of animals and have generated an enormous number of cytochrome oxidase I (cox1) barcodes, which are usually stored in GenBank and the official Barcode of Life database (BOLD) [7,8,9]. Assigning such large numbers of sequences to taxa is a challenging step of these studies, especially for extraordinarily diverse groups such as insects [10]. Species-level identification is particularly challenging in some hyper-diverse and widespread genera, for which many taxonomists, each with a long-standing specialization in a regional fauna, might be required [11, 12]. This implies that broad-based DNA barcoding studies should ideally recruit hundreds of specialised taxonomists, which is not feasible. A certain degree of misidentification is therefore inherent to these studies, and can be anticipated in species-rich taxa with difficult taxonomy [13].