Abstract
A number of systems have been developed for taxonomic identification of DNA sequence data. In eukaryotes, however, these systems are largely based on single predefined genes; they are therefore vulnerable to biases from limited character sampling and cannot identify most sequences of genomic origin. Here we demonstrate an implementation for multigene DNA barcoding. First, a reference framework is built from frequently sequenced loci. Query sequence data are then organized by excising sequences homologous to the references and assigning species names where the level of sequence similarity between query and reference falls within the (gene‐appropriate) level of intraspecific variation usually observed. The approach is compared to some existing methods including ‘bagpipe_phylo’, a re‐implementation for taxonomic assignment on phylogenies. Seventy‐eight per cent of the species and 94% of the genera known to be present in arthropod test queries were correctly inferred by the proposed multigene system. Most critically, the rate of species identification was increased over using a COI‐only approach. Twenty‐four per cent of species in the queries were found only in non‐COI genes, with no clear reduction in the accuracy of species assignment at many of these other loci. Similarly, additional species assignments were made for a pooled metagenomic data set using non‐COI columns. On a smaller query data set of 273 bee sequences, the accuracy of species assignment using a modified calculation of distances was indistinguishable from phylogeny‐based taxonomic identification. Standardized single fragment DNA barcoding remains an invaluable tool in species identification for PCR‐generated sequence data. The approach developed here supplements the established species‐dense DNA barcode backbone with other genomic data, reducing error via integration of independent genetic loci and permitting additional identifications for non‐barcode fragments. The latter will be particularly relevant in monitoring of community genomics using next‐generation sequencing platforms.
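The assignment rule described above (name a query when its distance to a reference falls within the intraspecific variation typical for that gene) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of an uncorrected p‐distance on pre‐aligned sequences, and the per‐gene limits in `INTRASPECIFIC_LIMIT` are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical per-gene upper limits on intraspecific p-distance.
# These values are illustrative only and are not taken from the paper.
INTRASPECIFIC_LIMIT = {"COI": 0.02, "28S": 0.005}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions with a gap ('-') or ambiguous base ('N') in either
    sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def assign_species(
    query: str, gene: str, references: List[Tuple[str, str, str]]
) -> Optional[str]:
    """Return the species of the nearest same-gene reference, or None.

    `references` holds (species, gene, aligned_sequence) tuples. A name is
    assigned only if the smallest distance is within the gene's limit.
    """
    limit = INTRASPECIFIC_LIMIT.get(gene)
    if limit is None:
        return None  # no threshold defined for this locus
    best: Optional[Tuple[float, str]] = None
    for species, ref_gene, ref_seq in references:
        if ref_gene != gene:
            continue
        d = p_distance(query, ref_seq)
        if best is None or d < best[0]:
            best = (d, species)
    if best is not None and best[0] <= limit:
        return best[1]
    return None
```

Keeping a separate limit per locus reflects the "gene‐appropriate" wording in the abstract: a slowly evolving gene would need a tighter limit than COI for the same level of confidence. A real pipeline would also have to handle homology search and alignment, which this sketch takes as given.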