Abstract

The applications of DNA barcoding have a wide range of uses, such as in taxonomic studies to help elucidate cryptic species and phylogenetic relationships and analyzing environmental samples for biodiversity monitoring and conservation assessments of species. After obtaining the DNA barcode sequences, sequence similarity-based homology analysis is commonly used. This means that the obtained barcode sequences are compared to the DNA barcode reference databases. This bioinformatic analysis necessarily implies that the overall quantity and quality of the reference databases must be stringently monitored to not have an adverse impact on the accuracy of species identification. With the development of next-generation sequencing techniques, a noticeably large number of DNA barcode sequences have been produced and are stored in online databases, but their degree of validity, accuracy, and reliability have not been extensively investigated. In this study, we investigated the extent to which the amount and types of erroneous barcode sequences were deposited in publicly accessible databases. Over 4.1 million sequences were investigated in three large-scale DNA barcode databases (NCBI GenBank, Barcode of Life Data System [BOLD], and Protist Ribosomal Reference database [PR2]) for four major DNA barcodes (cytochrome c oxidase subunit 1 [COI], internal transcribed spacer [ITS], ribulose bisphosphate carboxylase large chain [rbcL], and 18S ribosomal RNA [18S rRNA]); approximately 2% of erroneous barcode sequences were found and their taxonomic distributions were uneven. Consequently, our present findings provide compelling evidence of data quality problems along with insufficient and unreliable annotation of taxonomic data in DNA barcode databases. Therefore, we suggest that if ambiguous taxa are presented during barcoding analysis, further validation with other DNA barcode loci or morphological characters should be mandated. Key words: 18S rRNA; COI; DNA barcoding; ITS; rbcL; taxonomic databases

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call