Abstract

Public databases such as the NCBI's GenBank have been used as repositories for genomic studies for more than 30 years. In this time, our understanding of the natural world, and especially the genomic world, has expanded vastly, and the size of these databases reflects this genomic revolution. Databases like GenBank now help populate many molecular studies, supplementing a researcher's newly gathered data with publicly available sequences. Despite this, older sequence records, particularly those from understudied taxa, are frequently not updated in line with this burgeoning understanding. As a result, analyses that leverage this public data, from BLAST through to phylogenetic analyses, cannot do so with the full force of its collective understanding. This is particularly true for environmental DNA (eDNA) records, where older records may identify sequences only to the phylum level, limiting their use in many studies. Here, in a case study of tardigrade 18S sequences, the family identities of 630 sequences previously identified only to the phylum level were established using 501 family-, genus- and species-level 18S sequences, effectively doubling the depth and taxonomic resolution of tardigrade 18S sequences in GenBank.

