Datataxa: a new script to extract metadata sequence information from GenBank, the Flora of Bajío as a case study

Eduardo Ruiz-Sanchez, Sergio Zamudio, Victor W Steinmann, Jerzy Rzedowski, Eleazar Carranza, Rosa María Murillo, Carlos Alonso Maya-Lastra

doi:10.17129/botsci.2226

Abstract

Background: GenBank is a public repository that houses millions of nucleotide sequences. Several software tools have been developed to extract information stored in GenBank. However, none of them can extract and organize GenBank accessions based on their metadata. We developed a new script called Datataxa to mine GenBank information. The checklist of the Flora del Bajío y de Regiones Adyacentes (FBRA) was used as a case study to apply our script. Questions: How many species occurring in the FBRA have records in GenBank? What percentage of those records have been used for phylogenetic, phylogeographic, phylogenomic, barcoding, genetic diversity, and biogeographic studies? Methods: Datataxa was written in the AutoIt scripting language to facilitate the extraction of information from GenBank. This information was classified into six study categories. A checklist of species from the published fascicles of the FBRA was used as a case study to apply our new script, and these six categories were applied to the FBRA species list. Results: The script allowed us to search for metadata, such as publication titles, for 2,558 species included in the FBRA. Of these, 1,575 had at least one record in GenBank. A total of 1,322 species were used in phylogenetic studies, followed by barcoding studies (326) and biogeographic studies (298). Phylogenomic (41), phylogeographic (34), and genetic diversity studies (34) were the least represented. Conclusions: Datataxa was useful for mining sequence metadata from GenBank and can be used with any list of species to retrieve the metadata of their GenBank accessions.
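As a rough illustration of the classification step described in the Methods, the sketch below assigns publication titles to the six study categories by keyword matching. It is written in Python rather than AutoIt, and the keyword patterns, the function name `classify_title`, and the sample titles are assumptions made for illustration only; the abstract does not state the rules Datataxa actually applies.

```python
import re

# Hypothetical keyword patterns, one per study category named in the abstract.
# These are illustrative guesses, not the rules used by Datataxa.
CATEGORY_PATTERNS = {
    "phylogenetic": r"\bphylogen(y|ies|etic)",
    "phylogeographic": r"\bphylogeograph",
    "phylogenomic": r"\bphylogenomic",
    "barcoding": r"\bbarcod",
    "genetic diversity": r"\bgenetic (diversity|variation|structure)",
    "biogeographic": r"\bbiogeograph",
}


def classify_title(title):
    """Return the sorted list of categories whose pattern matches the title."""
    lowered = title.lower()
    return sorted(
        category
        for category, pattern in CATEGORY_PATTERNS.items()
        if re.search(pattern, lowered)
    )


if __name__ == "__main__":
    # Made-up titles standing in for the publication titles stored in GenBank records.
    sample_titles = [
        "Molecular phylogeny of an example genus",
        "DNA barcoding of example oaks",
        "Phylogeography and genetic diversity of a montane shrub",
    ]
    for title in sample_titles:
        print(title, "->", classify_title(title))
```

A title can match more than one pattern, so under this sketch a single species may be counted in several categories.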

Journal: Botanical Sciences	Publication Date: Dec 19, 2019
Citations: 3	License type: CC BY-NC 4.0

Similar Papers

“COI-like” Sequences Are Becoming Problematic in Molecular Systematic and DNA Barcoding Studies
Jennifer E Buhay
Journal of Crustacean Biology | VOL. 29
01 Jan 2009

Molecular Identification of Reptiles from Tabuk Region of Saudi Arabia Through DNA Barcoding: A Case Study
Bishal Dhar ... N Neelima Devi
01 Jan 2020

Implications and future prospects for evolutionary analyses of DNA in historical herbarium collections
Vanessa C Bieker ... Michael D Martin
Botany Letters | VOL. 165
23 Apr 2018

Genomic Resources of Three Pulsatilla Species Reveal Evolutionary Hotspots, Species-Specific Sites and Variable Plastid Structure in the Family Ranunculaceae
Monika Szczecińska ... Jakub Sawicki
International Journal of Molecular Sciences | VOL. 16
15 Sep 2015
