Abstract

Background: The submission of DNA sequences to public sequence databases is an essential but insufficiently automated step in the process of generating and disseminating novel DNA sequence data. Despite the centrality of database submissions to biological research, the range of available software tools that facilitate the preparation of sequence data for database submissions is limited, especially for sequences generated via plant and fungal DNA barcoding. Current submission procedures can be complex and prohibitively time-consuming for any but a small number of input sequences. A user-friendly software tool is needed that streamlines the file preparation for database submissions of DNA sequences that are commonly generated in plant and fungal DNA barcoding.

Methods: A Python package was developed that converts DNA sequences from the common EMBL and GenBank flat file formats to submission-ready, tab-delimited spreadsheets (so-called ‘checklists’) for a subsequent upload to the annotated sequence section of the European Nucleotide Archive (ENA). The software tool, titled ‘EMBL2checklists’, automatically converts DNA sequences, their annotation features, and associated metadata into the idiosyncratic format of marker-specific ENA checklists and thus generates files that can be uploaded via the interactive Webin submission system of ENA.

Results: EMBL2checklists provides a simple, platform-independent tool that automates the conversion of common DNA barcoding sequences into easily editable spreadsheets that require no further processing before their upload to ENA via the interactive Webin submission system. The software is equipped with an intuitive graphical interface as well as an efficient command-line interface. The utility of the software is illustrated by its application in four recent investigations, including plant phylogenetic and fungal metagenomic studies.

Discussion: EMBL2checklists bridges the gap between common software suites for DNA sequence assembly and annotation and the interactive data submission process of ENA. It represents an easy-to-use solution for plant and fungal biologists without bioinformatics expertise to generate submission-ready checklists from common DNA sequence data. It allows the post-processing of checklists as well as work-sharing during the submission process and solves a critical bottleneck in the effort to increase participation in public data sharing.

Highlights

  • Few software tools assist in the preparation of DNA sequence data for submission to public sequence databases, despite the centrality of this process for disseminating novel biological data

  • We report the development and application of a Python package, entitled ‘EMBL2checklists’, that takes annotated DNA sequences of common plant and fungal DNA barcoding regions and associated metadata as input and returns properly formatted checklists that are ready for upload to the European Nucleotide Archive (ENA) via the interactive Webin submission system

  • The utility of EMBL2checklists to plant and fungal biology is illustrated by its application in the submission process of DNA sequences to ENA by four recent investigations

Introduction

Few software tools assist in the preparation of DNA sequence data for submission to public sequence databases, despite the centrality of this process for disseminating novel biological data. ENA, for example, channels the submission of annotated DNA sequences through the Webin submission framework (https://www.ebi.ac.uk/ena/submit/sra/; [15]), which, in its interactive version, operates with pre-formatted, tab-delimited spreadsheets. These spreadsheets (called ‘annotation checklists’ or ‘templates’) are filled out by the user and uploaded for submission. Despite the centrality of database submissions to biological research, the range of available software tools that facilitate the preparation of sequence data for database submissions is limited, especially for sequences generated via plant and fungal DNA barcoding. A user-friendly software tool is needed that streamlines the file preparation for database submissions of DNA sequences that are commonly generated in plant and fungal DNA barcoding.
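
To illustrate the kind of conversion described here, the following is a minimal sketch in Python, not the EMBL2checklists implementation. It reads one simplified EMBL-style record and writes a tab-delimited table using only the standard library. The sample record, the function names `parse_record` and `to_checklist_tsv`, and the column names in `COLUMNS` are assumptions made for illustration; actual ENA checklists define their own marker-specific columns.

```python
import csv
import io
import re

# Hypothetical, simplified EMBL-style record; real flat files carry more line types.
EMBL_RECORD = """\
ID   SAMPLE1; SV 1; linear; genomic DNA; STD; PLN; 20 BP.
FT   source          1..20
FT                   /organism="Arabidopsis thaliana"
FT                   /isolate="A1"
SQ   Sequence 20 BP;
     acgtacgtac gtacgtacgt                                              20
//
"""

# Hypothetical column names; real ENA checklists define their own columns.
COLUMNS = ["ENTRYNUMBER", "ORGANISM_NAME", "ISOLATE", "SEQUENCE"]


def parse_record(text):
    """Extract the ID, feature qualifiers, and sequence from one simplified record."""
    entry = {"id": "", "qualifiers": {}, "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # record terminator
            break
        if in_sequence:
            # Keep letters only, dropping spaces and the trailing base count.
            entry["sequence"] += re.sub(r"[^A-Za-z]", "", line)
        elif line.startswith("ID"):
            entry["id"] = line[2:].split(";")[0].strip()
        elif line.startswith("FT"):
            match = re.search(r'/(\w+)="([^"]*)"', line)
            if match:
                # Keep the first value seen for each qualifier name.
                entry["qualifiers"].setdefault(match.group(1), match.group(2))
        elif line.startswith("SQ"):
            in_sequence = True
    return entry


def to_checklist_tsv(entries):
    """Write the parsed entries as a tab-delimited table with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for number, entry in enumerate(entries, start=1):
        qualifiers = entry["qualifiers"]
        writer.writerow([
            number,
            qualifiers.get("organism", ""),
            qualifiers.get("isolate", ""),
            entry["sequence"],
        ])
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_checklist_tsv([parse_record(EMBL_RECORD)]), end="")
```

The resulting text can be opened in any spreadsheet program, which reflects the point made above that tab-delimited checklists remain easy to edit by hand before upload.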
