Abstract
The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as the Transcriptome Shotgun Assembly Sequence Database (TSA) and the Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. Both the workflow and the web application are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL: http://www.ncbi.nlm.nih.gov/projects/physalis/
Highlights
Next-generation DNA sequencing technologies [1] have substantially improved in recent years
Large central resources such as those at the National Center for Biotechnology Information (NCBI), European Bioinformatics Institute (EBI), and Kyoto Encyclopedia of Genes and Genomes (KEGG) invest significant effort to integrate other databases and datasets to provide a high-level service for the global research community [12]
Since 2010, NCBI has added a new division for GenBank [13], named Transcriptome Shotgun Assembly Sequence Database (TSA, https://www.ncbi.nlm.nih.gov/genbank/tsa/), which contains shotgun assemblies of sequences deposited in the NCBI Trace Archive (TA) and the Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra)
Summary
Next-generation DNA sequencing technologies [1] have substantially improved in recent years. In 2013, Jones and Blaxter published afterParty [24], a web-based application for creating, searching, browsing, and visualizing transcriptome data. This application provides an annotation workflow for new sequences that performs BLAST searches against the UniProt database [25] for annotation and uses the InterProScan tool [26] to identify protein domains and regions of interest. Both individual and global annotations include KOG [29], eggNOG [30], and KEGG Pathway [31] resources in addition to references to GO terms. Although this application provides a well-defined workflow and web interface for their data, the source code is not available, which limits its use in other projects. We present a case study for Physalis peruviana (NCBI Taxonomy ID: 126903) using data from BioProject ID 67621 (published at https://www.ncbi.nlm.nih.gov/projects/physalis/).