ASGARD: an open-access database of annotated transcriptomes for emerging model arthropod species

V Zeng, C G Extavour

doi:10.1093/database/bas048

Abstract

The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck in genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to utilize whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to provide putative orthology assignments, coding region determination, protein domain identification and Gene Ontology (GO) term annotation for all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, GO term annotation or Basic Local Alignment Search Tool (BLAST). User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs and graphical representation of the location of protein domains and matches to similar sequences from the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information. We anticipate that this database will be useful for members of multiple research communities, including developmental biology, physiology, evolutionary biology, ecology, comparative genomics and phylogenomics.

Database URL: asgard.rc.fas.harvard.edu

Highlights

In the early ‘genomic era’ of the late 1990s and early 2000s, the genomes of several long-standing traditional laboratory model organisms were completely sequenced [1,2,3,4,5], which galvanized their respective fields by offering enormous amounts of new data for analysis
Assembled Searchable Giant Arthropod Read Database (ASGARD) provides a solution to this problem, allowing users to obtain comprehensive annotation data for each transcriptome assembly product
ASGARD will serve as a repository for the results of RNA-Seq experiments, genome sequencing and other next-generation sequencing (NGS) applications on ASGARD organisms

Summary

Introduction

In the early ‘genomic era’ of the late 1990s and early 2000s, the genomes of several long-standing traditional laboratory model organisms were completely sequenced [1,2,3,4,5], which galvanized their respective fields by offering enormous amounts of new data for analysis. The advent of next-generation sequencing (NGS) has further advanced biological research in traditional model systems, and in an increasing number of clades that previously lacked genomic data [13,14,15,16,17,18,19,20,21,22].

Objectives

Conclusion