Abstract

Eukaryotic genomes are the basis for understanding the complexity of life from populations to the molecular level. Recent technological innovations have revolutionized the speed of data generation, enabling the sequencing of eukaryotic genomes and transcriptomes within days. The database diArk (http://www.diark.org) has been developed with the aim of providing access to all available assembled genomes and transcriptomes. As of September 2014, diArk contains about 2600 eukaryotes with 6000 genome and transcriptome assemblies, of which 22% are not available via NCBI/ENA/DDBJ. Several indicators of assembly quality are provided so that assemblies can be compared and the most appropriate dataset selected for further studies. diArk has a user-friendly web interface with extensive options for filtering and browsing the sequenced eukaryotes. In this new version of the database we have also integrated species for which transcriptome assemblies are available, and we provide more analyses of assemblies.

Highlights

  • Eukaryotic genome research enormously benefits from the increasing number of sequenced organisms

  • NCBI/ENA/DDBJ are the central repositories for sequence read archives (SRAs), the ‘raw data’ for generating assemblies, but publishers and funding agencies often do not require assemblies to be stored there

  • Powered by research community efforts, there are excellent databases dedicated to single species such as FlyBase (9), WormBase (10) and dictyBase (11), or repositories for species of certain taxonomic branches such as EuPathDB (12), VectorBase (13) and FungiDB (14)

INTRODUCTION

Eukaryotic genome research benefits enormously from the increasing number of sequenced organisms. Powered by research community efforts, there are excellent databases dedicated to single species such as FlyBase (9), WormBase (10) and dictyBase (11), or repositories for species of certain taxonomic branches such as EuPathDB (12), VectorBase (13) and FungiDB (14). Whereas GOLD is mainly focused on microbial genomes, we developed diArk as a central hub for all eukaryotes for which large-scale transcriptome or genome assembly data have been produced and are available to the public. A FastSearch allows filtering by taxon, genome type (genome assemblies, EST data, RNAseq data) and sequencing status (complete and incomplete assemblies), and is currently the most frequently used entry point to diArk. An extended Search form provides six search modules that can be combined in any way to search and filter diArk's content. Search options and filters for the new data types generated in the last 3 years have been integrated into the existing modules.
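The idea of freely combinable search filters can be illustrated with a minimal Python sketch. It is not diArk's implementation or API: the `Assembly` record, the filter helpers (`by_taxon`, `by_data_type`, `by_status`), the `search` function and the placeholder records are all hypothetical and only mirror the three FastSearch criteria named above (taxon, genome type, sequencing status).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Assembly:
    """Hypothetical record; the field names are assumptions, not diArk's schema."""
    species: str
    taxon: str        # taxonomic branch, e.g. "Fungi"
    data_type: str    # "genome", "EST" or "RNAseq"
    complete: bool    # sequencing status: complete vs. incomplete assembly


# A filter is any predicate over a record, so filters compose freely.
Filter = Callable[[Assembly], bool]


def by_taxon(taxon: str) -> Filter:
    return lambda a: a.taxon == taxon


def by_data_type(*types: str) -> Filter:
    allowed = set(types)
    return lambda a: a.data_type in allowed


def by_status(complete: bool) -> Filter:
    return lambda a: a.complete == complete


def search(records: Iterable[Assembly], *filters: Filter) -> List[Assembly]:
    """Keep the records that satisfy every given filter (logical AND)."""
    return [r for r in records if all(f(r) for f in filters)]


# Placeholder data for illustration only; not taken from diArk.
RECORDS = [
    Assembly("Species A", "Fungi", "genome", True),
    Assembly("Species B", "Amoebozoa", "genome", True),
    Assembly("Species C", "Fungi", "RNAseq", False),
    Assembly("Species D", "Metazoa", "EST", False),
]

if __name__ == "__main__":
    for hit in search(RECORDS, by_taxon("Fungi"), by_data_type("genome")):
        print(hit.species)
```

Because each filter is an independent predicate, any subset can be passed to `search` in any order, which is one simple way to model search modules that "can be combined in any way".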

