Abstract

Background

Nowadays, the sequencing of even the largest mammalian genomes has become a question of days with current next-generation sequencing methods. It comes as no surprise that dozens of genome assemblies are now released per month. Since the number of next-generation sequencing machines is increasing worldwide and new major sequencing projects are being announced, the rate at which genome assemblies are released is expected to increase further. It is therefore increasingly important to obtain both an overview of and detailed information about the available sequenced genomes. The different sequencing and assembly methods have specific characteristics that must be known to evaluate the various genome assemblies before performing subsequent analyses.

Results

diArk has been developed to provide fast and easy access to all sequenced eukaryotic genomes worldwide. Currently, diArk 2.0 contains information about more than 880 species and more than 2350 genome assembly files. Extensive metadata are provided, such as sequencing and read-assembly methods, sequencing coverage, GC-content, extended lists of alternative scientific names and common species names, and various kinds of statistics. To make the data intuitively accessible, the web interface makes extensive use of modern web techniques. A number of search modules and result views facilitate finding and judging the data of interest. Subscribing to the RSS feed is the easiest way to stay up to date with the latest genome data.

Conclusions

diArk 2.0 is the most up-to-date database of sequenced eukaryotic genomes compared to databases like GOLD, NCBI Genome, NHGRI, and ISC. It differs from them in that it stores only those projects for which genome assembly data or considerable amounts of cDNA data are available; projects in the planning stage or in the process of being sequenced are not included. The user can easily search through the provided data and directly access the genome assembly files of the sequenced genome of interest.
diArk 2.0 is available at http://www.diark.org.
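GC-content, sequencing coverage, and simple assembly statistics, all among the metadata mentioned in the abstract, follow short definitions. The Python sketch that follows shows one way to compute them from an assembly in FASTA format; it is an illustration, not diArk's own code. It assumes that GC-content is measured over unambiguous bases only (N and other IUPAC ambiguity codes excluded) and that coverage means average sequencing depth, i.e. total sequenced bases divided by genome size; the conventions diArk actually uses are not stated here. The function names `read_fasta`, `gc_content`, `assembly_stats`, and `sequencing_coverage` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def gc_content(sequences: Iterable[str]) -> float:
    """Fraction of G+C among A/C/G/T bases.

    Assumption: N and other ambiguity codes are excluded from the denominator.
    """
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    return (counts["G"] + counts["C"]) / acgt if acgt else 0.0


def assembly_stats(records: List[Tuple[str, str]]) -> Dict[str, int]:
    """Number of sequences (e.g. contigs) and total assembled length."""
    lengths = [len(seq) for _, seq in records]
    return {"sequences": len(lengths), "total_length": sum(lengths)}


def sequencing_coverage(total_read_bases: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return total_read_bases / genome_size


if __name__ == "__main__":
    fasta = [">contig1", "ACGTNN", "GGCC", ">contig2", "ATAT"]
    records = list(read_fasta(fasta))
    print(assembly_stats(records))
    print(gc_content(seq for _, seq in records))
    print(sequencing_coverage(30_000_000_000, 3_000_000_000))
```

For the toy input, the two contigs have a total length of 14 bases, and the GC-content is 6 of 12 unambiguous bases, i.e. 0.5. The coverage call illustrates a hypothetical 30 Gb of reads on a 3 Gb genome, giving an average depth of 10x.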

Highlights

  • Nowadays, the sequencing of even the largest mammalian genomes has become a question of days with current next-generation sequencing methods

  • First draft versions often contain sets of so-called contigs that have been built from the assembly of whole genome shotgun reads

  • The publication of the genome sequence of an organism does not correlate with the status of the assembly process


Summary

Introduction

Celera, using the Sanger technique, already accelerated human genome sequencing to three years by applying a whole genome shotgun instead of the primer-based approach [2], and with current next-generation sequencing methods the sequencing of even the largest mammalian genomes has become only a matter of days [3]. Analyses based on genes, genomic regions, or proteins clearly require high-coverage genome sequences that are assembled into very long contigs or even supercontigs. This is especially true for the analysis of genes of higher eukaryotes, which are often spread over hundreds of thousands of base pairs.
