Abstract
As the generation and use of genomic datasets become increasingly common in all areas of biology, the need for resources to collate, analyse and present data from one or more genome projects is becoming more pressing. The Ensembl platform is a powerful tool to make genome data and cross-species analyses easily accessible through a web interface and a comprehensive application programming interface. Here we introduce GenomeHubs, which provide a containerized environment to facilitate the setup and hosting of custom Ensembl genome browsers. This simplifies mirroring of existing content and import of new genomic data into the Ensembl database schema. GenomeHubs also provide a set of analysis containers to decorate imported genomes with results of standard analyses and functional annotations, and support export to flat files, including EMBL format for submission of assemblies and annotations to the International Nucleotide Sequence Database Collaboration. Database URL: http://GenomeHubs.org
Highlights
Access to genomic sequence data for target species underpins a large component of research programmes across all areas of biology and ecology
GenomeHubs facilitate the import of diverse data into Ensembl databases, making it relatively straightforward to set up and host Ensembl sites for any set of assemblies for which sequence data and gene models are available
Using the containerized approach described here, any bioinformatician comfortable with the command line will be able to set up a new GenomeHubs site within a few hours even if they have never encountered the Ensembl infrastructure before, and will have access to the full suite of Ensembl web-based displays and comprehensive application programming interface (API) for manipulating genomic data
Summary
Access to genomic sequence data for target species underpins a large component of research programmes across all areas of biology and ecology. This has fuelled rapid increases both in the number of genome-sequencing projects and in the number of research groups generating, assembling and annotating draft genome sequences. Although the results of these analyses are published, it can be hard to access the underlying data in consistent formats. Some researchers make these data available on their individual lab websites as downloadable files, but others make them more accessible by providing a genome browser, BLAST sequence search interface, and a ...
© The Author(s) 2017.