eHive: An Artificial Intelligence workflow system for genomic analysis

Jessica Severin, Javier Herrero, Michael Schuster, Kathryn Beal, Abel Ureta-Vidal, Leo Gordon, Paul Flicek, Albert J Vilella, Stephen Fitzgerald

doi:10.1186/1471-2105-11-240

Abstract

Background: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future.

Results: We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios.

Conclusions: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
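
The arrangement the abstract describes, a central database acting as a blackboard that autonomous agents query for work, can be illustrated with a minimal sketch. It is not eHive's implementation: it uses Python and an in-memory SQLite database as stand-ins for the Perl agent and the MySQL blackboard, and the table layout, status values, function names and the `dataflow` mapping are all hypothetical names chosen for illustration.

```python
import sqlite3


def create_blackboard():
    """Create an in-memory SQLite database standing in for the MySQL blackboard."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per job, with a status the agents update.
    db.execute(
        "CREATE TABLE job ("
        " id INTEGER PRIMARY KEY,"
        " analysis TEXT NOT NULL,"
        " input TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'READY',"
        " worker TEXT)"
    )
    return db


def add_job(db, analysis, input_id):
    """Post a new READY job on the blackboard."""
    db.execute("INSERT INTO job (analysis, input) VALUES (?, ?)", (analysis, input_id))
    db.commit()


def claim_job(db, worker):
    """Claim the oldest READY job for `worker`; return (id, analysis, input) or None."""
    while True:
        row = db.execute(
            "SELECT id, analysis, input FROM job"
            " WHERE status = 'READY' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard means a job taken by another agent in the meantime
        # updates zero rows here, so it is never claimed twice.
        cur = db.execute(
            "UPDATE job SET status = 'CLAIMED', worker = ?"
            " WHERE id = ? AND status = 'READY'",
            (worker, row[0]),
        )
        db.commit()
        if cur.rowcount == 1:
            return row


def run_worker(db, worker, runnables, dataflow):
    """Agent loop: claim and run jobs until none are READY; return the number completed."""
    completed = 0
    while (job := claim_job(db, worker)) is not None:
        job_id, analysis, input_id = job
        try:
            output = runnables[analysis](input_id)
        except Exception:
            # Record the failure on the blackboard and move on to the next job.
            db.execute("UPDATE job SET status = 'FAILED' WHERE id = ?", (job_id,))
            db.commit()
            continue
        db.execute("UPDATE job SET status = 'DONE' WHERE id = ?", (job_id,))
        db.commit()
        completed += 1
        # Dataflow rule: a finished job may seed a job for a downstream analysis.
        if analysis in dataflow:
            add_job(db, dataflow[analysis], output)
    return completed
```

Seeding the blackboard with two `align` jobs and passing the rule `{"align": "chain"}` leads a single worker to complete four jobs, because each finished alignment posts one follow-up `chain` job. Keeping all state in the database is what lets any number of such agents start, stop or fail independently in this sketch.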

Highlights

The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year.
Many job schedulers have a latency of several seconds between job submission and execution, and most are designed around the idea that jobs will run for an hour or more. They are not designed to handle 100 million jobs that run for only a few seconds each. To manage this queuing overhead, applications with large numbers of short jobs often require another system on top of the job scheduler to "batch" jobs so that they match the parameters of the job scheduler.
Here we describe the eHive system for large-scale genomic analysis.
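
The "batching" layer mentioned in the highlights can be sketched generically as follows. This is not eHive's own mechanism, only an illustration of the idea of grouping short jobs so that each scheduler submission runs for roughly a target duration; the function name `batch_jobs` and its parameters are hypothetical.

```python
from itertools import islice


def batch_jobs(jobs, est_seconds_per_job, target_seconds=3600):
    """Yield lists of jobs sized so that each batch runs for about target_seconds."""
    if est_seconds_per_job <= 0:
        raise ValueError("est_seconds_per_job must be positive")
    # Number of short jobs that fit in one submission; always at least one.
    size = max(1, int(target_seconds // est_seconds_per_job))
    it = iter(jobs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

Assuming, purely for illustration, that each job takes 3 seconds, a one-hour target gives 1,200 jobs per batch, so 100 million jobs would need about 83,334 scheduler submissions instead of 100 million.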

Summary

Introduction

The Ensembl project provides an integrated system for the annotation of chordate genomes and the management of genome information [1], and it updates its comparative genomics resources with each of its several releases per year. Data updates are provided for recently sequenced species, for species with new assemblies and when additional information becomes available. The data is provided through the Ensembl Genome Browser (http://www.ensembl.org), through a Perl API, by direct querying of the underlying databases, or via BioMart, a data-mining tool [2]. The same public Perl API is used both by the web server to fetch data from the database and by the project members themselves to access data, run analyses and store the results.

Journal: BMC Bioinformatics
Publication Date: May 11, 2010
Citations: 55
License type: CC BY
