Abstract

BackgroundOver the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Two popular workflow systems that have gained acceptance by the bioinformatics community are Taverna and Galaxy. Each system has a large user-base and supports an ever-growing repository of application workflows. However, workflows developed for one system cannot be imported and executed easily on the other. The lack of interoperability is due to differences in the models of computation, workflow languages, and architectures of both systems. This lack of interoperability limits sharing of workflows between the user communities and leads to duplication of development efforts.ResultsIn this paper, we present Tavaxy, a stand-alone system for creating and executing workflows based on using an extensible set of re-usable workflow patterns. Tavaxy offers a set of new features that simplify and enhance the development of sequence analysis applications: It allows the integration of existing Taverna and Galaxy workflows in a single environment, and supports the use of cloud computing capabilities. The integration of existing Taverna and Galaxy workflows is supported seamlessly at both run-time and design-time levels, based on the concepts of hierarchical workflows and workflow patterns. The use of cloud computing in Tavaxy is flexible, where the users can either instantiate the whole system on the cloud, or delegate the execution of certain sub-workflows to the cloud infrastructure.ConclusionsTavaxy reduces the workflow development cycle by introducing the use of workflow patterns to simplify workflow creation. It enables the re-use and integration of existing (sub-) workflows from Taverna and Galaxy, and allows the creation of hybrid workflows. Its additional features exploit recent advances in high performance cloud computing to cope with the increasing data size and complexity of analysis.The system can be accessed either through a cloud-enabled web-interface or downloaded and installed to run within the user's local environment. All resources related to Tavaxy are available at http://www.tavaxy.org.

Highlights

  • Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications

  • The packages currently include about 160 open source tools, coming from EMBOSS [35], SAMtools [36], fastx [37], NCBI BLAST Toolkit [38,39,40], and other individual sequence analysis programs

  • In the second case study, we demonstrate 1) the use of Tavaxy for a metagenomics workflow based on Next Generation Sequencing (NGS) data; 2) the advantages of using advanced data patterns in facilitating the workflow design and supporting parallel execution; 3) the speed-up achieved by using local HPC infrastructure; and 4) the efficient and cost-saving use of cloud computing

Read more

Summary

Introduction

Over the past decade the workflow system paradigm has evolved as an efficient and user-friendly approach for developing complex bioinformatics applications. Increasing complexity of analysis and scientific workflow paradigm The advent of high-throughput sequencing technologies - accompanied with the recent advances in open source software tools, open access data sources, and cloud computing platforms - has enabled the genomics community to develop and use sophisticated application workflows. Such workflows start with voluminous raw sequences and end with detailed structural, functional, and evolutionary results. To simplify the design and execution of complex bioinformatics workflows, especially those that use multiple software tools and data resources, a number of scientific workflow systems have been developed over the past decade. Examples include Taverna [8,9], Kepler [10], Triana [11,12], Galaxy [13], Conveyor [14] Pegasus [15], Pegasys [16], Gene Pattern [17,18], Discovery Net [19,20], and OMII-BPEL [21]; see [22] for a survey and comparison of some of these tools

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call