Abstract

The Grid brings the power of many computers to scientists. However, developing Grid-enabled applications requires knowledge of the Grid infrastructure and of the low-level APIs of Grid services. Workflow management systems, in turn, provide a high-level environment for rapid prototyping of experimental computing systems. Coupling the Grid and workflow paradigms is important for the scientific community: it makes the power of the Grid easily available to the end user. Data-driven workflow execution is one way to enable distributed workflows on the Grid. The work presented in this paper is carried out in the context of the Virtual Laboratory for e-Science project. We present the VLAM-G workflow management system and its core component, the Run-Time System (RTS). The RTS is a dataflow-driven workflow engine that utilizes Grid resources while hiding the complexity of the Grid from the scientist. Special attention is paid to the concept of dataflow and to direct data streaming between distributed workflow components. We present the architecture and components of the RTS, describe the features of VLAM-G workflow execution, and evaluate the system through performance measurements and a real-life use case.

Highlights

  • Grids have emerged as a global cyber-infrastructure for the next-generation e-Science applications

  • The core features of the system we present are: (1) Dataflow is used as a driving force; (2) Workflow components (VLAM-G modules) are versatile: a module may be either a specially developed software component or an interface to legacy applications or web services; (3) Distributed execution: support for Grid job submissions together with web services and local tasks within a single workflow; (4) Support for legacy applications wrapped as modules; (5) Support for remote graphical output: remote X display for Grid jobs is provided; (6) Interactivity support: online control via parameters and via remote graphical output; (7) Decentralized handling of intermediate data; (8) Decoupling of GUI and engine.

  • In this paper we present VLAM-G – a data-driven workflow management system, and the Run-Time System, its engine

Summary

Introduction

V. Korkhov et al. / VLAM-G: Interactive data driven workflow engine for Grid-enabled resources

Grids have emerged as a global cyber-infrastructure for next-generation e-Science applications. [...] such as finding the appropriate available resources and manipulating input, output, and intermediate data sets to a system they trust. We present a dataflow-driven workflow engine that uses the basic services of the Grid to establish data streams efficiently and transparently between the remote processes composing a scientific workflow. This workflow engine consists of a Run-Time Environment (the libvlport library) for workflow components and a Run-Time System Manager (RTSM), which controls and orchestrates the execution of the entire workflow. We conclude the paper with a discussion of future work in the context of the ongoing VL-e project.
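
The data-driven execution model can be illustrated with a minimal Python sketch. It is not the VLAM-G or libvlport API: the names `Module`, `run_pipeline`, and `_END` are hypothetical, and in-process threads and queues merely stand in for distributed Grid jobs and the direct data streams between them. The sketch shows only the core idea that a module runs when data arrives on its input port and passes results straight to the next module, without a central store for intermediate data.

```python
import queue
import threading

_END = object()  # hypothetical end-of-stream marker, not part of VLAM-G


class Module(threading.Thread):
    """A workflow component that fires whenever data arrives on its input port."""

    def __init__(self, func, in_port, out_port):
        super().__init__()
        self.func = func
        self.in_port = in_port
        self.out_port = out_port

    def run(self):
        while True:
            item = self.in_port.get()  # block until upstream data arrives
            if item is _END:
                break
            self.out_port.put(self.func(item))  # stream result downstream
        self.out_port.put(_END)  # propagate end-of-stream to the next module


def run_pipeline(source, stages):
    """Stream items from `source` through `stages` and collect the results."""
    # One queue per connection: ports[i] feeds stage i, ports[-1] is the sink.
    ports = [queue.Queue() for _ in range(len(stages) + 1)]
    modules = [Module(f, ports[i], ports[i + 1]) for i, f in enumerate(stages)]
    for m in modules:
        m.start()
    for item in source:
        ports[0].put(item)
    ports[0].put(_END)
    results = []
    while True:
        item = ports[-1].get()
        if item is _END:
            break
        results.append(item)
    for m in modules:
        m.join()
    return results
```

Calling `run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2])` chains two modules. Because each connection is a FIFO queue served by a single consumer, items leave the pipeline in the order they entered it. A real engine would additionally need error handling, bounded buffers, and network transport, all of which this sketch omits.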

Related work
The vision
Schedule
Design and implementation
Connecting modules
Performance evaluation
A bioinformatics use case
Conclusions and future work
