Abstract

The development of grid and workflow technologies has enabled complex, loosely coupled scientific applications to be executed on distributed resources. Many of these applications consist of large numbers of short-duration tasks whose runtimes are heavily influenced by delays in the execution environment. Such applications often perform poorly on the grid because of the large scheduling overheads commonly found there. In this paper we present Corral, a provisioning system based on multi-level scheduling that improves workflow runtime by reducing scheduling overheads. The system reserves resources for the exclusive use of the application and gives applications control over scheduling policies. We describe our experiences with the system when running a suite of real workflow-based applications from astronomy, earthquake science, and genomics. Provisioning resources with Corral ahead of workflow execution reduced the runtime of the astronomy application by up to 78% (45% on average) and that of a genome mapping application by an order of magnitude compared to traditional methods. We also show how provisioning can benefit applications on both a small local cluster and a large-scale campus resource.

Highlights

  • Workflow systems have been used to manage large-scale, loosely coupled scientific computations in a wide variety of domains, including physics [10], earth science [2,9], and astronomy [18].

  • We evaluated our approach in four ways: (1) We measured the overhead of starting up pilot jobs to provision resources, breaking it down into the phases of setting up the software at the site, starting the jobs, and cleaning up the site.

  • The provisioning phase is composed of two sub-phases, allocation and runtime, which correspond to the scheduling/queuing delay and the execution time of the glidein job, respectively.

Summary

Introduction

Workflow systems have been used to manage large-scale, loosely coupled scientific computations in a wide variety of domains, including physics [10], earth science [2,9], and astronomy [18]. These applications often consist of large numbers of compute- or data-intensive tasks with complex control and data flow dependencies. Although clouds are being investigated as a computing platform for science applications, the majority still run on campus clusters or grids. Most of these high-performance computing systems provide an execution model based on batch scheduling, where jobs are held in a queue until they can be matched with resources for execution. For workflows and other high-throughput applications with large numbers of tasks, the queuing and scheduling overheads of this model have a detrimental impact on performance because the delays are accumulated many times as the application executes.
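
The accumulation effect can be illustrated with a toy model. The sketch below is not the paper's measurement or method: the function names, the wave-based execution model, and every number in it are assumptions chosen only to show why paying a scheduling delay per task costs far more than paying it once when resources are provisioned ahead of time.

```python
import math


def runtime_per_task_scheduling(num_tasks, task_seconds, overhead_seconds, slots):
    """Toy model: every task waits for the scheduler before it runs.

    Tasks run in waves of `slots` tasks; each wave pays the overhead again.
    """
    waves = math.ceil(num_tasks / slots)
    return waves * (overhead_seconds + task_seconds)


def runtime_with_provisioning(num_tasks, task_seconds, overhead_seconds, slots):
    """Toy model: the overhead is paid once, when the slots are provisioned."""
    waves = math.ceil(num_tasks / slots)
    return overhead_seconds + waves * task_seconds


if __name__ == "__main__":
    # Hypothetical workload: 1000 five-second tasks, 60 s scheduling delay, 10 slots.
    args = dict(num_tasks=1000, task_seconds=5, overhead_seconds=60, slots=10)
    print(runtime_per_task_scheduling(**args))  # 100 waves * 65 s = 6500
    print(runtime_with_provisioning(**args))    # 60 s + 100 waves * 5 s = 560
```

In this model the gap widens as tasks get shorter relative to the scheduling delay, which matches the observation that short-duration tasks are the ones most affected.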
