Abstract

Scientific workflows are the obvious choice for executing complex sets of interrelated calculations on High-Performance Computing (HPC) environments. Because workload management software does not support the execution of interrelated tasks, workflow management systems have been introduced to execute workflows on HPC environments. Recently, a new distributed architectural model was introduced that offers dynamic workflow execution capabilities to workflow management systems. It executes workflows on a per-task basis. While this approach facilitates dynamic workflows, it adds considerable overhead, substantially increasing workflow makespans. Because most workflows are static, task-wise execution degrades the performance of most workflows. In this paper, we introduce SwarmForm, a distributed workflow management system that brings task clustering to the new architectural model. SwarmForm is open source and offers better performance than existing distributed workflow management systems by clustering workflow tasks to reduce overheads, while allowing users to choose between task-wise and cluster-wise execution depending on the nature of the workflow. The paper demonstrates that SwarmForm enables the use of all the features introduced with the new architectural model while providing better makespans for scientific workflows.

Highlights

  • Almost every scientific domain, such as astrophysics, bio- and health informatics, physics, and the biosciences, uses scientific workflows to express complex sets of interdependent tasks

  • We introduce a new distributed workflow management system, SwarmForm, which includes task clustering to reduce the makespan of workflows

  • Because the exact number of resources available at execution time cannot be known in most High-Performance Computing (HPC) environments, we propose a slight modification to the Workflow and Platform Aware task clustering (WPA) algorithm, together with our vertical clustering approach
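
To make the idea of vertical clustering mentioned in the highlights concrete, the following is a minimal sketch of the technique as it is commonly defined in the workflow literature: a task is merged with its successor whenever the task has exactly one child and that child has exactly one parent. This is an illustration under that assumption, not SwarmForm's implementation; the function name `vertical_clusters`, the edge-list input format, and the example workflow are all hypothetical, and the approach described in the paper may differ.

```python
from collections import defaultdict


def vertical_clusters(tasks, edges):
    """Group tasks of an acyclic workflow into single-parent/single-child chains.

    tasks: iterable of task names (hypothetical input format).
    edges: iterable of (parent, child) pairs describing dependencies.
    Returns a list of clusters, each an ordered list of task names.
    """
    children = defaultdict(list)
    parents = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        parents[child].append(parent)

    clusters = []
    for task in tasks:
        # Skip tasks that are absorbed into their parent's chain:
        # the task has one parent, and that parent has only this child.
        task_parents = parents[task]
        if len(task_parents) == 1 and len(children[task_parents[0]]) == 1:
            continue

        # Otherwise the task starts a chain; extend it while the link
        # to the next task is strictly one-to-one.
        chain = [task]
        while len(children[chain[-1]]) == 1:
            successor = children[chain[-1]][0]
            if len(parents[successor]) != 1:
                break
            chain.append(successor)
        clusters.append(chain)
    return clusters


# Hypothetical diamond-shaped workflow with two parallel pipelines.
example_tasks = ["a", "b", "c", "d", "e", "f"]
example_edges = [("a", "b"), ("a", "c"), ("b", "d"),
                 ("c", "e"), ("d", "f"), ("e", "f")]
example_clusters = vertical_clusters(example_tasks, example_edges)
```

In this example the two pipelines `b -> d` and `c -> e` each collapse into one cluster, so six tasks would be submitted as four jobs, while the fork at `a` and the join at `f` stay separate.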

Summary

INTRODUCTION

This paper is an extended version of the paper “SwarmForm: A Distributed Workflow Management System with Task Clustering” presented at the ICTer 2020 conference. Workload managers such as SLURM [2] and TORQUE [3] are installed on HPC environments to manage their computing resources. They do not support workflow scheduling; they only support the execution of independent jobs. Running a workflow as a pilot job yields a better makespan but poor resource utilization of the execution environment, whereas running it as chained jobs yields better resource utilization but a poor makespan. Distributed workflow management systems (WMSs) execute workflows as chained jobs, with a separate job for each task, whereas centralized WMSs execute workflows as pilot jobs.
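
The cost of running one job per task can be illustrated with a toy model. The sketch below assumes a linear chain of tasks in which every submitted job pays the same fixed overhead (queue wait, scheduling, start-up) before running its tasks back to back. The function name `chained_makespan`, the runtimes, and the overhead value are made up for illustration and are not measurements from the paper.

```python
def chained_makespan(runtimes, overhead, cluster_size=1):
    """Makespan of a linear task chain submitted as chained jobs.

    runtimes: task runtimes in seconds, in execution order.
    overhead: assumed fixed cost paid once per submitted job.
    cluster_size: number of consecutive tasks packed into one job
                  (1 corresponds to task-wise execution).
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    # Ceiling division: the last job may hold fewer tasks.
    job_count = -(-len(runtimes) // cluster_size)
    return job_count * overhead + sum(runtimes)


# Hypothetical six-task chain with a 15-second per-job overhead.
runtimes = [10, 20, 30, 40, 50, 60]
task_wise = chained_makespan(runtimes, overhead=15)
clustered = chained_makespan(runtimes, overhead=15, cluster_size=3)
```

Under these assumptions the task-wise run takes 300 seconds and the clustered run 240 seconds: the computation is identical, and the difference comes entirely from paying the per-job overhead two times instead of six.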

RELATED WORK
SwarmForm Workflow Management System
RAC Algorithm
RESULTS
DISCUSSIONS
FUTURE WORK