Abstract

In this era of Big Data, there is a growing need for scientific workflows to perform computations at a scale far exceeding the capabilities of a single workstation. When such data-intensive workflows run in a cloud distributed across several physical locations, execution time and resource-utilization efficiency depend heavily on the initial placement and distribution of the input datasets across the virtual machines. In this paper, we propose BDAP (Big DAta Placement strategy), a strategy that improves workflow performance by minimizing data movement across virtual machines. In this work, we 1) formalize the data placement problem in scientific workflows, 2) propose a data placement algorithm that considers both the initial input datasets and the intermediate datasets produced during workflow execution, and 3) perform extensive experiments in a distributed environment to verify that the proposed strategy places big datasets on appropriate virtual machines in the Cloud within reasonable time.
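The objective stated above can be illustrated with a minimal, hypothetical sketch (not the paper's actual BDAP formulation): given an assignment of datasets and tasks to virtual machines, measure how much data must be moved because a task and one of its input datasets reside on different VMs. All names and sizes below are assumptions made for illustration only.

```python
# Hypothetical example: estimate cross-VM data movement for a given placement.
# All identifiers and numbers are illustrative, not taken from the paper.

dataset_size = {"d1": 40, "d2": 25, "d3": 60}            # dataset sizes in GB
task_inputs = {"t1": ["d1", "d2"], "t2": ["d2", "d3"]}   # datasets each task reads
placement = {"d1": "vm1", "d2": "vm1", "d3": "vm2"}      # dataset -> VM holding it
task_vm = {"t1": "vm1", "t2": "vm2"}                     # task -> VM it runs on

def data_movement(placement, task_vm, task_inputs, dataset_size):
    """Total GB transferred because a task and one of its inputs are on different VMs."""
    moved = 0
    for task, inputs in task_inputs.items():
        for d in inputs:
            if placement[d] != task_vm[task]:
                moved += dataset_size[d]
    return moved

print(data_movement(placement, task_vm, task_inputs, dataset_size))  # 25 (d2 moves to vm2)
```

A placement strategy in this spirit would search over assignments of datasets to VMs so as to minimize this kind of transfer cost.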

Highlights

  • Workflows have been extensively employed in various scientific areas such as bioinformatics, physics, astronomy, ecology, and earthquake science [10]

  • They are usually modeled as directed acyclic graphs (DAGs) such that workflow tasks are represented as graph vertices and the data flows among tasks are represented by graph edges

  • To improve throughput and performance, this type of application can greatly benefit from distributed high-performance computing (HPC) infrastructures such as Cloud computing

Introduction

Workflows have been extensively employed in various scientific areas such as bioinformatics, physics, astronomy, ecology, and earthquake science [10]. They are usually modeled as directed acyclic graphs (DAGs) in which workflow tasks are represented as graph vertices and the data flows among tasks are represented by graph edges; the direction of an edge indicates the direction of the data flow between two tasks. A scientific workflow management system (SWFMS) is a system for designing and executing scientific workflows (SWF). Scientific workflows can be very large, comprising hundreds or thousands of complex tasks and big datasets [3, 6], and moving huge datasets between workflow tasks increases their execution time. To improve throughput and performance, this type of application can greatly benefit from distributed high-performance computing (HPC) infrastructures such as Cloud computing.
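As a minimal sketch of this DAG model (illustrative only; the task names and edges are assumptions, not taken from the paper), a workflow can be stored as a list of edges and its tasks ordered so that every task runs after the tasks whose data it consumes:

```python
from collections import defaultdict

# Hypothetical toy workflow: tasks are vertices, edges are data flows.
# Edge (u, v) means task u produces a dataset consumed by task v.
workflow_edges = [
    ("t1", "t2"),
    ("t1", "t3"),
    ("t2", "t4"),
    ("t3", "t4"),
]

def topological_order(edges):
    """Return the tasks in an order that respects all data dependencies."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    tasks = set()
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
        tasks.update((u, v))
    ready = [t for t in tasks if indegree[t] == 0]
    order = []
    while ready:
        t = ready.pop()
        order.append(t)
        for s in successors[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order

print(topological_order(workflow_edges))  # e.g. ['t1', 't2', 't3', 't4']
```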

