MaDaTS

Devarshi Ghoshal, Lavanya Ramakrishnan

doi:10.1145/3078597.3078611

Abstract

Scientific workflows are increasingly used in High Performance Computing (HPC) environments to manage complex simulations and analyses, often consuming and generating large amounts of data. However, workflow tools have limited support for managing input, output and intermediate data. The data elements of a workflow are often managed by the user through scripts or other ad-hoc mechanisms. Technology advances for future HPC systems are redefining the memory and storage subsystem by introducing additional tiers to improve the I/O performance of data-intensive applications. These architectural changes make managing data for scientific workflows more complex. Thus, we need to manage scientific workflow data across the tiered storage system on HPC machines. In this paper, we present the design and implementation of MaDaTS (Managing Data on Tiered Storage for Scientific Workflows), a software architecture that manages data for scientific workflows. We introduce Virtual Data Space (VDS), an abstraction of the data in a workflow that hides the complexities of the underlying storage system while allowing users to control data management strategies. We evaluate the data management strategies with real scientific and synthetic workflows, and demonstrate the capabilities of MaDaTS. Our experiments demonstrate the flexibility, performance and scalability gains of MaDaTS compared to the traditional approach of managing data in scientific workflows.

Similar Papers

MaDaTS: Managing Data on Tiered Storage for Scientific Workflows
Devarshi Ghoshal ... Lavanya Ramakrishnan
01 Oct 2018

Security-aware intermediate data placement strategy in scientific cloud workflows
Wei Liu ... Guo Sun Zeng
Knowledge and Information Systems | VOL. 41
03 Jun 2014

A data dependency based strategy for intermediate data storage in scientific cloud workflow systems
Dong Yuan ... Jinjun Chen
Concurrency and Computation: Practice and Experience | VOL. 24
27 Aug 2010

A cost-effective strategy for intermediate data storage in scientific cloud workflow systems
Dong Yuan ... Xiao Liu
01 Jan 2009
