Data reduction in scientific workflows using provenance monitoring and user steering

Renan Souza, Vítor Silva, Alvaro L.G.A. Coutinho, Patrick Valduriez, Marta Mattoso

doi:10.1016/j.future.2017.11.028

Abstract

Scientific workflows need to be executed iteratively, and often interactively, over large input datasets. Reducing the input data is a powerful way to cut overall execution time in such workflows. When the reduction is done online (i.e., without requiring the user to stop the execution, reduce the data, and then resume), it can save considerable time. However, determining which subsets of the input data should be removed becomes a major problem. A related problem is guaranteeing that the workflow system keeps the execution and the data consistent with the reduction. Keeping track of how users interact with the workflow is essential for data provenance purposes. In this paper, we adopt the “human-in-the-loop” approach, which enables users to steer the running workflow and remove subsets from datasets online. We propose an adaptive workflow monitoring approach that combines provenance data monitoring and computational steering to support users in analyzing the evolution of key parameters and determining which subset of the data to remove. We extend a provenance data model to keep track of users’ interactions when they reduce data at runtime. In our experimental validation, we develop a test case from the oil and gas domain, using a 936-core cluster. The results on this test case show that the approach reduces execution time by 32% and the data processed by 14%.

Journal: Future Generation Computer Systems
Publication Date: Dec 13, 2017
Citations: 11
License type: other-oa

Similar Papers

Cloud Data Management for Scientific Workflows: Research Issues, Methodologies, and State-of-the-Art
Dong Yuan ... Xiao Liu
01 Aug 2014

A data dependency based strategy for intermediate data storage in scientific cloud workflow systems
Dong Yuan ... Jinjun Chen
Concurrency and Computation: Practice and Experience | VOL. 24
27 Aug 2010

A balanced scheduler with data reuse and replication for scientific workflows in cloud computing systems
Israel Casas ... Albert Y Zomaya
Future Generation Computer Systems | VOL. 74
06 Jan 2016

Dynamic steering of HPC scientific workflows: A survey
Marta Mattoso ... Daniel De Oliveira
Future Generation Computer Systems | VOL. 46
28 Nov 2014
