Modeling Data Flow Execution in a Parallel Environment

Georgia Kougka, Ulf Leser, Anastasios Gounaris

doi:10.1007/978-3-319-64283-3_14

Abstract

Although modern data flows are executed in parallel and distributed environments, e.g., on a multi-core machine or in the cloud, current cost models, such as those considered by state-of-the-art data flow optimization techniques, do not accurately reflect the response time of real data flow execution in these environments. This is mainly because the impact of parallelism, and more specifically the impact of concurrent task execution on running time, is not adequately modeled. In this work, we propose a cost modeling solution that aims to accurately reflect the response time of a data flow that is executed in parallel. We focus on the single multi-core machine environment provided by modern business intelligence tools, such as Pentaho Kettle, but our approach can be extended to massively parallel and distributed settings. The distinctive feature of our proposal is that we model both time overlaps and the impact of concurrency on task running times in a combined manner; the latter is appropriately quantified and its significance is exemplified.
