Optimization of data flow execution in a parallel environment

Georgia Kougka, Anastasios Gounaris

doi:10.1007/s10619-018-7243-3

Abstract

Although modern data flows are executed in parallel and distributed environments, e.g., on a multi-core machine or in the cloud, current cost models, such as those considered by state-of-the-art data flow optimization techniques, do not accurately reflect the response time of real data flow execution in these environments. This is mainly because the impact of parallelism, and more specifically the impact of concurrent task execution on running time, is not adequately modeled. The contribution of this work is twofold. Firstly, we propose an advanced cost model that aims to reflect more accurately the response time of a data flow that is executed in parallel. Secondly, we show that existing optimization solutions are inadequate and develop new optimization techniques targeting the proposed cost model. We focus on the single multi-core machine environment provided by modern business intelligence tools, such as Pentaho Kettle, but our approach can be extended to massively parallel and distributed settings. The distinctive feature of our proposal is that we model both time overlaps and the impact of concurrency on task running times in a combined manner; the latter is appropriately quantified and its significance is exemplified. Furthermore, we propose extensions to current optimizers that decide on the exact ordering of flow tasks, taking into account the new optimization metric. Finally, we evaluate the new optimization algorithms and show up to 59% response time improvement over state-of-the-art task ordering techniques.
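The abstract does not give the proposed cost model, so the following is only a toy sketch of why the choice of metric matters for task ordering. It compares a sum-of-costs metric with a bottleneck-style metric on a two-task linear flow. The task names, costs, selectivities, and the `slowdown` factor are all assumptions made for illustration. The bottleneck metric stands in for fully overlapped (pipelined) execution and is not the authors' model.

```python
from itertools import permutations

# Hypothetical tasks: name -> (cost per input record, selectivity).
# These numbers are invented for illustration only.
TASKS = {"x": (10.0, 0.1), "y": (4.0, 0.9)}


def task_loads(order, n_input=1.0):
    """Work done by each task when the flow runs in the given order."""
    loads, records = [], n_input
    for name in order:
        cost, selectivity = TASKS[name]
        loads.append(cost * records)   # work is cost times records received
        records *= selectivity         # fewer records reach the next task
    return loads


def sum_cost(order):
    """Sequential view: tasks never overlap, so their loads add up."""
    return sum(task_loads(order))


def bottleneck_cost(order, slowdown=1.0):
    """Fully overlapped view: the slowest task bounds the response time.

    `slowdown` is an assumed multiplier standing in for the effect of
    concurrent execution on task running times.
    """
    return slowdown * max(task_loads(order))


orders = list(permutations(TASKS))
best_by_sum = min(orders, key=sum_cost)
best_by_bottleneck = min(orders, key=bottleneck_cost)

print("sum-of-costs picks:", best_by_sum)
print("bottleneck picks:  ", best_by_bottleneck)
```

With these made-up numbers the two metrics prefer opposite orderings, which is the kind of mismatch that can make an optimizer tuned to one metric a poor fit for an execution environment better described by another.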

Journal: Distributed and Parallel Databases
Publication Date: Aug 22, 2018
Citations: 9
