Spinning fast iterative data flows

Stephan Ewen, Volker Markl, Moritz Kaufmann, Kostas Tzoumas

doi:10.14778/2350229.2350245

Abstract

Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk iterative algorithms are supported by novel dataflow frameworks, these systems cannot exploit the computational dependencies present in many algorithms, such as graph algorithms. As a result, these algorithms are executed inefficiently, which has led to specialized systems based on other paradigms, such as message passing or shared memory. We propose a method to integrate incremental iterations, a form of workset iterations, with parallel dataflows. After showing how to integrate bulk iterations into a dataflow system and its optimizer, we present an extension to the programming model for incremental iterations. The extension compensates for the lack of mutable state in dataflows and allows for exploiting the sparse computational dependencies inherent in many iterative algorithms. The evaluation of a prototypical implementation shows that exploiting these aspects yields speedups of up to two orders of magnitude in algorithm runtime. In our experiments, the improved dataflow system is highly competitive with specialized systems while maintaining a transparent and unified dataflow abstraction.
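
The contrast between bulk and workset (incremental) iterations can be illustrated with a small single-machine sketch. This is not the paper's implementation or its programming model. It is a plain-Python toy that computes connected components by propagating the minimum vertex id, once by recomputing every vertex in each superstep (bulk) and once by revisiting only vertices whose label changed (workset). The function names, the `work` counter, and the in-place dictionary standing in for mutable state are all assumptions made for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return an undirected adjacency map for a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bulk_components(vertices, edges):
    """Bulk iteration: recompute the label of every vertex in each superstep.

    Returns (labels, work), where work counts vertex evaluations.
    """
    adj = build_adjacency(edges)
    labels = {v: v for v in vertices}
    work = 0
    while True:
        new_labels = {}
        for v in vertices:
            work += 1
            # New label is the minimum over the vertex and its neighbors.
            new_labels[v] = min([labels[v]] + [labels[n] for n in adj[v]])
        if new_labels == labels:  # fixpoint reached
            return labels, work
        labels = new_labels


def workset_components(vertices, edges):
    """Workset iteration: only vertices whose label changed are revisited.

    `labels` plays the role of a mutable solution set that is updated in
    place; `workset` holds the vertices that still need to notify neighbors.
    Returns (labels, work), where work counts vertex evaluations.
    """
    adj = build_adjacency(edges)
    labels = {v: v for v in vertices}
    workset = set(vertices)
    work = 0
    while workset:
        next_workset = set()
        for v in workset:
            work += 1
            for n in adj[v]:
                if labels[v] < labels[n]:
                    labels[n] = labels[v]  # point update of the solution
                    next_workset.add(n)    # n must now notify its neighbors
        workset = next_workset
    return labels, work


if __name__ == "__main__":
    vertices = [1, 2, 3, 4, 5, 6, 7, 8]
    edges = [(1, 2), (2, 3), (3, 4), (4, 5), (6, 7)]
    print(bulk_components(vertices, edges))
    print(workset_components(vertices, edges))
```

Both functions arrive at the same component labels, but the workset variant touches only the parts of the graph that are still changing. That is the kind of sparse computational dependency the abstract refers to. The work counts from this toy say nothing about the speedups reported for the actual system.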

Journal: Proceedings of the VLDB Endowment
Publication Date: Jul 1, 2012
Citations: 138

Similar Papers

Scalable Analysis of Massive Graphs on a Parallel Data Flow System
Andy Yoo
01 Jan 2009

DataFlow Systems: From Their Origins to Future Applications in Data Analytics, Deep Learning, and the Internet of Things
Veljko Milutinovic ... Nemanja Trifunovic
01 Jan 2017

M2r2: A Framework for Results Materialization and Reuse in High-Level Dataflow Systems for Big Data
Vasiliki Kalavri ... Vladimir Vlassov
01 Dec 2013

Evaluating use of data flow systems for large graph analysis
Andy Yoo ... Ian Kaplan
16 Nov 2009
