Fault-tolerance for distributed iterative dataflows in action

Chen Xu,Rudi Poepsel Lemaitre,Juan Soto,Volker Markl

doi:10.14778/3229863.3236242

Abstract

Distributed dataflow systems (DDS) are widely employed in graph processing and machine learning (ML), where many algorithms are iterative in nature. Typically, DDS achieve fault tolerance either through checkpointing mechanisms or by exploiting algorithmic properties that make checkpoints unnecessary. Recently, for graph processing, we proposed unblocking checkpointing, which parallelizes the execution pipeline and checkpoint writing, as well as confined recovery, which enables fast recovery upon partial node failures. Furthermore, for ML algorithms implemented using broadcast variables, we proposed replica recovery, which leverages broadcast variable replicas to enable checkpoint-free failure recovery. In this demonstration, we showcase these fault-tolerance techniques using Apache Flink. Attendees will be able to: (i) run representative iterative algorithms, including PageRank, Connected Components, and K-Means; (ii) explore the internal behavior of DDS under unblocking checkpointing; and (iii) trigger failures to observe the effects of confined recovery and replica recovery.
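
The idea behind unblocking checkpointing, as the abstract describes it, is that checkpoint writing runs in parallel with the execution pipeline instead of pausing it. The sketch below is only a toy, single-process illustration of that idea in Python, not the authors' Apache Flink implementation. The names run_iterations, write_checkpoint, and toy_step, the in-memory dictionary standing in for stable storage, and the checkpoint interval are all assumptions made for this example.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def toy_step(state):
    """Hypothetical iteration step: moves every value halfway toward 2.0."""
    return [0.5 * x + 1.0 for x in state]


def write_checkpoint(store, superstep, snapshot):
    """Stand-in for a write to stable storage (here just a dictionary)."""
    store[superstep] = snapshot


def run_iterations(initial, step, num_supersteps, interval):
    """Run an iterative job, writing checkpoints on a background thread."""
    store = {}      # simulated stable storage: superstep -> snapshot
    pending = []    # futures of checkpoint writes still in flight
    state = initial
    with ThreadPoolExecutor(max_workers=1) as writer:
        for superstep in range(1, num_supersteps + 1):
            state = step(state)
            if superstep % interval == 0:
                # Copy the state first so later supersteps cannot alter
                # the snapshot, then hand the write to the writer thread.
                snapshot = copy.deepcopy(state)
                pending.append(
                    writer.submit(write_checkpoint, store, superstep, snapshot)
                )
            # The loop moves on to the next superstep without waiting
            # for the checkpoint write to finish.
        for future in pending:
            future.result()  # surface any write error before returning
    return state, store
```

Calling run_iterations([0.0, 4.0], toy_step, 4, 2) runs four supersteps and leaves snapshots for supersteps 2 and 4 in the simulated store. In a blocking design the loop would instead wait for each write before starting the next superstep.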

Journal: Proceedings of the VLDB Endowment
Publication Date: Aug 1, 2018
Citations: 1
