Abstract

Distributed data processing systems have become the standard means for big data analytics. These systems are based on processing pipelines in which operations on data are performed in a chain of consecutive steps. Normally, the operations performed by these pipelines are fixed at design time, and any change to their functionality requires the application to be restarted. This is not always acceptable, for example, when we cannot afford downtime or when a long-running calculation would lose significant progress. The introduction of variation points into distributed processing pipelines allows individual analysis steps to be updated on the fly. In this paper, we extend this basic variation-point functionality to provide fully automated reconfiguration of the processing steps within a running pipeline through an automated planner. We have enabled pipeline modeling through constraints. Based on these constraints, we not only ensure that configurations are type-compatible but also verify that the expected pipeline functionality is achieved. Furthermore, automating the reconfiguration process simplifies its use, in turn allowing users with less development experience to make changes. The system can automatically generate and validate pipeline configurations that achieve a specified goal, selecting from the operation definitions available at planning time. It then automatically integrates these configurations into the running pipeline. We verify the system by testing a proof-of-concept implementation. The proof of concept also shows promising results when reconfiguration is performed frequently.
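
The variation-point mechanism can be pictured as a pipeline step whose operation sits behind a swappable, thread-safe reference. The following minimal sketch is illustrative only; the class and method names are invented here and do not reflect the framework's actual API.

    # Hypothetical sketch of a variation point: a pipeline step whose
    # operation can be replaced at runtime without restarting the pipeline.
    import threading
    from typing import Callable

    class VariationPoint:
        def __init__(self, operation: Callable):
            self._operation = operation
            self._lock = threading.Lock()

        def apply(self, record):
            with self._lock:              # read a consistent operation
                op = self._operation
            return op(record)

        def update(self, new_operation: Callable):
            with self._lock:              # swap the step's user code on the fly
                self._operation = new_operation

    # The running pipeline keeps calling step.apply(...) while a
    # reconfiguration thread may call step.update(...) at any time.
    step = VariationPoint(lambda x: x * 2)
    print(step.apply(21))   # 42
    step.update(lambda x: x + 1)
    print(step.apply(21))   # 22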

Highlights

  • Industrial organizations are increasingly dependent on the digital components of their business

  • In previous work (Lazovik et al., 2017), we investigated the feasibility of dynamically updating the processing pipeline of a running Apache Spark application

  • From the action variables defined on the transitions in the Constraint Satisfaction Problem (CSP), we can extract the actions assigned to those transitions, which correspond to the user code that should be installed at the variation points (a sketch follows this list)
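
A rough illustration of that extraction step, assuming a solved CSP represented as a plain mapping from transitions to chosen action names; all names here are invented for the example and do not reproduce the paper's implementation:

    # Hypothetical sketch: reading the action variable assigned to each
    # transition of a solved CSP and resolving it to the user code that
    # should be installed at the matching variation point.
    from typing import Callable, Dict

    # Registry of user-supplied operations known at planning time.
    OPERATIONS: Dict[str, Callable] = {
        "parse_csv": lambda line: line.split(","),
        "drop_empty": lambda fields: [f for f in fields if f],
        "sum_fields": lambda fields: sum(float(f) for f in fields),
    }

    def extract_actions(csp_solution: Dict[str, str]) -> Dict[str, Callable]:
        """Map each transition's assigned action name to its operation;
        each transition corresponds to one variation point in the pipeline."""
        return {transition: OPERATIONS[action]
                for transition, action in csp_solution.items()}

    # A solved CSP assigns exactly one action to every transition.
    solution = {"t0": "parse_csv", "t1": "drop_empty", "t2": "sum_fields"}
    new_config = extract_actions(solution)
    # new_config can now be pushed into the running pipeline's
    # variation points (e.g., step.update(new_config["t0"])).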

Summary

INTRODUCTION

Industrial organizations are increasingly dependent on the digital components of their business. We have developed a framework, sparkdynamic (Lazovik et al., 2017), built on top of the popular distributed data processing platform Apache Spark (The Apache Software Foundation, 2015b), to enable the updating of the steps and algorithm parameters of running pipelines without restarting them. The main contribution of this paper is a distributed data processing pipeline reconfiguration framework based on constraint-based AI planning. It ensures that the current industrial user goals are satisfied, takes into account the dependencies between related steps within the pipeline (ensuring its data-type and structural consistency), and automatically incorporates the new configuration.
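
To make the planning idea concrete, the toy sketch below frames pipeline configuration as a small constraint satisfaction problem: each pipeline slot's domain is the set of available operations, and the constraints require that every step's input type match the previous step's output type and that the final step produce the goal type. The operation names and types are invented for illustration, and the brute-force search stands in for the paper's far richer planner.

    # Illustrative only: a toy constraint-based planner that picks a
    # type-consistent chain of operations reaching a goal output type.
    from itertools import product

    # Each available operation declares its input and output data types.
    OPERATIONS = {
        "read_lines": ("file",   "text"),
        "parse_csv":  ("text",   "rows"),
        "clean_rows": ("rows",   "rows"),
        "sum_column": ("rows",   "number"),
        "plot":       ("number", "chart"),
    }

    def plan(source_type: str, goal_type: str, length: int):
        """Enumerate assignments of operations to pipeline slots and return
        the first one satisfying the type-compatibility constraints."""
        for config in product(OPERATIONS, repeat=length):
            prev, ok = source_type, True
            for op in config:
                in_type, out_type = OPERATIONS[op]
                if in_type != prev:       # constraint: step accepts previous output
                    ok = False
                    break
                prev = out_type
            if ok and prev == goal_type:  # constraint: goal type is produced
                return list(config)
        return None

    print(plan("file", "number", length=3))
    # -> ['read_lines', 'parse_csv', 'sum_column']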

Runtime Updating
Updating Distributed Data Processing Pipelines
Spark-Dynamic
Techniques for Building and Checking Pipelines
GENERAL OVERVIEW
PLANNER DESIGN FOR PIPELINE RECONFIGURATION
Core Planning Model
Mapping to the Distributed Pipeline
Planner Representation as CSP
Planning Model Justification
EVALUATION
Plan Generation Time
Dynamic Versus Static
CONCLUSION AND DISCUSSION
DATA AVAILABILITY STATEMENT