Modeling Big Data Processing Programs

João Batista De Souza Neto, Genoveva Vargas-Solar, Anamaria Martins Moreira, Martin A Musicante

doi:10.1007/978-3-030-63882-5_7

Abstract

We propose a new model for data processing programs. Our model generalizes the data flow programming style implemented by systems such as Apache Spark, DryadLINQ, Apache Beam and Apache Flink. The model uses directed acyclic graphs (DAGs) to represent the main aspects of data flow-based systems: operations over data (such as filtering, aggregation and join) and program execution, which is determined by the data dependences between operations. We use Monoid Algebra to model operations over distributed, partitioned datasets and Petri Nets to represent the data flow. This approach keeps the specification of a data processing program agnostic of the target Big Data processing system. As a first application, we used the model to formalize mutation operators for mutation testing of Big Data processing programs. These operators are implemented in the testing tool TRANSMUT-Spark.
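
The abstract combines two formalisms: monoid operations over partitioned datasets and a Petri-net view of the data flow. The Python sketch below illustrates both ideas under simplifying assumptions of our own; it is not the paper's formal model and not TRANSMUT-Spark's implementation. A dataset is assumed to be a list of partitions, an aggregation is a fold with an assumed `Monoid` (an associative operation plus its identity), and a program is a set of transitions that fire once all their input places hold a dataset. As a further simplification, tokens are not consumed, so one dataset may feed several operations. The names `aggregate`, `run`, `filter_op`, `map_op`, `negate_filter` and `build_program` are hypothetical; `negate_filter` is one illustrative mutation (negating a filter predicate).

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Dict, List

# Assumption: a dataset is a list of partitions, each partition a list of elements.
Partitioned = List[List[Any]]


@dataclass(frozen=True)
class Monoid:
    """An associative binary operation together with its identity element."""
    combine: Callable[[Any, Any], Any]
    identity: Any


def aggregate(data: Partitioned, m: Monoid) -> Any:
    # Fold each partition locally, then merge the partial results.
    # Associativity and the identity make the result independent of
    # how the elements were split into (ordered) partitions.
    partials = [reduce(m.combine, part, m.identity) for part in data]
    return reduce(m.combine, partials, m.identity)


@dataclass
class Transition:
    """Hypothetical Petri-net transition: one operation from input places to an output place."""
    name: str
    inputs: List[str]
    output: str
    op: Callable[..., Partitioned]


def run(transitions: List[Transition], marking: Dict[str, Partitioned]) -> Dict[str, Partitioned]:
    # Repeatedly fire every enabled transition (all input places marked,
    # output not yet produced) until none can fire. Since the net is acyclic,
    # the firing order follows the data dependences, not the list order.
    marking = dict(marking)
    fired = True
    while fired:
        fired = False
        for t in transitions:
            if t.output not in marking and all(p in marking for p in t.inputs):
                marking[t.output] = t.op(*(marking[p] for p in t.inputs))
                fired = True
    return marking


def filter_op(pred: Callable[[Any], bool]) -> Callable[[Partitioned], Partitioned]:
    # Partition-wise filter: keeps the partitioning of its input.
    return lambda ds: [[x for x in part if pred(x)] for part in ds]


def map_op(f: Callable[[Any], Any]) -> Callable[[Partitioned], Partitioned]:
    # Partition-wise map.
    return lambda ds: [[f(x) for x in part] for part in ds]


def negate_filter(pred: Callable[[Any], bool]) -> Callable[[Any], bool]:
    # Illustrative mutation: replace a filter predicate by its negation.
    return lambda x: not pred(x)


SUM = Monoid(lambda a, b: a + b, 0)


def build_program(pred: Callable[[Any], bool]) -> List[Transition]:
    # "map" is listed before "filter" on purpose: it still runs second,
    # because it depends on the place "filtered" produced by "filter".
    return [
        Transition("map", ["filtered"], "squared", map_op(lambda x: x * x)),
        Transition("filter", ["input"], "filtered", filter_op(pred)),
    ]


if __name__ == "__main__":
    source = {"input": [[1, 2, 3], [4, 5], [6]]}
    is_even = lambda x: x % 2 == 0
    original = run(build_program(is_even), source)
    mutant = run(build_program(negate_filter(is_even)), source)
    print(aggregate(original["squared"], SUM))  # 4 + 16 + 36 = 56
    print(aggregate(mutant["squared"], SUM))    # 1 + 9 + 25 = 35
```

Running the script prints 56 for the original program and 35 for the mutant, so a test that checks this output would kill the mutant. Keeping the operations partition-wise and the aggregation monoidal is what lets the same specification stay independent of how a particular engine splits the data.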

Publication Date: Jan 1, 2020
Citations: 3
License type: other-oa

Similar Papers

A two-level formal model for Big Data processing programs
João Batista De Souza Neto ... Genoveva Vargas-Solar
Science of Computer Programming | VOL. 215
01 Mar 2022

WITHDRAWN: Comparative Research on Active Learning of Big Aata based on Mapreduce and Spark
Zhang Ruihong ... Hu Zhihua
Microprocessors and Microsystems | VOL. -
01 Nov 2020

Assessment of the Actual Volume of Water in a Reservoir Using an Unmanned Surface Vehicle
Nikolai O Naumenko ... Margarita A Shiryaeva
Land Reclamation and Hydraulic Engineering | VOL. 14
01 Jan 2024

Intermediate Data Caching Optimization for Multi-Stage and Parallel Big Data Frameworks
Zhengyu Yang ... Ningfang Mi
-
01 Jul 2018