Abstract
The execution of a Spark application is determined by the execution order and parallelism of its jobs, given the input data and the available resources. Spark reifies these dependencies in a graph that we refer to as the (parallel) execution plan of the application. All the approaches that have studied the estimation of execution times and the dynamic provisioning of resources for these applications assume that the execution plan is unique, given the computing resources at hand. This assumption is simplistic for applications that include conditional branches or loops, and it limits the precision of the prediction techniques. This paper introduces SEEPEP, a novel technique based on symbolic execution and search-based test generation that: i) automatically extracts the possible execution plans of a Spark application, along with dedicated launchers with properly synthesized data that can be used for profiling, and ii) tunes the allocation of resources at runtime based on the knowledge of the execution plans whose path conditions hold. The assessment we carried out shows that SEEPEP can effectively complement dynaSpark, an extension of Spark with dynamic resource provisioning capabilities, to help predict execution durations and allocate resources.
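To illustrate why a single static execution plan is a simplistic assumption, consider the following minimal Spark application sketch (not taken from the paper; all names and the threshold are hypothetical). Depending on a condition evaluated at runtime, the application executes either a shuffle-heavy job or a single narrow-transformation job, yielding two different execution plans with different resource demands.

```scala
// Hypothetical example: the execution plan depends on a runtime branch.
import org.apache.spark.sql.SparkSession

object BranchingApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("BranchingApp").getOrCreate()
    val sc = spark.sparkContext

    // Input records; the path is passed as the first argument.
    val records = sc.textFile(args(0)).map(_.split(",")).cache()

    // First job: an action whose result drives the branch condition.
    val count = records.count()

    if (count > 1000000L) {
      // Plan A: wide dependency (shuffle via reduceByKey) followed by a write job.
      records.map(r => (r(0), 1L)).reduceByKey(_ + _).saveAsTextFile(args(1))
    } else {
      // Plan B: a single job made of narrow transformations only.
      records.map(_.mkString(";")).saveAsTextFile(args(1))
    }

    spark.stop()
  }
}
```

A technique that profiles only one of these two paths would mispredict both the duration and the resources needed when the other path is taken, which is the gap that extracting all feasible execution plans and their path conditions is meant to close.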