Adaptive performance model for dynamic scaling Apache Spark Streaming

Max Petrov, Nikolay Butakov, Denis Nasonov, Mikhail Melnik

doi:10.1016/j.procs.2018.08.243

Open Access

https://doi.org/10.1016/j.procs.2018.08.243

Journal: Procedia Computer Science
Publication Date: Jan 1, 2018
Citations: 15
License type: CC BY-NC-ND

Affiliation: ITMO University

Abstract

Data volumes are growing rapidly, and much of this information arrives from diverse sources such as mobile phones, sensors, and traffic systems. Data from these sources can be represented as data streams whose volume rises and falls over time. When the volume rises, data processing requires dynamic resource allocation to reduce processing time; when it falls, it requires resource deallocation, because removing unnecessary resources reduces the total cost. The question is how to identify the optimal amount of resources that satisfies a required processing delay under a given workload volume. The current implementation of Apache Spark Streaming and existing models do not provide this capability. In this paper, we propose an adaptive performance model that can dynamically scale the Apache Spark Streaming platform up and down on AWS.

Full Text