Dynamic Resource Provisioning for Iterative Workloads on Apache Spark

Dazhao Cheng, Yu Wang, Dong Dai

doi:10.1109/tcc.2021.3108043

Abstract

Apache Spark, a popular in-memory data analytics framework, has been employed by various applications, such as machine learning, graph computation, and scientific computing, which benefit from the long-running process (e.g., executor) programming model to avoid system I/O overhead. Because the resource usage of long-running applications such as iterative computation varies significantly over time, we find that resource allocation policies based on peak demand lead to low cloud utilization in production environments. In this paper, we present iSpark, a utilization-aware resource provisioning approach for iterative workloads on Apache Spark. iSpark aims to scale the number of executors up or down in a timely manner in order to fully utilize the allocated resources while taking the dominant factor into consideration. Testbed evaluations show that iSpark improves the resource utilization of individual executors by 35.2% on average compared to vanilla Spark. Furthermore, we have extended iSpark to multi-tenant cloud environments. Specifically, we extend the two-dimensional resource constraints (i.e., CPU and memory) in iSpark to three-dimensional resource constraints (i.e., CPU, memory, and I/O) to include I/O performance in the cloud environment. Experimental results on virtual clusters with varying interference show that iSpark with the cloud extension improves the average job completion time by 68% compared to the default policy.
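The abstract does not spell out how iSpark chooses an executor count, so the following is only a minimal sketch of the general idea of utilization-aware scaling driven by a dominant factor. Everything in it is an assumption made for illustration: the `Utilization` record, the `dominant_factor` and `target_executors` helpers, the 0.8 target utilization, the executor bounds, and the proportional scaling rule itself. It is not the authors' algorithm and it does not call any Spark API.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Utilization:
    """Fraction of each allocated resource in use per executor, in [0, 1]."""
    cpu: float
    mem: float
    io: float = 0.0  # third dimension, as in the cloud extension


def dominant_factor(u: Utilization) -> Tuple[str, float]:
    """Return the name and value of the most heavily used resource."""
    usage = {"cpu": u.cpu, "mem": u.mem, "io": u.io}
    name = max(usage, key=usage.get)
    return name, usage[name]


def target_executors(current: int, u: Utilization, target: float = 0.8,
                     min_executors: int = 1, max_executors: int = 64) -> int:
    """Hypothetical proportional rule: resize so that the dominant
    resource would sit near the target utilization, within bounds."""
    _, dominant = dominant_factor(u)
    desired = math.ceil(current * dominant / target)
    return max(min_executors, min(max_executors, desired))


# Under-utilized phase of an iterative job: scale down.
print(target_executors(10, Utilization(cpu=0.3, mem=0.2)))
# Memory-bound phase: memory is the dominant factor, so scale up.
print(target_executors(4, Utilization(cpu=0.5, mem=0.95, io=0.1)))
```

Sizing on the single most constrained resource keeps the sketch from shrinking the executor pool when, for example, CPU is idle but memory is nearly exhausted.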


Journal: IEEE Transactions on Cloud Computing
Publication Date: Jan 1, 2023
Citations: 7

Similar Papers

Elastic Executor Provisioning for Iterative Workloads on Apache Spark
Donglin Yang ... Dingwen Tao
01 Dec 2019

Creating Apache Spark Virtual Clusters in Cloud Environments Using Orchestration Systems
O Borisenko ... R Pastukhov
Proceedings of the Institute for System Programming of the RAS | VOL. 28
01 Jan 2015

Explaining the Increase in the Australian Average House Completion Time: Activity-based versus Workflow-based Approach
Ehsan Gharaie ... Nick Blismas
Construction Economics and Building | VOL. 10
16 Dec 2010

The Essential Tools of Scientific Machine Learning (Scientific ML)
Christopher Rackauckas
20 Aug 2019
