Abstract
Spark is a big-data processing framework based on MapReduce, whose computation model requires that all tasks in all parent stages finish before a new stage starts. Variability in machine service, or congested network links caused by partial or intermittent machine failures, therefore becomes a bottleneck when Spark executes tasks. In this paper, we focus on the design of speculative execution schemes for heterogeneous Spark clusters from an optimization perspective under different load conditions. First, we derive the load arrival rate threshold that separates the operating regimes. Second, for the lightly loaded case, we analyze and propose the speculative execution based on task-cloning algorithm (SETC), which reduces application completion time by maximizing the overall system utility. Then, for the heavily loaded case, we propose the speculative execution based on straggler-detection algorithm (SESD), which aims to mitigate stragglers. Finally, we conduct experiments to verify the performance of SETC and SESD. Results show that our method is faster than Spark-Speculation, LATE, and SCA by 16.7%, 8.2%, and 11.7%, respectively. It also outperforms the baseline algorithms on other metrics such as cluster throughput.
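To make the straggler-detection idea concrete, the following is a minimal illustrative sketch, not the paper's actual SESD algorithm: it estimates each task's remaining time from its progress rate and flags tasks whose estimate exceeds a multiple of the median, in the spirit of LATE-style speculation. The task names and the `slow_factor` threshold are hypothetical.

```python
def detect_stragglers(tasks, slow_factor=1.5):
    """Flag tasks whose estimated remaining time exceeds
    slow_factor x the median estimate; flagged tasks are
    candidates for launching speculative copies.

    tasks: dict mapping task name -> (progress in [0, 1], elapsed seconds)
    """
    estimates = {}
    for name, (progress, elapsed) in tasks.items():
        # Progress rate = fraction completed per second of elapsed time.
        rate = progress / elapsed if elapsed > 0 else float("inf")
        # Estimated time to finish the remaining fraction at this rate.
        remaining = (1.0 - progress) / rate if rate > 0 else float("inf")
        estimates[name] = remaining
    ordered = sorted(estimates.values())
    median = ordered[len(ordered) // 2]
    return [name for name, rem in estimates.items()
            if rem > slow_factor * median]

# Hypothetical stage snapshot: three map tasks, same elapsed time.
tasks = {
    "map_0": (0.90, 10.0),  # nearly done
    "map_1": (0.85, 10.0),  # on pace
    "map_2": (0.20, 10.0),  # slow progress -> straggler
}
print(detect_stragglers(tasks))  # -> ['map_2']
```

A production detector (as in Spark's built-in speculation) would additionally wait until a quantile of tasks in the stage has finished before speculating, to avoid cloning tasks early in a stage when progress estimates are noisy.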