Abstract
Spark is a big-data processing framework based on MapReduce, whose computation model requires that all tasks in all parent stages finish before a new stage starts. Variability in machine service, or congested network links caused by partial or intermittent machine failures, therefore becomes a bottleneck when Spark executes tasks. In this paper, we focus on the design of speculative execution schemes for heterogeneous Spark clusters from an optimization perspective under different load conditions. First, we derive the load arrival rate threshold that separates the operating regimes. Second, for the lightly loaded case, we analyze and propose the speculative execution based on task-cloning algorithm (SETC), which reduces application completion time by maximizing the overall system utility. Then, for the heavily loaded case, we propose the speculative execution based on straggler-detection algorithm (SESD), which aims to mitigate stragglers. Finally, we conduct experiments to verify the performance of SETC and SESD. Results show that our method is faster than Spark-Speculation, LATE, and SCA by 16.7%, 8.2%, and 11.7%, respectively. It also outperforms the baseline algorithms on other metrics such as cluster throughput.
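To make the straggler-detection idea concrete, the following is a minimal illustrative sketch, not the paper's actual SESD algorithm: it estimates each task's remaining time from its progress rate and flags tasks whose estimate exceeds a multiple of the median, in the spirit of LATE-style speculation. The task names and the `slow_factor` threshold are hypothetical.

```python
def detect_stragglers(tasks, slow_factor=1.5):
    """Flag tasks whose estimated remaining time exceeds
    slow_factor x the median estimate; flagged tasks are
    candidates for launching speculative copies.

    tasks: dict mapping task name -> (progress in [0, 1], elapsed seconds)
    """
    estimates = {}
    for name, (progress, elapsed) in tasks.items():
        # Progress rate = fraction completed per second of elapsed time.
        rate = progress / elapsed if elapsed > 0 else float("inf")
        # Estimated time to finish the remaining fraction at this rate.
        remaining = (1.0 - progress) / rate if rate > 0 else float("inf")
        estimates[name] = remaining
    ordered = sorted(estimates.values())
    median = ordered[len(ordered) // 2]
    return [name for name, rem in estimates.items()
            if rem > slow_factor * median]

# Hypothetical stage snapshot: three map tasks, same elapsed time.
tasks = {
    "map_0": (0.90, 10.0),  # nearly done
    "map_1": (0.85, 10.0),  # on pace
    "map_2": (0.20, 10.0),  # slow progress -> straggler
}
print(detect_stragglers(tasks))  # -> ['map_2']
```

A production detector (as in Spark's built-in speculation) would additionally wait until a quantile of tasks in the stage has finished before speculating, to avoid cloning tasks early in a stage when progress estimates are noisy.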