Abstract

Big data processing frameworks such as Spark typically expose a large number of performance-related configuration parameters, and how to auto-tune these parameters for better performance has been an active topic in both academia and industry for years. By delicately trading off exploration against exploitation, Bayesian Optimization (BO) is currently the most appealing algorithm for configuration auto-tuning. However, given the tuning cost constraints in practice, three critical limitations prevent conventional BO-based approaches from being directly applied to auto-tuning cluster-based big data frameworks. In this paper, we propose a cost-efficient configuration auto-tuning approach named TurBO for big data frameworks, based on two enhancements of vanilla BO: 1) to reduce the number of iterations required, TurBO integrates a well-designed adaptive pseudo-point mechanism into BO; 2) to avoid, as far as possible, the time-consuming real evaluation of sub-optimal configurations, TurBO leverages the proposed CASampling method, which handles these sub-optimal configurations intelligently through ensemble learning over historical tuning experience. To evaluate the performance of TurBO, we conducted a series of experiments on a local Spark cluster with 9 different HiBench benchmark applications. Overall, compared with 3 representative BO-based baseline approaches, OpenTuner, Bliss, and ResTune, TurBO speeds up the tuning procedure by 2.24×, 2.29×, and 1.97× on average, respectively. Moreover, TurBO always achieves a positive cumulative performance gain under a simulated dynamic workload scenario, indicating that it copes well with workload changes of big data applications.
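To make the two ideas in the abstract concrete, the sketch below shows one way a BO-style tuning loop could combine a pseudo-point mechanism with a history-trained filter that skips likely sub-optimal configurations. This is an illustrative sketch only, not TurBO's actual algorithm: the benchmark stub run_spark_benchmark, the random-candidate acquisition, the 0.05 skip threshold, and the random-forest filter are all assumptions made for illustration.

"""
Illustrative sketch (not the paper's implementation) of:
(1) feeding pseudo points to the surrogate so fewer real runs are needed, and
(2) filtering likely sub-optimal configurations with a classifier trained on
    historical tuning data.
"""
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def run_spark_benchmark(config):
    # Stand-in for an expensive real run (e.g. a HiBench job on a Spark cluster).
    # Here: a synthetic runtime surface over two normalized parameters.
    x, y = config
    return (x - 0.3) ** 2 + (y - 0.7) ** 2 + rng.normal(0, 0.01)

# Assumed historical tuning experience: configs labeled "good" (1) if their
# runtime fell in the best quartile of past runs, else 0.
hist_X = rng.random((200, 2))
hist_rt = np.array([run_spark_benchmark(c) for c in hist_X])
hist_y = (hist_rt <= np.quantile(hist_rt, 0.25)).astype(int)
filter_clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(hist_X, hist_y)

gp = GaussianProcessRegressor(normalize_y=True)
X_obs, y_obs = [], []     # real + pseudo observations fed to the surrogate
real_rt = []              # runtimes from real evaluations only

for it in range(30):
    # Candidate generation: random sampling stands in for a proper
    # acquisition function (e.g. Expected Improvement) to keep the sketch short.
    cand = rng.random((256, 2))
    if X_obs:
        gp.fit(np.array(X_obs), np.array(y_obs))
        mu, sigma = gp.predict(cand, return_std=True)
        config = cand[np.argmin(mu - sigma)]   # lower confidence bound, minimized
    else:
        config = cand[0]

    # Idea (2): skip the costly run if history says the config is very likely
    # sub-optimal; idea (1): record the surrogate's estimate as a pseudo point.
    p_good = filter_clf.predict_proba(config.reshape(1, -1))[0, 1]
    if X_obs and p_good < 0.05:
        pseudo = float(gp.predict(config.reshape(1, -1))[0])
        X_obs.append(config); y_obs.append(pseudo)
        continue

    rt = run_spark_benchmark(config)           # real, expensive evaluation
    X_obs.append(config); y_obs.append(rt); real_rt.append(rt)

print("best measured runtime:", min(real_rt))

In this toy setup the filter and the pseudo points together reduce the number of real benchmark runs inside the 30-iteration budget; the actual cost savings reported for TurBO come from the paper's own mechanisms and experiments, not from this sketch.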
