Adaptive Code Learning for Spark Configuration Tuning

Chen Lin,Jiadong Feng,Xuanhe Zhou,Guoliang Li,Junqing Zhuang,Hui Li

doi:10.1109/icde53745.2022.00195

Abstract

Configuration tuning is vital to optimize the performance of big data analysis platforms like Spark. Existing methods (e.g. auto-tuning relational databases) are not effective for tuning Spark, because the unique characteristics of Spark pose new challenges to configuration tuning. (C1) The Spark applications own various code structures and semantics, and the code features significantly affect Spark performance and configuration selection; (C2) Spark applications are extremely time-consuming on big data. It is infeasible for approaches such as Bayesian Optimization and Reinforcement Learning to collect sufficient training instances or repeatedly execute the applications; (C3) Spark supports various analytical applications and the tuning system needs to adapt to different applications. To address these challenges, we propose a LIghtweighT knob rEcommender system (LITE) for auto-tuning Spark configurations on various analytical applications and large-scale datasets. We first propose a code learning framework that can utilize code features to learn complex correlations between application performance and knob values (addressing C1). We then propose a lightweight auto-tuning method that migrates the knowledge learned from small-scale datasets to large-scale datasets (addressing C2). Next, to generalize to different Spark applications, we propose an adaptive model update approach to fine-tune the model via adversarial learning with newly collected feedback (addressing C3). Extensive experiments showed that LITE achieves much better performance compared with state-of-the-art auto-tuning methods.

Full Text