A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization.

Xu Huang, Hong Zhang, Xiaomeng Zhai

doi:10.3390/s22155930

Abstract

Apache Spark is a popular open-source distributed data processing framework that can efficiently process massive amounts of data. It exposes more than 180 configuration parameters, which users typically set by hand based on their own experience. However, because of the large number of parameters and the inherent correlations among them, manual tuning is tedious. To replace experience-based tuning, we designed and implemented a reinforcement-learning-based Spark configuration parameter optimizer. First, we trained a Spark application performance prediction model using deep neural networks and verified its accuracy and effectiveness from multiple perspectives. Second, to search for better configuration parameters more efficiently, we improved the Q-learning algorithm so that start and end states are set automatically in each training iteration, which effectively mitigates the agent’s poor performance in exploring better configuration parameters. Lastly, with the default configuration as the baseline, experimental results show that the optimized configuration achieved average performance improvements of 47%, 43%, 31%, and 45% for four different types of Spark applications, which indicates that our optimizer can efficiently find better configuration parameters and improve the performance of various Spark applications.
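
The abstract describes the overall loop only at a high level: a learned performance model scores candidate configurations, and a Q-learning agent searches the configuration space against that model. The sketch below is a minimal illustration of that idea in plain tabular Q-learning, not the authors' improved algorithm. Everything specific in it is an assumption made for illustration: the two-parameter grid, the toy `predict_runtime` function standing in for the trained neural network, the reward defined as the drop in predicted runtime, and the choice of a random start state with a fixed step budget as the end condition of each episode.

```python
import random

# Hypothetical discretised search space; the value grids are illustrative only.
MEMORY_GB = [1, 2, 4, 8]   # candidate values for spark.executor.memory
CORES = [1, 2, 4]          # candidate values for spark.executor.cores

# Each action moves one parameter one step up or down its grid.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def predict_runtime(state):
    """Toy stand-in for a trained performance model (NOT the paper's DNN)."""
    mem = MEMORY_GB[state[0]]
    cores = CORES[state[1]]
    return 100.0 / cores + 40.0 / mem + 3.0 * cores


def step(state, action):
    """Apply an action, clamping indices so the agent stays on the grid."""
    m = min(max(state[0] + action[0], 0), len(MEMORY_GB) - 1)
    c = min(max(state[1] + action[1], 0), len(CORES) - 1)
    nxt = (m, c)
    # Assumed reward: how much the predicted runtime dropped.
    reward = predict_runtime(state) - predict_runtime(nxt)
    return nxt, reward


def train(episodes=2000, steps=10, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(m, c): [0.0] * len(ACTIONS)
         for m in range(len(MEMORY_GB)) for c in range(len(CORES))}
    for _ in range(episodes):
        # Assumption: every episode starts from a random configuration
        # and ends after a fixed number of steps.
        state = (rng.randrange(len(MEMORY_GB)), rng.randrange(len(CORES)))
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            nxt, reward = step(state, ACTIONS[a])
            # Q-learning update rule.
            q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    return q


def greedy_config(q, start=(0, 0), steps=10):
    """Follow the learned policy and return the best configuration visited."""
    state = start
    best = state
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _ = step(state, ACTIONS[a])
        if predict_runtime(state) < predict_runtime(best):
            best = state
    return {
        "spark.executor.memory": f"{MEMORY_GB[best[0]]}g",
        "spark.executor.cores": str(CORES[best[1]]),
    }
```

Calling `greedy_config(train())` returns a dictionary keyed by Spark property names, which could be passed to a job through `--conf` flags or a `SparkConf` object. In a realistic setting the toy runtime function would be replaced by a model fitted to measured runs, and the grid would cover many more of Spark's parameters.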

Journal: Sensors
Publication Date: Aug 8, 2022
Citations: 4
License type: CC BY 4.0
