Abstract
The Apache Hadoop system is a software framework capable of processing large-scale datasets across a cluster of distributed machines using the MapReduce programming model. However, system administrators face two main challenges in managing a Hadoop system: (1) it is difficult to tune the parameters appropriately, since the behaviors and characteristics of large-scale distributed systems are highly complex, and (2) dozens of configuration parameters affect system performance, which makes the parameter-tuning task troublesome. In this paper, we focus on optimizing Hadoop MapReduce job performance by tuning configuration parameters, and we propose an analytical method to help system administrators choose approximately optimal configuration parameters based on the characteristics of each application. Our approach has two key phases: a prediction phase and an optimization phase. The prediction phase estimates the performance of a MapReduce job, whereas the optimization phase strategically searches for approximately optimal configuration parameters by invoking the predictor repeatedly. In our evaluation, our approach helps system administrators improve performance by roughly 2X to 8X over traditional methods.
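To make the two-phase structure concrete, the sketch below shows one way an optimization phase could repeatedly invoke a performance predictor over candidate Hadoop configurations and keep the best one found. This is a minimal illustration assuming a simple random-search strategy; the `JobPredictor` interface, the `predictRuntime` method, and the sampled value ranges are all hypothetical stand-ins, not the paper's actual API or search algorithm. The parameter names themselves (`mapreduce.job.reduces`, `mapreduce.task.io.sort.mb`, `mapreduce.task.io.sort.factor`) are standard Hadoop MapReduce configuration keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Sketch of an optimization phase: sample candidate configurations,
// score each with a predictor, and return the best one seen.
public class ConfigSearch {

    // Stand-in for the prediction phase: maps a configuration to an
    // estimated job runtime in seconds (assumed interface, not the
    // paper's actual predictor).
    interface JobPredictor {
        double predictRuntime(Map<String, Integer> conf);
    }

    public static Map<String, Integer> randomSearch(JobPredictor predictor,
                                                    int iterations,
                                                    long seed) {
        Random rng = new Random(seed);
        Map<String, Integer> best = null;
        double bestRuntime = Double.MAX_VALUE;

        for (int i = 0; i < iterations; i++) {
            // Sample a candidate from plausible ranges of a few
            // well-known MapReduce parameters (ranges are assumptions).
            Map<String, Integer> cand = new HashMap<>();
            cand.put("mapreduce.job.reduces", 1 + rng.nextInt(64));
            cand.put("mapreduce.task.io.sort.mb", 50 + rng.nextInt(451));
            cand.put("mapreduce.task.io.sort.factor", 10 + rng.nextInt(91));

            // Invoke the predictor repeatedly, as in the optimization
            // phase described above.
            double runtime = predictor.predictRuntime(cand);
            if (runtime < bestRuntime) {
                bestRuntime = runtime;
                best = cand;
            }
        }
        return best; // approximately optimal configuration found
    }
}
```

In practice, a search strategy of this kind only pays off when the predictor is much cheaper to evaluate than running the actual MapReduce job, which is what motivates separating the prediction phase from the optimization phase.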