Abstract

Configuration parameter optimization is an important means of improving the performance of the MapReduce model. Existing parameter tuning methods usually optimize all configuration parameters in MapReduce. However, tuning every parameter is exceedingly challenging because MapReduce exposes a very large number of configuration parameters. In this paper, a novel configuration parameter tuning method based on a feature selection algorithm is proposed; it consists of a feature selection objective function and a feature selection process. The objective function is based on the kernel clustering algorithm, in which an anisotropic Gaussian kernel is adopted instead of the traditional Gaussian kernel so that the importance of each MapReduce parameter can be judged accurately. The relationship between the configuration parameters in MapReduce and the features in the feature selection algorithm is then defined, with the importance of each parameter reflected by the corresponding kernel width of the anisotropic Gaussian kernel. Gradient descent is used to update the kernel widths and to control the iterative feature selection process. Finally, experimental results show that the proposed algorithm is suitable for the MapReduce model.
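
For reference, one common form of an anisotropic Gaussian kernel is sketched below. The notation is assumed here for illustration and may differ from the paper's own: x and y are two configuration vectors, d is the number of configuration parameters, and sigma_j is the kernel width attached to the j-th parameter.

```latex
K(\mathbf{x}, \mathbf{y}) = \exp\!\left( -\sum_{j=1}^{d} \frac{(x_j - y_j)^2}{2\sigma_j^{2}} \right)
```

When all sigma_j are equal, this reduces to the traditional (isotropic) Gaussian kernel. As sigma_j grows, differences in parameter j contribute less and less to the kernel value, which is why a per-parameter width can act as an indicator of that parameter's importance.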

Highlights

  • In recent decades, the scale of data in various fields has grown rapidly

  • This paper presents a clustering feature selection algorithm (IK-means for short) based on the kernel function penalty, which addresses the difficulty that platform management personnel face in configuring the excessive number of configuration parameters in MapReduce

  • An anisotropic Gaussian kernel is introduced in the IK-means algorithm, in which each kernel width corresponds to a configuration parameter in Hadoop MapReduce
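
The role of per-parameter kernel widths can be illustrated with a minimal Python sketch. It is not the paper's IK-means implementation: the function name, the kernel's exact form, and the normalized parameter values are assumptions made for illustration only. It shows how assigning a very large width to one parameter makes the kernel nearly insensitive to that parameter.

```python
import numpy as np


def anisotropic_gaussian_kernel(x, y, widths):
    """Gaussian kernel with one width per feature (hypothetical helper).

    Each squared difference is divided by its own width squared, so a
    feature with a very large width barely influences the similarity.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    widths = np.asarray(widths, dtype=float)
    scaled_sq_diff = ((x - y) / widths) ** 2
    return float(np.exp(-0.5 * scaled_sq_diff.sum()))


# Hypothetical, normalized values of three configuration parameters
# for two job configurations (illustrative numbers, not measured data).
config_a = [0.2, 0.9, 0.5]
config_b = [0.3, 0.1, 0.5]

# Equal widths: behaves like the traditional isotropic Gaussian kernel.
isotropic = anisotropic_gaussian_kernel(config_a, config_b, [1.0, 1.0, 1.0])

# Very large width on the second parameter: its large difference is ignored.
second_ignored = anisotropic_gaussian_kernel(config_a, config_b, [1.0, 1e6, 1.0])

print(isotropic, second_ignored)
```

With equal widths, the large gap in the second parameter dominates and lowers the similarity. With a very large second width, the two configurations look almost identical, which corresponds to treating that parameter as unimportant.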


Summary

INTRODUCTION

The scale of data in various fields has grown rapidly. High-performance computing and distributed data processing technology are widely used in the data analysis of various fields. Several factors lead to inefficiencies for jobs in Hadoop. Among these factors, the massive number of configuration parameters is the primary problem, because the task scheduler, data locality, copy placement, and other optimizations all depend on reasonable configuration parameters [1], [6]. To optimize MapReduce configuration parameters more effectively, the paper presents a configuration parameter optimization method based on feature selection. Specifically, it presents a clustering feature selection algorithm (IK-means for short) based on the kernel function penalty, which addresses the difficulty that platform management personnel face in configuring the excessive number of configuration parameters in MapReduce.

RELATED WORK
CLUSTERING FEATURE SELECTION BASED ON KERNEL FUNCTION PENALTY
FEATURE PENALTY FUNCTION
THE ESTABLISHMENT OF THE OBJECTIVE FUNCTION
EXPERIMENT AND ANALYSIS
CONCLUSION
