A Novel Configuration Tuning Method Based on Feature Selection for Hadoop MapReduce

Jun Liu, Mingwei Lin, Sule Tang, Guangxia Xu, Chuang Ma

doi:10.1109/access.2020.2984778

Abstract

Configuration parameter optimization is an important means of improving the performance of the MapReduce model. Existing parameter tuning methods usually optimize all configuration parameters in MapReduce. However, tuning all of them is exceedingly challenging because MapReduce has a very large number of configuration parameters. In this paper, a novel configuration parameter tuning method based on a feature selection algorithm is proposed; it is composed of a feature selection objective function and a feature selection process. The objective function is based on the kernel clustering algorithm, in which an anisotropic Gaussian kernel is adopted instead of the traditional Gaussian kernel to accurately judge the importance of each parameter in MapReduce. Then, the relationship between the configuration parameters in MapReduce and the features in the feature selection algorithm is defined. Moreover, the importance of each parameter is reflected by the corresponding kernel width of the anisotropic Gaussian kernel. At the same time, gradient descent is introduced to update the kernel widths and control the feature selection process of the iterative algorithm. Finally, experimental results show that the proposed algorithm performs suitably for the MapReduce model.
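
To make the role of the kernel widths concrete, the following is a minimal sketch of an anisotropic Gaussian kernel with one width per feature, together with its partial derivatives with respect to those widths (the quantity a gradient descent update would consume). It is not the paper's implementation: the parametrization k(x, y) = exp(-sum_j (x_j - y_j)^2 / (2 * w_j^2)), the function names, and the two-feature example vectors are assumptions chosen for illustration, and the paper's objective function is not reproduced here.

```python
import math


def anisotropic_gaussian_kernel(x, y, widths):
    """Anisotropic Gaussian kernel with one width per feature.

    Assumed form (not taken from the paper):
        k(x, y) = exp(-sum_j (x_j - y_j)^2 / (2 * widths_j^2))
    With all widths equal, this reduces to the ordinary Gaussian kernel.
    """
    if not (len(x) == len(y) == len(widths)):
        raise ValueError("x, y and widths must have the same length")
    if any(w <= 0 for w in widths):
        raise ValueError("every kernel width must be positive")
    exponent = sum(
        (a - b) ** 2 / (2.0 * w ** 2) for a, b, w in zip(x, y, widths)
    )
    return math.exp(-exponent)


def kernel_width_gradient(x, y, widths):
    """Partial derivatives of k(x, y) with respect to each width.

    d k / d w_j = k(x, y) * (x_j - y_j)^2 / w_j^3
    """
    k = anisotropic_gaussian_kernel(x, y, widths)
    return [k * (a - b) ** 2 / w ** 3 for a, b, w in zip(x, y, widths)]


# Hypothetical example: two configurations described by two normalized
# parameters each. The values are made up for illustration only.
config_a = [0.2, 0.9]
config_b = [0.8, 0.1]

equal_widths = anisotropic_gaussian_kernel(config_a, config_b, [1.0, 1.0])
second_wide = anisotropic_gaussian_kernel(config_a, config_b, [1.0, 100.0])

# With a very large width on the second feature, differences in that
# feature barely affect the kernel value.
print(equal_widths, second_wide)
print(kernel_width_gradient(config_a, config_b, [1.0, 1.0]))
```

Under this assumed parametrization, a feature with a very large width contributes almost nothing to the kernel value, which is one way per-feature widths can encode how much each configuration parameter matters. The paper may define the widths differently (for example, as multiplicative scales), so the direction of the relationship should be checked against the full text.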

Highlights

In recent decades, the scale of data in various fields has grown rapidly.
This paper presents a clustering feature selection algorithm (IK-means for short) based on the kernel function penalty, which addresses the difficulty that platform management personnel face in configuring the excessive number of configuration parameters in MapReduce.
An anisotropic Gaussian kernel is introduced in the IK-means algorithm, in which each kernel width corresponds to a configuration parameter in Hadoop MapReduce.

Summary

INTRODUCTION

The scale of data in various fields has grown rapidly. High-performance computing and distributed data processing technology are widely used in the data analysis of various fields. These factors lead to inefficiencies for jobs in Hadoop. Among these factors, the massive number of configuration parameters is the primary problem, because the task scheduler, data locality, copy placement and other optimizations need to be based on reasonable configuration parameters [1], [6]. To improve the optimization of MapReduce configuration parameters, the paper presents a configuration parameter optimization method based on feature selection. It presents a clustering feature selection algorithm (IK-means for short) based on the kernel function penalty, which addresses the difficulty that platform management personnel face in configuring the excessive number of configuration parameters in MapReduce.

RELATED WORK

CLUSTERING FEATURE SELECTION BASED ON KERNEL FUNCTION PENALTY

FEATURE PENALTY FUNCTION

THE ESTABLISHMENT OF THE OBJECTIVE FUNCTION

EXPERIMENT AND ANALYSIS

CONCLUSION


Journal: IEEE access : practical innovations, open solutions
Publication Date: Jan 1, 2020
Citations: 43
License type: CC BY 4.0


