Abstract

MapReduce is a popular programming model for big data processing. Although the distributed processing framework Hadoop has greatly reduced the development complexity of MapReduce applications, fine-tuning Hadoop systems for optimal performance remains a major challenge. Configuration tuning is one of the most effective means of improving the performance of MapReduce applications on Hadoop systems, which invariably adopt the default configuration. However, the huge Hadoop configuration parameter space makes it impractical to explore parameter combinations exhaustively. In this paper, we propose H-Tune, an effective Hadoop configuration tuning approach for MapReduce applications. We design a non-intrusive performance profiler, whose runtime overhead remains below 2%, to capture the runtime details of MapReduce applications and generate their performance evaluations. From the resulting performance profiles, the execution predictor constructs a two-level fusion model for each application using ensemble modeling, taking into account both the Hadoop configuration and the input data size. Leveraging the execution predictor, a metaheuristic-based configuration optimizer searches for the optimal configuration for a given application. Experimental results demonstrate that the optimal Hadoop configuration is often application-specific and data-specific, and that it is better to consider all relevant configuration parameters and optimize them together. H-Tune improves the performance of MapReduce applications by average factors of $1.5\times$ and $9.6\times$ over the state-of-the-art approach and the default configuration, respectively.
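
The predictor-plus-optimizer loop described above can be illustrated with a minimal sketch. It is not H-Tune's implementation: `predict_runtime` is a made-up toy cost function standing in for the learned two-level fusion model, the parameter ranges in `SEARCH_SPACE` are assumptions chosen for illustration, and a simple stochastic local search stands in for whatever metaheuristic the paper actually uses. The sketch only shows the general idea of searching the configuration space against a predictor instead of running every candidate on a cluster.

```python
import random

# Assumed search space for illustration: parameter name -> (min, max) integer range.
SEARCH_SPACE = {
    "mapreduce.job.reduces": (1, 64),
    "mapreduce.task.io.sort.mb": (50, 1000),
}


def predict_runtime(config, input_gb):
    """Toy stand-in for a learned execution predictor (not H-Tune's model)."""
    reduces = config["mapreduce.job.reduces"]
    sort_mb = config["mapreduce.task.io.sort.mb"]
    # Made-up cost: work is split across reducers, each reducer adds a fixed
    # overhead, and a small sort buffer adds a penalty that grows with input.
    return input_gb * 100.0 / reduces + 2.0 * reduces + input_gb * 500.0 / sort_mb


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def neighbor(config, rng):
    """Perturb one randomly chosen parameter, clamped to its allowed range."""
    new = dict(config)
    name = rng.choice(sorted(SEARCH_SPACE))
    lo, hi = SEARCH_SPACE[name]
    step = max(1, (hi - lo) // 10)
    new[name] = min(hi, max(lo, new[name] + rng.randint(-step, step)))
    return new


def search(input_gb, iterations=2000, seed=0):
    """Stochastic local search: accept a neighbor only if it is not predicted to be slower."""
    rng = random.Random(seed)
    best = random_config(rng)
    best_cost = predict_runtime(best, input_gb)
    for _ in range(iterations):
        candidate = neighbor(best, rng)
        cost = predict_runtime(candidate, input_gb)
        if cost <= best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    # The input size is an argument of the predictor, so the search can
    # settle on different configurations for different data sizes.
    for size_gb in (1, 10, 100):
        print(size_gb, search(size_gb))
```

Because the input size is passed to the predictor, the search is free to return different configurations for different data sizes, which mirrors the abstract's observation that the best configuration is data-specific.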
