Abstract

Search engines are now widely used to store and analyze the logs generated by large-scale distributed systems. To adapt to various workload scenarios, log search engines such as Elasticsearch usually expose a large number of performance-related configuration parameters. Because manual configuration is time-consuming and labor-intensive, automatically tuning configuration parameters to optimize performance has become an urgent need. However, it is challenging because: 1) owing to the complex implementation, the relationship between performance and configuration parameters is difficult to model, so the objective function is effectively a black box; 2) in addition to application parameters, JVM and kernel parameters are also closely related to performance, and together they form a high-dimensional configuration space; 3) to iteratively search for the best configuration, a tool is needed to automatically deploy each newly generated configuration and launch tests to measure the corresponding performance. To address these challenges, this paper designs and implements HDConfigor, an automatic holistic configuration parameter tuning tool for log search engines. To solve the high-dimensional optimization problem, we propose a modified Random EMbedding Bayesian Optimization algorithm (mREMBO), a black-box approach, in HDConfigor. Instead of directly applying a black-box optimization algorithm such as Bayesian optimization (BO) to the full space, mREMBO first generates a lower-dimensional embedded space by introducing a random embedding matrix and then performs BO in this embedded space. As a result, HDConfigor is able to find a competitive configuration automatically and quickly. We evaluate HDConfigor in an Elasticsearch cluster under different workload scenarios. Experimental results show that, compared with the default configuration, the best relative median indexing result achieved by mREMBO reaches $2.07\times$. In addition, under the same number of trials, mREMBO finds a configuration with at least a further 10.31% improvement in throughput compared to Random search, Simulated Annealing and BO.
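
The random-embedding idea described above can be illustrated with a minimal sketch. It is not the authors' mREMBO implementation: it assumes a Gaussian embedding matrix, a normalized configuration box of [-1, 1] per parameter, and it uses plain random sampling in the embedded space as a stand-in for the BO step. The names `make_embedding`, `project`, `toy_objective` and `search_embedded` are hypothetical, and `toy_objective` is an invented placeholder for a measured throughput.

```python
import numpy as np


def make_embedding(D, d, rng):
    """Sample a D x d random embedding matrix with standard Gaussian entries."""
    return rng.standard_normal((D, d))


def project(A, y, lower, upper):
    """Map a low-dimensional point y to the full space and clip to the box bounds."""
    return np.clip(A @ y, lower, upper)


def toy_objective(x):
    # Hypothetical stand-in for a performance measurement:
    # only 2 of the D coordinates influence the value (low effective dimension).
    return -((x[3] - 0.5) ** 2 + (x[7] + 0.25) ** 2)


def search_embedded(objective, D, d, n_trials, seed=0):
    """Search a d-dimensional embedded space instead of the D-dimensional one.

    Assumption: random sampling replaces the BO acquisition step to keep the
    sketch dependency-free; a real tuner would fit a surrogate model over y.
    """
    rng = np.random.default_rng(seed)
    A = make_embedding(D, d, rng)
    lower, upper = -np.ones(D), np.ones(D)  # assumed normalized parameter box
    bound = np.sqrt(d)                      # assumed search range for y
    best_x, best_val = None, -np.inf
    for _ in range(n_trials):
        y = rng.uniform(-bound, bound, size=d)  # candidate in embedded space
        x = project(A, y, lower, upper)         # full configuration vector
        val = objective(x)                      # "run the benchmark"
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


if __name__ == "__main__":
    x, v = search_embedded(toy_objective, D=20, d=4, n_trials=200)
    print("best value:", v)
    print("influential coordinates:", x[3], x[7])
```

In a real tuner, each entry of the clipped vector would still have to be rescaled to the actual range of the corresponding application, JVM or kernel parameter before deployment.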

Highlights

  • In the era of big data and artificial intelligence, large-scale distributed systems generate tons of logs while processing data

  • We first evaluate the effectiveness of the modified Random EMbedding Bayesian Optimization algorithm (mREMBO) by comparing it with three other black-box optimization algorithms

  • We show the performance of mREMBO when applied to different workload scenarios

Summary

INTRODUCTION

In the era of big data and artificial intelligence, large-scale distributed systems generate tons of logs while processing data. To adapt to varying workload scenarios, log search engines usually expose a considerably large number of configuration parameters to developers, which can change nearly all runtime behaviors. Different settings of these configuration parameters can significantly affect the end-to-end performance of document indexing and [...]. Based on an in-depth analysis of previous studies as well as our experimental observations on a practical log search engine cluster, there are three main challenges in solving this problem: the black-box objective function, the high-dimensional configuration space, and the need for a tool that automatically deploys and tests each candidate configuration. HDConfigor is able to address all three challenges simultaneously and automatically tune high-dimensional configuration parameters for log search engines. We conclude that the full-stack configuration space of Elasticsearch has low effective dimensionality. Based on this observation, we propose the mREMBO algorithm to solve the high-dimensional black-box optimization problem.
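
For reference, low effective dimensionality is usually formalized in the random-embedding BO literature as follows. This is the standard definition, not necessarily the exact formulation used in the paper, and the symbols $f$, $D$, $d_e$, $\mathcal{T}$ and $A$ are assumptions introduced here. A function $f:\mathbb{R}^D \to \mathbb{R}$ has effective dimensionality $d_e < D$ if there is a linear subspace $\mathcal{T} \subset \mathbb{R}^D$ of dimension $d_e$ such that

$$
f(x_\top + x_\perp) = f(x_\top) \quad \text{for all } x_\top \in \mathcal{T},\; x_\perp \in \mathcal{T}^\perp .
$$

In the unconstrained case, if $A \in \mathbb{R}^{D \times d}$ has independent standard Gaussian entries and $d \ge d_e$, then with probability 1 every $x \in \mathbb{R}^D$ has some $y \in \mathbb{R}^d$ with $f(x) = f(Ay)$. This is what justifies searching over the embedded variable $y$ instead of the full configuration vector $x$.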

RELATED WORK
EXPERIMENTAL RESULTS
EFFECTIVENESS OF MREMBO Q1
NECESSITY OF FULL STACK CONFIGURATION PARAMETERS OPTIMIZATION Q3
IMPACT OF DIFFERENT WORKLOAD SCENARIOS Q4
CONCLUSION