Abstract

Accurate estimation of data center resource utilization is a challenging task because multi-tenant co-hosted applications have dynamic, time-varying workloads. Accurate estimation of future resource utilization enables better job scheduling, workload placement, capacity planning, proactive auto-scaling, and load balancing, whereas inaccurate estimation leads to either under- or over-provisioning of data center resources. Most existing estimation methods rely on a single model that often fails to handle different workload scenarios appropriately. To address these problems, we propose a novel method that adaptively and automatically identifies the most appropriate model for accurately estimating data center resource utilization. The proposed approach trains a classifier on statistical features of historical resource usage to decide which prediction model to use for the resource utilization observations collected during a specific time interval. We evaluated our approach on real datasets and compared the results with multiple baseline methods. The experimental evaluation shows that the proposed approach outperforms state-of-the-art approaches and delivers 6% to 27% better resource utilization estimation accuracy than the baseline methods.
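As a rough illustration of the kind of statistical features the abstract refers to, the sketch below summarizes one window of utilization observations; the exact feature set used in the paper may differ, and the function name is ours.

```python
# Hypothetical sketch: statistical features computed over one interval of
# resource-utilization samples, intended as input to the model-selecting
# classifier described in the abstract. Feature choice is illustrative.
import numpy as np
from scipy import stats

def window_features(window: np.ndarray) -> np.ndarray:
    """Summarize one window of utilization observations as a feature vector."""
    return np.array([
        window.mean(),           # average utilization in the interval
        window.std(),            # variability of the workload
        window.min(),
        window.max(),
        stats.skew(window),      # asymmetry of the utilization distribution
        stats.kurtosis(window),  # heaviness of the tails (bursty spikes)
    ])
```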

Highlights

  • Technological advances in server virtualization and cloud computing allow cost-effective hosting of multiple applications in a secure, customizable, and isolated computing environment managed by modern data centers

  • In this paper we focused on a classical machine learning approach and did not use deep learning, as its learning process is considered a black box [14] and the reasoning behind a model's predictions is not apparent

  • We performed a set of experiments to evaluate different methods for making such estimations by comparing several classifiers, namely Random Decision Forest (RDF), Gradient Boosting Tree (GBT), Multi-layer Perceptron (MLP), K-Nearest Neighbors (k-NN), Gaussian Naive Bayes (NB), and Support Vector Machine (SVM) with a linear kernel (a sketch of such a comparison follows this list)

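The following is a minimal sketch of how such a classifier comparison could be set up with scikit-learn; the feature matrix and labels here are synthetic placeholders, and the hyperparameters are not taken from the paper.

```python
# Hedged sketch: compare the classifiers listed in the highlight above using
# 5-fold cross-validation. X would hold per-window statistical features and
# y the label of the best-performing prediction model for each window; here
# both are random placeholders so the script runs standalone.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))      # placeholder per-window feature vectors
y = rng.integers(0, 3, size=200)   # placeholder "best model" labels

candidates = {
    "RDF": RandomForestClassifier(),
    "GBT": GradientBoostingClassifier(),
    "MLP": MLPClassifier(max_iter=1000),
    "k-NN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "SVM (linear)": SVC(kernel="linear"),
}

for name, clf in candidates.items():
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```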

Summary

INTRODUCTION

Technological advances in server virtualization and cloud computing allow cost-effective hosting of multiple applications in a secure, customizable, and isolated computing environment managed by modern data centers. While there are several estimation methods for cloud resource utilization based on time-series learning or deep-learning networks [9], [10], [11], all of them rely on a single model that often does not accurately capture the workload dynamics. Our approach instead trains estimation models using different methods and selects the one expected to yield the best prediction given the current scenario and the previous batch of collected data. Deep learning performs quite well once trained for a particular problem, but the resulting model often fails when applied to other, similar problems and must be retrained. For these reasons, we selected the traditional machine learning approach and propose a novel adaptive model selector method to dynamically identify the best prediction method for estimating data center resource utilization from a bag of trained methods with different characteristics and accuracy over different data center behaviors.
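To make the selection idea concrete, here is a hedged sketch of an adaptive model selector loop. The three forecasters, the helper names, and the calling convention are illustrative assumptions, not the paper's actual prediction methods; the trained selector is any scikit-learn-style classifier that maps window features to the index of the preferred forecaster.

```python
# Illustrative sketch (assumptions, not the paper's implementation): for each
# new window of observations, compute its features, ask a trained classifier
# which forecaster suits this window, and use that forecaster to estimate the
# next interval's utilization.
import numpy as np

def forecast_last(window):   # naive: repeat the most recent observation
    return window[-1]

def forecast_mean(window):   # smooth: average of the window
    return window.mean()

def forecast_trend(window):  # linear trend extrapolated one step ahead
    t = np.arange(len(window))
    slope, intercept = np.polyfit(t, window, 1)
    return slope * len(window) + intercept

FORECASTERS = [forecast_last, forecast_mean, forecast_trend]

def estimate_next(window, selector, featurize):
    """selector: trained classifier; featurize: window -> feature vector."""
    features = featurize(window).reshape(1, -1)
    choice = int(selector.predict(features)[0])  # index of the chosen method
    return FORECASTERS[choice](window)
```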

RELATED WORK
PROPOSED SYSTEM OVERVIEW
MACHINE LEARNING METHODS
Workload Prediction Methods
FEATURE EXTRACTION AND SELECTION
Datasets
Methodology
Experimental Details
AMS Evaluation
Resource Utilization Estimation
Window Size Sensitivity Analysis
Evaluation Using BitBrains Data set
CONCLUSION