Abstract

Data sets used in machine learning must be meaningful and consistent to be useful. Preparing a data set for machine learning is an important task that maps the relation between raw input data (cause) and output data (result). Google traces are dump tables containing raw data about data center servers, the status of workload jobs and tasks, workload demands, provisioned data center resources, and hardware reconfiguration. Machine learning relates the input and output data sets in a training model by correlating the two sets as normalized values for future use. In this work, data cleaning and a full analysis of the Google data center traces are performed so that the traces can be used in deep learning models, such as convolutional or recurrent neural networks, with the capability of online learning. The data set is processed by re-normalizing resource capacities and workload demands (jobs and tasks), and by removing or transforming unavailable and opaque values into useful ones. A novel correlation between the input and output data sets of the Google traces is introduced that relates workload demands to data center resource reconfiguration, and data center resources to data center capacity. The idea is to connect demands with provisioned resources by evaluating the cloud data center configuration sets, which allows the cloud manager to provision resources by selecting the best reconfiguration when scaling the data center. An evaluation factor (scale factor) is introduced to assess elastic resource provisioning with respect to resource preparation time and minimum cost.
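As an illustration of the cleaning and re-normalization step summarized above, the following is a minimal sketch in Python, not the authors' implementation. It assumes the trace is available as a CSV file with hypothetical column names (cpu_request, memory_request); the real Google trace tables use their own schemas and require the mapping described in the paper.

import pandas as pd

# Minimal sketch: drop unusable rows and re-normalize demands.
# Column names cpu_request and memory_request are assumptions for
# illustration; they are not the actual Google trace field names.
def clean_and_normalize(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)

    # Remove rows whose resource demands are missing or opaque.
    df = df.dropna(subset=["cpu_request", "memory_request"])

    # Re-normalize demands to the [0, 1] range relative to the maximum
    # observed value, so demands and capacities share a common scale.
    for col in ("cpu_request", "memory_request"):
        max_val = df[col].max()
        if max_val > 0:
            df[col] = df[col] / max_val

    return df

if __name__ == "__main__":
    cleaned = clean_and_normalize("task_events.csv")
    print(cleaned.describe())

Normalizing demands against the observed maxima is one simple choice; normalizing against provisioned machine capacities, as the paper relates demands to data center capacity, would follow the same pattern with a different denominator.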
