Abstract

Data centers today host a number of computational resources to support the increasing demand for computation and storage. Understanding how these physical and virtual machines transition between different states of operation (referred to as the machine lifecycle) enables more efficient data center operation management. Furthermore, it helps data center operators define policies on how new computational resources can be added or existing infrastructure decommissioned. Using version 3 of the Google cluster trace data set, collected from approximately 96k machines, we analyze machine failures and changes in machine lifecycle over time. We observed a 13% chance of another machine failing under the same network switch within 1 min of a previous machine failure. We propose a Markov chain-based model that can predict machine states at any given time. Using the model and the estimated probabilities, we predicted the machine state over a span of several days with high probability. Using the predicted machine states, we reconstructed the trend in active machines and compared it with the trend reported in the data set, observing an error of 1.76%.
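To make the idea of a Markov chain-based lifecycle model concrete, the following is a minimal sketch, not the authors' implementation. It estimates transition probabilities from observed state sequences by simple counting and then propagates a starting state forward a given number of steps. The state labels (`active`, `failed`, `removed`), the function names, and the toy sequences are all assumptions made for illustration; the abstract does not specify the actual states, the estimation procedure, or the time step used.

```python
from collections import Counter

# Hypothetical state labels; the data set's actual lifecycle states may differ.
STATES = ["active", "failed", "removed"]


def estimate_transitions(sequences, states=STATES):
    """Estimate transition probabilities by counting consecutive state pairs."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    matrix = {}
    for a in states:
        total = sum(counts[(a, b)] for b in states)
        # Assumption: a state with no observed outgoing transitions is
        # treated as absorbing (it stays where it is).
        matrix[a] = {
            b: (counts[(a, b)] / total if total else float(a == b))
            for b in states
        }
    return matrix


def predict(matrix, start, steps, states=STATES):
    """Return the state distribution after `steps` transitions from `start`."""
    dist = {s: float(s == start) for s in states}
    for _ in range(steps):
        dist = {b: sum(dist[a] * matrix[a][b] for a in states) for b in states}
    return dist


# Toy sequences invented for illustration only (not taken from the trace).
toy = [
    ["active", "active", "failed", "active"],
    ["active", "active", "active", "removed"],
]
P = estimate_transitions(toy)
forecast = predict(P, "active", 3)
```

Under these assumptions, summing the predicted probability of the `active` state over all machines at each time step would give an expected active-machine count, which is one plausible way to reconstruct a trend like the one the abstract compares against the data set.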

