Clustering Cloud Workloads: K-Means vs Gaussian Mixture Model

Eva Patel, Dharmender Singh Kushwaha

doi:10.1016/j.procs.2020.04.017

Open Access

Abstract

The growing heterogeneity of Cloud workloads such as Big Data, IoT and business data analytics requires precise characterization in order to design a successful capacity plan and maintain the competitiveness of Cloud service providers. K-Means is a simple and fast clustering method, but it may not fully capture the heterogeneity inherent in Cloud workloads. Gaussian Mixture Models can discover complex patterns and group them into cohesive, homogeneous components that closely represent the real patterns within the data set. This work compares K-Means and the Gaussian Mixture Model to evaluate how well the clusters produced by each method represent the heterogeneity in resource usage of Cloud workloads. Experiments conducted with the Google cluster trace and business-critical workloads from Bitbrains reveal that clusters obtained using K-Means give only highly abstracted information, whereas the Gaussian Mixture Model provides better clustering with distinct usage boundaries. Although the Gaussian Mixture Model has a higher computation time than K-Means, it can be used when more fine-grained workload characterization and analysis are required.
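
The contrast drawn above can be illustrated with a minimal sketch. It is not the authors' experimental code: it uses synthetic one-dimensional values as a stand-in for CPU utilisation rather than the Google or Bitbrains traces, and the function names (`kmeans_1d`, `gmm_1d`), the two-cluster setting, the quantile initialisation and the iteration counts are all assumptions made for illustration.

```python
import math
import random


def kmeans_1d(xs, k, iters=50):
    """Lloyd's algorithm on scalars; returns (centers, hard labels)."""
    srt = sorted(xs)
    # Deterministic start (an assumption): one center per evenly spaced quantile.
    centers = [srt[(2 * j + 1) * len(xs) // (2 * k)] for j in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: nearest center by squared distance.
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in xs]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, labels


def gmm_1d(xs, k, iters=100):
    """EM for a 1-D Gaussian mixture, initialised from the K-Means result."""
    n = len(xs)
    means, labels = kmeans_1d(xs, k)
    weights, variances = [], []
    for j in range(k):
        members = [x for x, lab in zip(xs, labels) if lab == j] or xs
        weights.append(1.0 / k)
        var = sum((x - means[j]) ** 2 for x in members) / len(members)
        variances.append(max(var, 1e-6))
    resp = []
    for _ in range(iters):
        # E-step: soft responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [
                weights[j]
                * math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))
                / math.sqrt(2 * math.pi * variances[j])
                for j in range(k)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances, resp


random.seed(0)
# Synthetic data (assumption): a tight low-usage group and a broad high-usage group.
xs = [random.gauss(10, 2) for _ in range(300)] + [random.gauss(50, 15) for _ in range(300)]

km_centers, km_labels = kmeans_1d(xs, 2)
weights, means, variances, resp = gmm_1d(xs, 2)
gmm_labels = [max(range(2), key=lambda j: r[j]) for r in resp]

print("K-Means centers:", [round(c, 1) for c in km_centers])
print("GMM means:", [round(m, 1) for m in means])
print("GMM std devs:", [round(math.sqrt(v), 1) for v in variances])
print("points labelled differently:", sum(a != b for a, b in zip(km_labels, gmm_labels)))
```

With two clusters in one dimension, K-Means always places the boundary at the midpoint between the two centers, regardless of how widely each group is spread. The mixture model also estimates a weight and a variance per component, so its boundary can shift toward the tighter group. The extra E-step and M-step passes are also where the additional computation time comes from.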

Full Text