How Does the Workload Look Like in Production Cloud? Analysis and Clustering of Workloads on Alibaba Cluster Trace

Wenyan Chen, Yang Wang, Kejiang Ye, Cheng-Zhong Xu, Guoyao Xu

doi:10.1109/padsw.2018.8644579

Abstract

Cloud computing is widely used in today's datacenters because of benefits such as high scalability, on-demand services, and low cost. An in-depth understanding of the characteristics of workloads running in production cloud environments is important for improving resource management efficiency. In this paper, we analyze in detail, using visualization techniques and clustering methods, the trace dataset released by Alibaba, which contains 11089 online services and 12951 batch jobs running on 1313 machines. Our methodology for clustering workloads consists of four steps: i) select effective feature vectors as the dimensions of clustering; ii) identify the cluster boundaries of each dimension using the K-Means algorithm; iii) classify jobs by combining the feature vectors, using the results from the previous step; iv) analyze the characteristics of the workload groups at runtime. Our analysis reveals several insights that previous work on the Alibaba cluster trace has not reported. For batch jobs: a) the average CPU cores of all batch jobs show a clearly bimodal distribution; b) at a random sampling time, more than 50% of machines run only one group of jobs, characterized by short duration, medium CPU cores, and small memory utilization, while the remaining machines run mixed groups of jobs. For online instances: a) the resource usage (CPU, memory, and disk) of most online instances is low; b) at a random sampling time, up to six groups run on the same machine according to our clustering method.
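Steps ii) and iii) of the methodology can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names (`duration`, `cpu`, `mem`), the number of clusters per feature, the synthetic job records, and the use of midpoints between adjacent centroids as cluster boundaries are all assumptions made for illustration. Each feature is clustered on its own with a plain one-dimensional K-Means, and a job's group is the tuple of its per-feature levels.

```python
from collections import Counter


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on one dimension; returns the sorted centroids."""
    data = sorted(values)
    n = len(data)
    # Deterministic init (an assumption): spread centroids across the sorted data.
    centroids = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def boundaries(centroids):
    """Midpoints between adjacent centroids, used here as cluster boundaries."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def level(value, bounds):
    """Index of the interval that value falls into (0 = lowest)."""
    return sum(value > b for b in bounds)


# Synthetic jobs, NOT taken from the trace:
# duration in seconds, average CPU cores, memory as a fraction.
jobs = [
    {"duration": 10, "cpu": 0.5, "mem": 0.02},
    {"duration": 12, "cpu": 0.6, "mem": 0.03},
    {"duration": 300, "cpu": 0.5, "mem": 0.02},
    {"duration": 320, "cpu": 4.0, "mem": 0.30},
    {"duration": 5000, "cpu": 4.2, "mem": 0.31},
    {"duration": 5200, "cpu": 0.4, "mem": 0.65},
    {"duration": 11, "cpu": 4.1, "mem": 0.60},
]
features = ["duration", "cpu", "mem"]
k_per_feature = {"duration": 3, "cpu": 2, "mem": 3}  # hypothetical choices

# Step ii): cluster each dimension separately and derive its boundaries.
bounds = {
    f: boundaries(kmeans_1d([j[f] for j in jobs], k_per_feature[f]))
    for f in features
}

# Step iii): a job's group is the combination of its per-dimension levels.
groups = [tuple(level(j[f], bounds[f]) for f in features) for j in jobs]
group_sizes = Counter(groups)

for job, group in zip(jobs, groups):
    print(job, "->", group)
```

In this sketch a group such as `(0, 1, 0)` would read as short duration, high CPU, small memory. Clustering each dimension independently keeps the boundaries interpretable per feature, at the cost of ignoring correlations between features.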

Full Text