Abstract

Today's large datacenters house a massive number of machines, each of which is closely monitored with multivariate time series (e.g., CPU idle, memory utilization) to ensure service quality. Detecting outlier machine instances from multivariate time series is crucial for service management. However, the task is challenging because the time series fall into multiple classes with various shapes, are high-dimensional, and lack labels. In this article, we propose DOMI, a novel unsupervised model that combines a Gaussian mixture VAE with a 1D-CNN to detect outlier machine instances. Its core idea is to capture the normal patterns of machine instances by learning latent representations that account for shape characteristics, reconstruct the input data from the learned representations, and apply reconstruction probabilities to determine outliers. Moreover, DOMI interprets a detected outlier instance based on the reconstruction probability changes of its univariate time series. Extensive experiments were conducted on a dataset collected over a 1.5-month period from 1821 machines deployed at ByteDance, a top global content service provider. DOMI achieves the best F1-score of 0.94 and AUC score of 0.99, significantly outperforming the best-performing baseline method by 0.08 and 0.03, respectively. Moreover, its interpretation accuracy is up to 0.93.
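
The scoring step described above (reconstruction probabilities for detection, per-metric probabilities for interpretation) can be illustrated with a minimal sketch. It is not DOMI's implementation: it assumes a trained decoder has already produced a Gaussian mean and standard deviation for every point of an instance, and it simplifies interpretation to ranking metrics by their reconstruction log probability. All function names, the toy values, and the threshold are hypothetical.

```python
import math

def gaussian_log_prob(x, mean, std):
    """Log density of x under a univariate Gaussian N(mean, std^2)."""
    var = std * std
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def reconstruction_log_probs(window, recon_mean, recon_std):
    """Per-metric reconstruction log probability, summed over time.

    All three arguments are nested lists indexed as [metric][time].
    """
    return [
        sum(gaussian_log_prob(x, m, s) for x, m, s in zip(xs, ms, ss))
        for xs, ms, ss in zip(window, recon_mean, recon_std)
    ]

def score_instance(window, recon_mean, recon_std, threshold):
    """Return (is_outlier, total_log_prob, metric indices, least likely first)."""
    per_metric = reconstruction_log_probs(window, recon_mean, recon_std)
    total = sum(per_metric)
    ranked = sorted(range(len(per_metric)), key=lambda i: per_metric[i])
    return total < threshold, total, ranked

# Toy instance: 2 metrics x 3 time steps. The assumed decoder output says
# both metrics should stay near 0 with unit standard deviation.
recon_mean = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
recon_std = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
normal = [[0.0, 0.1, -0.1], [0.0, 0.0, 0.0]]
shifted = [[0.0, 0.1, -0.1], [5.0, 5.2, 4.9]]  # metric 1 deviates strongly
threshold = -10.0  # hypothetical; in practice chosen on validation data

normal_flag, normal_total, _ = score_instance(normal, recon_mean, recon_std, threshold)
outlier_flag, outlier_total, ranked = score_instance(shifted, recon_mean, recon_std, threshold)
```

In this toy case only the shifted instance falls below the threshold, and metric 1 is ranked first as the univariate series that contributes most to the low reconstruction probability.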

