Multivariate time series anomaly detection with missing data is one of the most pressing open problems in industrial monitoring. Because labeled anomalies are scarce, most advanced data-driven anomaly detection approaches follow the unsupervised learning paradigm. When data are missing, the usual premise is to first improve data quality by imputing the missing values with a separate model. Our concern is the consistency between data imputation and unsupervised learning: robust anomaly detection requires both to accurately capture the spatiotemporal dependence among multiple variables over time. Existing practice, however, tends to overlook this consistency and decouples the training of these two closely linked tasks. This article proposes a probabilistic multivariate time series anomaly detection framework that unifies data imputation and unsupervised learning. A deep probabilistic graphical model, abbreviated SCNF, is first devised for unsupervised density estimation. A tailored expectation maximization-based optimization scheme is then developed to train data imputation and unsupervised learning jointly in the presence of missing data. The framework's efficacy is corroborated experimentally in several industrial applications, including a chemical process, water treatment, and network traffic. In brief, the joint training framework improves the AUROC of SCNF by 6.34% on average across the three applications at a 50% missing-data rate.
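To make the idea of jointly training imputation and density estimation concrete, the following is a minimal sketch under strong simplifying assumptions. It is not SCNF and not the article's optimization scheme: a single multivariate Gaussian stands in for the density model, temporal structure is ignored, and the function names `em_gaussian_impute` and `anomaly_score` are hypothetical. The E-step imputes missing entries from the current density model, the M-step refits the density model on the completed data, and anomalies are scored by the negative log-likelihood of the observed entries.

```python
import numpy as np

def em_gaussian_impute(X, n_iter=50, ridge=1e-6):
    """EM for a multivariate Gaussian with NaN-coded missing entries.

    Stand-in for the joint training idea only; assumes every column has
    at least one observed value. Returns (mu, sigma, X_hat).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    X_hat = np.where(miss, mu, X)            # initial mean imputation
    sigma = np.cov(X_hat, rowvar=False) + ridge * np.eye(d)
    for _ in range(n_iter):
        corr = np.zeros((d, d))              # summed conditional covariances
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():                  # row fully missing
                X_hat[i] = mu
                corr += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            gain = np.linalg.solve(s_oo, s_mo.T).T   # S_mo @ inv(S_oo)
            # E-step: conditional mean of the missing part given the observed part
            X_hat[i, m] = mu[m] + gain @ (X[i, o] - mu[o])
            corr[np.ix_(m, m)] += sigma[np.ix_(m, m)] - gain @ s_mo.T
        # M-step: refit the density model on the completed data
        mu = X_hat.mean(axis=0)
        centered = X_hat - mu
        sigma = (centered.T @ centered + corr) / n + ridge * np.eye(d)
    return mu, sigma, X_hat

def anomaly_score(x, mu, sigma):
    """Negative log-likelihood of the observed entries of x (higher = more anomalous)."""
    x = np.asarray(x, dtype=float)
    o = ~np.isnan(x)
    diff = x[o] - mu[o]
    s_oo = sigma[np.ix_(o, o)]
    _, logdet = np.linalg.slogdet(s_oo)
    maha = diff @ np.linalg.solve(s_oo, diff)
    return 0.5 * (maha + logdet + o.sum() * np.log(2 * np.pi))
```

Because the imputed values and the density parameters are updated in the same loop, the imputation stays consistent with the model used for scoring, which is the property a decoupled two-stage pipeline does not guarantee. Scores computed from different numbers of observed variables are not directly comparable in this sketch.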