Abstract

With the development of Artificial Intelligence for IT Operations (AIOps), a large number of software systems and services are monitored by components that collect Key Performance Indicators (KPIs). Multivariate KPIs, a type of time series data, are essential for managing an entity's service quality. In recent years, deep learning methods have substantially improved anomaly detection for multivariate time series; however, existing methods do not explicitly capture the correlations among multivariate time series in both the feature dimension and the temporal dimension, which inevitably leads to false positives. This paper therefore proposes MAD-STA, a self-supervised anomaly detection method for multivariate KPIs that combines graph structure learning with spatio-temporal graph attention networks (GATs). In the feature dimension, MAD-STA introduces a node embedding mechanism for graph structure learning and then uses a feature-oriented GAT layer to compute graph attention coefficients that capture the correlations between different KPIs. In the temporal dimension, MAD-STA uses a time-oriented GAT layer to compute attention weights between correlated timestamps, and a GRU-based VAE encoder captures long-term dependencies to extract more comprehensive temporal feature representations. Finally, MAD-STA uses a GRU-based VAE decoder to reconstruct the captured high-level features and performs anomaly detection and localization by calculating the anomaly scores of the multiple KPIs. Experiments on multiple datasets show that MAD-STA detects anomalies more accurately than the baseline methods. In particular, on the KPI datasets of the two server clusters SMD and CKM, MAD-STA improves performance, including the F1 score, over the best baseline method. In addition, MAD-STA performs well on the false positive rate and offers good interpretability, so it can be used to assist anomaly diagnosis and root-cause indicator analysis.
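
The feature-dimension steps described above (graph structure learning from node embeddings, graph attention coefficients, and per-KPI anomaly scores) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the top-k cosine-similarity rule for choosing neighbours, the omission of the learned projection matrix that a full GAT layer applies, and the squared reconstruction error used as the score are all assumptions made for illustration.

```python
import numpy as np


def topk_graph(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Link each KPI to its k most similar KPIs (assumed rule: cosine similarity).

    embeddings: (n_kpis, dim) node embeddings, one row per KPI.
    Returns a boolean (n_kpis, n_kpis) adjacency matrix without self-loops.
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a KPI is never its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros(sim.shape, dtype=bool)
    rows = np.arange(sim.shape[0])[:, None]
    adj[rows, neighbours] = True
    return adj


def gat_attention(h: np.ndarray, adj: np.ndarray, a: np.ndarray,
                  slope: float = 0.2) -> np.ndarray:
    """Attention coefficients in the style of a graph attention layer.

    h: (n, d) node features, assumed to be already linearly projected.
    a: (2 * d,) attention vector; in a real model this is learned.
    Returns an (n, n) matrix whose rows sum to 1 over each node's neighbours.
    """
    d = h.shape[1]
    # a . [h_i || h_j] splits into a source term plus a destination term.
    e = (h @ a[:d])[:, None] + (h @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)        # LeakyReLU
    e = np.where(adj, e, -np.inf)            # mask out non-neighbours
    e = e - e.max(axis=1, keepdims=True)     # numerically stable softmax
    w = np.exp(e)
    return w / w.sum(axis=1, keepdims=True)


def anomaly_scores(x: np.ndarray, x_hat: np.ndarray):
    """Assumed score: squared reconstruction error at one timestamp.

    Returns the total score and the per-KPI contributions; the largest
    contribution points at the KPI most likely involved in the anomaly.
    """
    per_kpi = (x - x_hat) ** 2
    return per_kpi.sum(), per_kpi


rng = np.random.default_rng(0)
adj = topk_graph(rng.normal(size=(5, 4)), k=2)
att = gat_attention(rng.normal(size=(5, 3)), adj, rng.normal(size=6))
total, per_kpi = anomaly_scores(np.array([1.0, 2.0, 3.0]),
                                np.array([1.0, 2.0, 5.0]))
```

In this sketch the per-KPI contributions are what make localization possible: ranking them at an anomalous timestamp gives a candidate list of KPIs to inspect first.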
