Abstract

KPIs (Key Performance Indicators) in distributed systems can exhibit a variety of anomalies, which may lead to system failures and heavy losses, so detecting KPI anomalies is very important. This paper presents a time series anomaly detection method based on correlation analysis and a Hidden Markov Model (HMM). Correlation analysis is used to obtain the correlations between abnormal KPIs in the system, thereby reducing the false alarm rate of anomaly detection. The HMM is then used for anomaly detection by exploiting the close relationships between abnormal KPIs. For the correlation analysis of abnormal KPIs, a time series prediction model (1D-CNN-TCN) is first proposed. A residual sequence is obtained by computing the residual between the predicted and actual values at each time step; the residual sequence highlights abnormal segments in the data and improves the accuracy of anomaly screening. Based on the obtained residual sequences, abnormal KPIs are preliminarily screened out from the historical data. Next, KPI correlation analysis is performed: a sliding window is applied to the residual sequences of the abnormal KPIs to obtain correlation scores. Because the correlation analysis is based on residual sequences, it eliminates interference from fluctuations in the original data. A correlation matrix of abnormal KPIs is then constructed from the correlation scores. For anomaly detection, the correlation matrix is processed to obtain the adaptive parameters of the HMM, and the trained HMM is used to quickly discover the abnormal KPIs that may cause a KPI anomaly. Experiments on public datasets show that the method achieves good results.
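The residual-based screening and sliding-window correlation steps described in the abstract can be sketched in plain Python. This is a minimal sketch under several assumptions that the abstract does not state: the 1D-CNN-TCN forecasts are taken as given (a flat forecast stands in for them), a KPI is screened as abnormal when its largest absolute residual exceeds a fixed threshold, the correlation score of two KPIs is the maximum absolute Pearson correlation over sliding windows of their residuals, and the HMM transition probabilities are obtained by row-normalizing the correlation matrix. The function names (`residuals`, `screen`, `pearson`, `window_score`, `correlation_matrix`, `to_transition_matrix`) and the example KPIs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def residuals(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Residual sequence: actual minus predicted value at each time step."""
    return [a - p for a, p in zip(actual, predicted)]


def screen(res_by_kpi: Dict[str, List[float]], threshold: float) -> Dict[str, List[float]]:
    """Hypothetical screening rule: keep KPIs whose largest |residual| exceeds threshold."""
    return {k: r for k, r in res_by_kpi.items() if r and max(abs(v) for v in r) > threshold}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def window_score(x: Sequence[float], y: Sequence[float], window: int, step: int = 1) -> float:
    """Hypothetical score: maximum |Pearson r| over aligned sliding windows."""
    if window < 2:
        raise ValueError("window must contain at least two points")
    n = min(len(x), len(y))
    best = 0.0
    for start in range(0, n - window + 1, step):
        r = pearson(x[start:start + window], y[start:start + window])
        best = max(best, abs(r))
    return best


def correlation_matrix(
    res_by_kpi: Dict[str, List[float]], window: int, step: int = 1
) -> Tuple[List[str], List[List[float]]]:
    """Symmetric matrix of window scores between KPIs (diagonal fixed at 1.0)."""
    names = sorted(res_by_kpi)
    matrix = [
        [1.0 if i == j else window_score(res_by_kpi[a], res_by_kpi[b], window, step)
         for j, b in enumerate(names)]
        for i, a in enumerate(names)
    ]
    return names, matrix


def to_transition_matrix(matrix: List[List[float]]) -> List[List[float]]:
    """Hypothetical HMM parameterization: row-normalize scores into probabilities."""
    out = []
    for row in matrix:
        total = sum(row)  # > 0 because the diagonal entry is 1.0
        out.append([v / total for v in row])
    return out


if __name__ == "__main__":
    predicted = [10.0] * 8  # stand-in for 1D-CNN-TCN forecasts (not implemented here)
    actual = {
        "cpu": [10, 10, 10, 10, 18, 10, 10, 10],
        "latency": [10, 10, 10, 10, 14, 10, 10, 10],
        "disk": [10, 11, 10, 9, 10, 11, 10, 9],
    }
    res = {k: residuals(v, predicted) for k, v in actual.items()}
    abnormal = screen(res, threshold=2.0)
    names, m = correlation_matrix(abnormal, window=4)
    print(names, m)
    print(to_transition_matrix(m))
```

In this toy run, `disk` is screened out because its residuals never exceed the threshold, while `cpu` and `latency`, which deviate from the forecast at the same time step, receive a correlation score of 1.0. Correlating residuals rather than raw values means that regular patterns the predictor already captures do not count as correlation; only joint deviations from the forecast do.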

Highlights

  • KPI (Key Performance Indicator) anomaly detection is a low-level core technology in intelligent operation and maintenance

  • The CNN-LSTM model (Convolutional Neural Network with Long Short-Term Memory) uses an LSTM instead of the temporal convolutional network (TCN) used in the 1D-CNN-TCN method

  • The TCN achieves better results when predicting longer time series, while the 1D CNN captures local features, which benefits the prediction performance of 1D-CNN-TCN
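The highlights contrast the TCN's strength on longer sequences with the 1D CNN's focus on local features. A standard explanation is that a TCN stacks dilated causal convolutions, so its receptive field grows with the sum of the dilation factors: with dilations that double per layer it grows exponentially with depth, whereas an undilated 1D convolution stack grows only linearly. The sketch below illustrates this in plain Python. It assumes one convolution per layer and dilations of 1, 2, 4, 8, a common TCN configuration that is not taken from the paper, and the function names `dilated_causal_conv` and `receptive_field` are hypothetical.

```python
from typing import List, Sequence


def dilated_causal_conv(x: Sequence[float], w: Sequence[float], dilation: int) -> List[float]:
    """1D dilated causal convolution: y[t] = sum_i w[i] * x[t - i * dilation].

    Positions before the start of the series are treated as zeros (left padding),
    so y[t] never depends on future inputs x[t + 1], x[t + 2], ...
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, wi in enumerate(w):
            j = t - i * dilation
            if j >= 0:
                acc += wi * x[j]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of past inputs seen by a stack of causal convolutions (one per layer)."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # Four layers, kernel size 3: plain CNN (dilation 1) vs. TCN-style doubling dilations.
    print(receptive_field(3, [1, 1, 1, 1]))  # 9
    print(receptive_field(3, [1, 2, 4, 8]))  # 31
    print(dilated_causal_conv([1, 2, 3, 4], [1, 1], dilation=2))  # [1.0, 2.0, 4.0, 6.0]
```

With the same depth and kernel size, the dilated stack sees 31 past points instead of 9, which is one reason TCN-style models suit longer series; the undilated convolutions, by contrast, concentrate on short local patterns.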


Summary

Introduction

KPI (Key Performance Indicator) anomaly detection is a low-level core technology in intelligent operation and maintenance, and it is mainly concerned with events as they occur. Abnormal KPI behaviors revealed by analyzing the KPI curves, such as sudden increases, sudden drops, and jitter, imply that potential faults have occurred in related applications, for example increased access latency, network failures, or a sharp decrease in the number of users [1]. Because these systems are enormously complex, the monitoring system collects a large number of diverse KPIs. Manually searching for abnormal KPIs is extremely inefficient and consumes a great deal of manpower and material resources, and manual analysis of system failures leads to many misjudgments and economic losses.
