Abstract
Accurate anomaly detection and timely intervention are critical for cloud application maintenance. Traditional performance anomaly detection methods based on thresholds and rules work well for simple key performance indicator (KPI) monitoring. Unfortunately, appropriate threshold levels are difficult to find when KPI values differ significantly at different times of day or fluctuate significantly because of different usage patterns. Anomaly detection therefore presents a challenge for all types of temporal data, particularly when non-stationary time series have special adaptability requirements or when the nature of potential anomalies is vaguely defined or unknown. To address this limitation, we propose a novel anomaly detector for time-series KPIs, called KPI-TSAD, based on supervised deep-learning models with convolution and long short-term memory (LSTM) neural networks; a variational auto-encoder (VAE) oversampling model is used to address the imbalanced classification problem. Compared with related research on Yahoo’s anomaly detection benchmark datasets, KPI-TSAD exhibited better performance, with both its accuracy and F-score exceeding 0.90 on the A1Benchmark and A2Benchmark datasets. Finally, KPI-TSAD continued to perform well on several KPI monitoring datasets from real production environments, with the average F-score exceeding 0.72.
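The threshold limitation described in the abstract can be illustrated with a small synthetic example. The sketch below is not part of KPI-TSAD; the series shape, the injected spike, and the helper names (`make_kpi`, `static_threshold`, `hourly_baseline`) are all assumptions chosen for illustration. It builds an hourly KPI with a daily cycle and injects a night-time spike that stays below the normal midday peak.

```python
import math

def make_kpi(days=3, anomaly_at=(1, 3), bump=30.0):
    """Hourly synthetic KPI with a daily cycle: 20 at midnight, 100 at midday."""
    series = []
    for day in range(days):
        for hour in range(24):
            value = 60.0 - 40.0 * math.cos(2 * math.pi * hour / 24)
            if (day, hour) == anomaly_at:
                value += bump  # injected night-time spike (hypothetical anomaly)
            series.append(value)
    return series

def static_threshold(series, limit):
    """Return indices of points above one fixed limit."""
    return [i for i, v in enumerate(series) if v > limit]

def hourly_baseline(series, tolerance):
    """Return indices of points far from the median of the same hour across days."""
    flagged = []
    for i, v in enumerate(series):
        same_hour = sorted(series[j] for j in range(i % 24, len(series), 24))
        median = same_hour[len(same_hour) // 2]
        if abs(v - median) > tolerance:
            flagged.append(i)
    return flagged

kpi = make_kpi()
static_hits = static_threshold(kpi, limit=110.0)    # limit set above the midday peak
baseline_hits = hourly_baseline(kpi, tolerance=15.0)
print("static threshold:", static_hits)
print("hour-of-day baseline:", baseline_hits)
```

With the limit above the midday peak, the fixed threshold misses the night-time spike; lowering it far enough to catch the spike would also flag ordinary daytime traffic. The hour-of-day baseline is only a simple contrast case, not the learned detector the paper proposes.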
Highlights
Internet and software technologies have developed considerably over the last decade
Because the anomaly detector presented in this paper targets key performance indicator (KPI) monitoring in a cloud environment, we first need to verify the performance of KPI-TSAD on the benchmark datasets
We compared different oversampling methods integrated into KPI-TSAD to illustrate the superiority of the VAEGEN oversampling method
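For readers unfamiliar with oversampling, the sketch below shows the simplest variant, random duplication of minority-class samples. This is not VAEGEN, and the source does not state which methods were compared; the function name `random_oversample`, the window representation, and the 0/1 labels are assumptions made purely for illustration.

```python
import random

def random_oversample(windows, labels, seed=0):
    """Duplicate randomly chosen anomalous windows (label 1) until both classes match in size."""
    rng = random.Random(seed)
    minority = [w for w, y in zip(windows, labels) if y == 1]
    majority_count = sum(1 for y in labels if y == 0)
    out_windows, out_labels = list(windows), list(labels)
    if not minority:
        return out_windows, out_labels  # nothing to duplicate
    for _ in range(max(0, majority_count - len(minority))):
        out_windows.append(list(rng.choice(minority)))  # copy, do not alias
        out_labels.append(1)
    return out_windows, out_labels

# Toy data: 8 normal windows and 2 anomalous windows of length 3.
windows = [[1.0, 1.0, 1.0]] * 8 + [[1.0, 9.0, 1.0], [1.0, 1.0, 8.0]]
labels = [0] * 8 + [1] * 2
balanced_windows, balanced_labels = random_oversample(windows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Duplication balances the class counts but adds no new variation, which is the usual motivation for generative approaches such as the VAE-based oversampling described in the abstract.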
Summary
Internet and software technologies have developed considerably over the last decade. With the increasing complexity of user requirements, software system architecture has evolved from monolithic systems to today’s cloud-native architecture, and container-based cloud application architecture has become the most popular software architecture. Although cloud-native architecture has numerous benefits, such as high availability and high scalability, the management and maintenance of cloud applications present a new challenge, and the dependability of cloud applications has become a major concern for application providers. Anomalies, such as deadlock and resource competition among concurrent requests, are accompanied by other symptoms before they occur, so real-time anomaly detection and alarms for cloud applications are necessary. This paper presents KPI-TSAD, a novel anomaly detection approach for time-series data that contain both spatial and temporal features. KPI-TSAD is designed to attain good generalization capabilities in data-scarce scenarios where fewer training data are available.