Abstract

Accurate anomaly detection and timely intervention are critical for cloud application maintenance. Traditional performance anomaly detection methods based on thresholds and rules work well for simple key performance indicator (KPI) monitoring. However, appropriate threshold levels are difficult to find when KPI values differ significantly across times of day or fluctuate significantly because of different usage patterns. Anomaly detection is therefore challenging for all types of temporal data, particularly when non-stationary time series have special adaptability requirements or when the nature of potential anomalies is vaguely defined or unknown. To address this limitation, we propose a novel anomaly detector for time-series KPIs, called KPI-TSAD, based on supervised deep-learning models that combine convolutional and long short-term memory (LSTM) neural networks; a variational auto-encoder (VAE) oversampling model is used to address the imbalanced classification problem. Compared with related research on Yahoo’s anomaly detection benchmark datasets, KPI-TSAD exhibited better performance, with both its accuracy and F-score exceeding 0.90 on the A1Benchmark and A2Benchmark datasets. KPI-TSAD also performed well on several KPI monitoring datasets from real production environments, with an average F-score exceeding 0.72.
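The threshold problem described in the abstract can be illustrated with a small sketch. Everything here is a hypothetical construction for illustration and is not taken from the paper: the synthetic daily cycle, the injected spike, the two threshold values, and the function names `synthetic_kpi` and `fixed_threshold_alerts` are all assumptions.

```python
import math

def synthetic_kpi(points_per_day=24, days=3):
    """Hypothetical hourly KPI with a daily cycle: low at night, high at noon."""
    series = []
    for t in range(points_per_day * days):
        hour = t % points_per_day
        # Baseline swings between 20 (midnight) and 100 (noon).
        series.append(60 - 40 * math.cos(2 * math.pi * hour / points_per_day))
    return series

def fixed_threshold_alerts(series, threshold):
    """Return the indices whose value exceeds a single global threshold."""
    return [i for i, value in enumerate(series) if value > threshold]

kpi = synthetic_kpi()
anomaly_index = 27        # 03:00 on the second day, a low-traffic hour
kpi[anomaly_index] += 45  # injected spike that stays below the normal noon peak

# A threshold above every normal value misses the night-time spike entirely.
high = fixed_threshold_alerts(kpi, threshold=105)

# A threshold low enough to catch the spike also flags normal daytime load.
low = fixed_threshold_alerts(kpi, threshold=70)
false_alarms = [i for i in low if i != anomaly_index]

print("alerts with high threshold:", len(high))
print("alerts with low threshold:", len(low), "of which false:", len(false_alarms))
```

In this toy series no single global threshold separates the spike from normal behavior, which is the situation that motivates a learned detector.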

Highlights

  • Internet and software technologies have developed considerably over the last decade

  • Because the anomaly detector presented in this paper targets anomaly detection in key performance indicator (KPI) monitoring in a cloud environment, we need to verify the performance of KPI-TSAD on the benchmark datasets

  • We compared different oversampling methods integrated in KPI-TSAD to illustrate the superiority of the VAEGEN oversampling method
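For context on the class imbalance that oversampling addresses, the following is a minimal sketch of plain random oversampling, a generic baseline that duplicates minority-class samples. It is not the VAEGEN method, and the source does not say which traditional methods were compared; the function name `random_oversample`, the toy windows, and the 0/1 label convention (1 = anomaly) are assumptions made for illustration.

```python
import random

def random_oversample(windows, labels, seed=0):
    """Duplicate minority-class samples until both classes have equal counts."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    if len(minority) > len(majority):
        minority, majority = majority, minority
    if not minority:
        raise ValueError("cannot oversample: one class has no samples")
    # Draw random duplicates from the minority class to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [windows[i] for i in keep], [labels[i] for i in keep]

# Toy data: ten short windows, only one of which is labeled anomalous.
windows = [[i, i + 1] for i in range(10)]
labels = [0] * 9 + [1]

balanced_x, balanced_y = random_oversample(windows, labels)
print("anomalous samples before:", sum(labels), "after:", sum(balanced_y))
```

Duplication adds no new information about the anomalous class, which is one reason generative oversampling approaches such as the VAE-based one named in the abstract are of interest.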

Summary

Introduction

Internet and software technologies have developed considerably over the last decade. As user requirements have grown more complex, software system architecture has evolved from monolithic systems to today’s cloud-native architecture, and container-based cloud application architecture has become the most popular software architecture. Although cloud-native architecture has numerous benefits, such as high availability and high scalability, the management and maintenance of cloud applications present a new challenge, and the dependability of cloud applications has become a major concern for application providers. Anomalies such as deadlock and resource competition among concurrent requests are typically preceded by other symptoms, so real-time anomaly detection and alarms for cloud applications are necessary. This paper presents KPI-TSAD, a novel anomaly detection approach for time-series data that contain both spatial and temporal features. KPI-TSAD is designed to attain good generalization in data-scarce scenarios where little training data is available.

Background
Problem Description and Definition
Contribution and Outlines
Related Works
Statistical Prediction Methods
Time-Series Decomposition Methods
State-Transition Models
Deep-Learning Methods
Supervised Approaches
Unsupervised Approaches
State-of-The-Art Methods for Comparison
Oversampling for Imbalanced Time Series
Proposed Anomaly Detector
Preprocess the Input Data
VAE-Based Oversampling Approach
Neural Network Architecture
Loss Function
Performance Evaluation Metrics
Benchmark Datasets
AIOps KPI Monitoring Datasets
Hyper-Parameters of the Proposed Anomaly Detector in the Experiments
Results and Discussion
Comparisons with State-of-The-Art Methods
Comparisons with Traditional Oversampling Methods
Comparisons with Other Deep Neural Networks
Performance on Real Production Datasets
Conclusions