Abstract

Docker containers are developing rapidly with support from industry players such as Google and Alibaba, and are widely used in large-scale production cloud environments. For example, Alibaba has deployed millions of containers for its internal business, and most of its online services have already been migrated to containers. These services are usually very complex, spanning multiple containers with intricate interactions and dependencies. Detecting potential anomalies in such a large container-based cloud platform is very challenging. Traditional detection models usually rely on system resource metrics such as CPU and memory usage but rarely consider the relationships among components, which leads to a high false positive rate. In this paper, we present a novel Anomaly Detection and root cause localization method based on Graph Similarity (ADGS) for container-based cloud environments. We first monitor the response time and resource usage of each component in the application to determine whether the system status is normal. Then, we propose a new mechanism that locates the root cause of anomalies based on graph similarity by investigating the anomaly propagation rules among cluster components. We implement and evaluate our method in a container-based environment. The results show that the proposed method can detect anomalies and determine their root cause efficiently and accurately.
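
To make the idea of graph-similarity-based root cause localization concrete, the following is a minimal sketch and not the ADGS algorithm itself. The abstract does not specify the similarity measure, so this example assumes Jaccard similarity over the edge sets of anomaly propagation graphs. The component names, the `known_patterns` table, and the function names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A directed edge (source, target): an anomaly in source propagates to target.
Edge = Tuple[str, str]


def jaccard_similarity(a: Set[Edge], b: Set[Edge]) -> float:
    """Jaccard similarity of two edge sets (assumed measure, not from the paper)."""
    if not a and not b:
        return 1.0  # two empty graphs are treated as identical
    return len(a & b) / len(a | b)


def rank_root_causes(
    observed: Set[Edge], known: Dict[str, Set[Edge]]
) -> List[Tuple[str, float]]:
    """Rank candidate root causes by how closely their known propagation
    graph matches the observed one, best match first."""
    scores = [
        (cause, jaccard_similarity(observed, edges))
        for cause, edges in known.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical propagation graphs recorded for two known fault origins.
known_patterns: Dict[str, Set[Edge]] = {
    "db": {("db", "api"), ("api", "web")},
    "cache": {("cache", "api"), ("api", "web")},
}

# Hypothetical graph built from the components currently flagged as anomalous.
observed_graph: Set[Edge] = {("db", "api"), ("api", "web")}

ranking = rank_root_causes(observed_graph, known_patterns)
for cause, score in ranking:
    print(f"{cause}: {score:.2f}")
```

In this sketch the candidate whose recorded propagation graph best overlaps the observed graph is reported first. A real system would likely need a measure that also accounts for node attributes such as response time and resource usage.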
