Detection of Cluster Anomalies With ML Techniques

Joanna Kosinska, Maciej Tobiasz

doi:10.1109/access.2022.3216080

Abstract

As computing clusters grow more complex, identifying erroneous behavior inside them is becoming more challenging. Monitoring systems collect large amounts of data to support analysis of cluster behavior, but studying these data manually is expensive and sometimes impossible. Hence, a system that analyzes the gathered data and detects anomalies within the cluster is essential. Our paper proposes a system for detecting anomalies in a Kubernetes cluster: the Kubernetes Anomaly Detector (KAD). KAD uses machine learning techniques to diagnose problems that may arise. The novelty of our solution lies in using several machine learning models, which facilitates the detection of different types of anomalies. The user can choose which model to use for anomaly detection in a given situation, or the KAD system can select the appropriate model automatically, based on scoring procedures. The system trains the models on historical data that describe the usual behavior of the cluster. It then detects anomalies based on the predictions of the trained model or on signal reconstructions. We evaluate the proposed concepts in two groups of experiments. The experiments confirm the importance of selecting the proper ML model to detect anomalies in different situations. Furthermore, they assess the latency of the responses in a production-ready cluster.
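The workflow outlined in the abstract (fit on normal historical data, score candidate models, flag samples whose prediction error is unusually large) can be illustrated with a minimal sketch. This is not KAD's implementation: the two toy predictors, the scoring rule (lowest mean absolute error on normal data), the threshold rule (mean plus `k` standard deviations of the residuals), and the sample metric values are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def last_value(history):
    """Toy model: predict the next sample as the most recent one."""
    return history[-1]

def moving_average(history, window=3):
    """Toy model: predict the next sample as the mean of the last `window` samples."""
    return mean(history[-window:])

def residuals(model, series, warmup=3):
    """Absolute one-step-ahead prediction errors, skipping the first `warmup` samples."""
    return [abs(series[i] - model(series[:i])) for i in range(warmup, len(series))]

def select_model(models, normal_series):
    """Hypothetical scoring rule: pick the model with the lowest mean residual on normal data."""
    return min(models, key=lambda name: mean(residuals(models[name], normal_series)))

def fit_threshold(model, normal_series, k=3.0):
    """Assumed threshold: mean + k * standard deviation of residuals on normal data."""
    errs = residuals(model, normal_series)
    return mean(errs) + k * pstdev(errs)

def detect(model, threshold, series, warmup=3):
    """Return the indices of samples whose prediction error exceeds the threshold."""
    errs = residuals(model, series, warmup)
    return [i + warmup for i, e in enumerate(errs) if e > threshold]

MODELS = {"last_value": last_value, "moving_average": moving_average}

# Made-up CPU-usage-like readings describing normal behavior.
normal = [50, 52, 49, 51, 50, 53, 48, 51, 50, 52, 49, 51]
best = select_model(MODELS, normal)
threshold = fit_threshold(MODELS[best], normal)

# Made-up live readings with a spike at index 5.
live = [50, 51, 49, 52, 50, 95, 51, 50]
anomalies = detect(MODELS[best], threshold, live)
print(best, anomalies)
```

In this sketch the spike also distorts the moving average for the samples that follow it, so those samples may be flagged as well. A practical detector would need to handle that kind of contamination.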

Full Text