ServiceAnomaly: An anomaly detection approach in microservices using distributed traces and profiling metrics

Mahsa Panahandeh, Abdelwahab Hamou-Lhadj, Mohammad Hamdaqa, James Miller

doi:10.1016/j.jss.2023.111917

Abstract

Anomaly detection is an essential activity for identifying abnormal behaviours in microservice-based systems. A common approach is to model the system behaviour during normal operation using either distributed traces or profiling metrics. The model is then used to detect anomalies during system operation. In this paper, we present ServiceAnomaly, a new anomaly detection approach for microservice systems that combines distributed traces and six profiling metrics to build an annotated directed acyclic graph that characterizes the normal behaviour of the system. Unlike existing techniques, our approach captures the context propagation provided by distributed traces as a graph that is annotated with functions characterizing both linear and non-linear relationships between profiling metrics. The final annotated graph is used to detect abnormal executions during system operation. The results of applying our approach to two open-source benchmarks show that it detects anomalies with an F1-score of up to 86%. We also show how developers can use the annotated graph to reason about the causes of anomalies.
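
The idea of an annotated graph learned from normal executions can be illustrated with a minimal sketch. It is not the ServiceAnomaly implementation: the span layout (`id`, `parent`, `service`, `cpu`), the helper names, the single `cpu` metric, the least-squares line, and the `slack` tolerance are all assumptions made for illustration, and only a linear relationship is fitted, whereas the approach described above uses six profiling metrics and also covers non-linear relationships.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit y = a*x + b; returns (a, b, largest absolute residual)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    b = my - a * mx
    resid = max(abs(y - (a * x + b)) for x, y in zip(xs, ys))
    return a, b, resid


def trace_edges(trace):
    """Yield (caller, callee, caller_cpu, callee_cpu) for each parent-child span pair."""
    by_id = {span["id"]: span for span in trace}
    for span in trace:
        parent = by_id.get(span["parent"])
        if parent is not None:
            yield parent["service"], span["service"], parent["cpu"], span["cpu"]


def build_graph(normal_traces):
    """Map each caller->callee edge seen in normal traces to a fitted function."""
    samples = defaultdict(lambda: ([], []))
    for trace in normal_traces:
        for caller, callee, x, y in trace_edges(trace):
            samples[(caller, callee)][0].append(x)
            samples[(caller, callee)][1].append(y)
    return {edge: fit_line(xs, ys) for edge, (xs, ys) in samples.items()}


def is_anomalous(graph, trace, slack=0.05):
    """Flag a trace with an unseen edge or a metric far from the fitted function."""
    for caller, callee, x, y in trace_edges(trace):
        if (caller, callee) not in graph:
            return True  # call path never observed during normal operation
        a, b, resid = graph[(caller, callee)]
        if abs(y - (a * x + b)) > resid + slack:
            return True  # metric relationship deviates from the learned one
    return False


def make_trace(caller_cpu, callee_cpu, callee="orders"):
    """Hypothetical two-span trace: a gateway span calling one downstream service."""
    return [
        {"id": 1, "parent": None, "service": "gateway", "cpu": caller_cpu},
        {"id": 2, "parent": 1, "service": callee, "cpu": callee_cpu},
    ]


# Synthetic normal behaviour: the callee uses about twice the caller's CPU.
graph = build_graph([make_trace(x, 2 * x) for x in (0.1, 0.2, 0.3, 0.4)])
```

In this sketch, `is_anomalous(graph, make_trace(0.25, 0.9))` flags the trace because the callee's CPU is far from the fitted line, and a trace that calls a service never seen during normal operation is flagged because its edge is missing from the graph. The edge that triggers the flag indicates which service interaction to inspect first.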
