Development of Anomaly Detection System Based on Distributed Log Tracing

D A Khudyakov

doi:10.25205/1818-7900-2023-21-1-62-72

Abstract

Software system developers must respond quickly to failures in order to avoid reputational and financial losses for their customers, so it is important to detect behavioral anomalies in the operation of software systems in a timely manner. Various tools for automatic system monitoring are under active development, but logs remain the main tool for analyzing failures. Logs contain information about the operation of the system at various points of execution. Modern systems often have a distributed microservice architecture, which significantly complicates log analysis: logs are collected centrally from different microservices and form a huge flow of information that is very difficult to analyze manually. Distributed tracing solves the problem of identifying the logs that belong to a specific request to the system, and its use opens up wide opportunities for automatic analysis. Many solutions for detecting anomalies in logs already exist, but they do not take advantage of distributed tracing. This article addresses the problem of detecting behavioral anomalies in the operation of distributed software systems through automatic analysis of log traces. The solution combines several machine learning methods. Log traces are first preprocessed and cleaned using process mining methods. The log messages are then vectorized and clustered. Finally, a long short-term memory (LSTM) network is used to detect deviations in the sequences of processed logs. As a result of this work, a prototype of the anomaly detection system was developed and tested.
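The pipeline outlined in the abstract can be illustrated with a small, simplified sketch. It is not the author's implementation: the log records, field layout, and function names are hypothetical, the process mining cleanup step is omitted, a regex-based template stands in for vectorization and clustering, and a set of observed template transitions stands in for the LSTM sequence model. It only shows the overall shape of the idea: group logs by trace, normalize messages, learn normal sequences, and flag deviations.

```python
import re
from collections import defaultdict

# Hypothetical log records: (trace_id, timestamp, message).
LOGS = [
    ("t1", 1, "Received request 101"),
    ("t1", 2, "Querying user 7"),
    ("t1", 3, "Response sent in 12 ms"),
    ("t2", 1, "Received request 102"),
    ("t2", 2, "Querying user 9"),
    ("t2", 3, "Response sent in 15 ms"),
    ("t3", 1, "Received request 103"),
    ("t3", 2, "Response sent in 3 ms"),
]


def to_template(message):
    """Mask numeric tokens so messages that differ only in parameters match.

    Assumption: a crude stand-in for vectorizing and clustering messages.
    """
    return re.sub(r"\d+", "<*>", message)


def group_by_trace(logs):
    """Return {trace_id: [template, ...]} with messages ordered by timestamp."""
    traces = defaultdict(list)
    for trace_id, _ts, message in sorted(logs, key=lambda r: (r[0], r[1])):
        traces[trace_id].append(to_template(message))
    return dict(traces)


def pad(sequence):
    """Add start and end markers so missing first or last steps are visible."""
    return ["<start>"] + sequence + ["<end>"]


def learn_transitions(sequences):
    """Collect adjacent template pairs seen in traces assumed to be normal.

    Assumption: this transition set replaces the LSTM used in the article.
    """
    seen = set()
    for sequence in sequences:
        padded = pad(sequence)
        seen.update(zip(padded, padded[1:]))
    return seen


def is_anomalous(sequence, seen):
    """Flag a trace if it contains any transition not seen during training."""
    padded = pad(sequence)
    return any(pair not in seen for pair in zip(padded, padded[1:]))


traces = group_by_trace(LOGS)
normal = learn_transitions([traces["t1"], traces["t2"]])
flags = {trace_id: is_anomalous(seq, normal) for trace_id, seq in traces.items()}
print(flags)
```

In this toy data, trace `t3` skips the query step, so it contains a transition that never occurs in the traces used for training and is flagged. A learned sequence model such as an LSTM plays the same role but can generalize beyond exact transitions and score how unlikely the next log event is.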
