Loader: A Log Anomaly Detector Based on Transformer

Tong Xiao, Kenli Li, Yunfei Du, Keqin Li, Zhe Quan, Zhi-Jie Wang, Yuquan Le, Xiangke Liao

doi:10.1109/tsc.2023.3280575

Abstract

Detecting anomalies in logs is crucial for service and system management, since logs are widely used to record runtime status and are often the only data available for postmortem analysis. Because anomalies are usually rare in real-world services and systems, a common and feasible practice is to mine or learn normal patterns from logs and to deem logs that violate those patterns anomalous. As log sequences are a kind of time series data, RNNs (Recurrent Neural Networks) and their variants have been extensively employed to capture the normal patterns. Nevertheless, the sequential nature of RNNs and their variants makes them hard to parallelize and limits their ability to capture long-term dependencies, which may hinder their performance. To address this issue, we propose Loader, a novel semi-supervised log anomaly detector based on Transformer, because the Transformer architecture eschews recurrence and is able to draw global dependencies. Loader leverages the Transformer encoder to capture normal patterns from normal log sequences. When detecting, it produces a set of candidate log templates that may appear after the input log substring under normal conditions. If the template of the actual next log message is not in the candidate set, this implies an anomaly.
Previous similar methods always select the $k$ most probable log templates as candidates, so their performance is sensitive to $k$, and it is nontrivial to pick a proper $k$. To alleviate this, we design a more flexible and robust ‘top-$p$’ algorithm, which determines the candidate set based on the cumulative probability of the most probable log templates. Extensive experiments on three public log datasets validate the effectiveness and competitiveness of our approach.
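
To make the contrast between the two candidate-selection strategies concrete, the following is a minimal sketch, not the authors' implementation. The abstract states only that the candidate set is determined by the cumulative probability of the most probable templates; the exact stopping rule used here (add templates in descending order of probability until the running total reaches p) is an assumption, as are the function names, the template identifiers `E1` to `E4`, and the probability values, which stand in for the output of a trained model.

```python
from typing import Dict, Set


def top_k_candidates(probs: Dict[str, float], k: int) -> Set[str]:
    """Baseline: always keep the k most probable templates."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return set(ranked[:k])


def top_p_candidates(probs: Dict[str, float], p: float) -> Set[str]:
    """Keep the most probable templates until their cumulative
    probability reaches p (assumed stopping rule, see lead-in)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    candidates: Set[str] = set()
    cumulative = 0.0
    # Visit templates from most to least probable.
    for template, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        candidates.add(template)
        cumulative += prob
        if cumulative >= p:
            break
    return candidates


def is_anomalous(probs: Dict[str, float], actual_template: str, p: float) -> bool:
    """Flag an anomaly when the observed next template is not a candidate."""
    return actual_template not in top_p_candidates(probs, p)


# Hypothetical next-template distributions for two different inputs.
confident = {"E1": 0.90, "E2": 0.05, "E3": 0.03, "E4": 0.02}
uncertain = {"E1": 0.40, "E2": 0.35, "E3": 0.20, "E4": 0.05}

print(top_k_candidates(confident, 3), top_p_candidates(confident, 0.85))
print(top_k_candidates(uncertain, 3), top_p_candidates(uncertain, 0.85))
print(is_anomalous(confident, "E2", 0.85), is_anomalous(uncertain, "E2", 0.85))
```

With a fixed k, the candidate set has the same size whether the predicted distribution is peaked or flat. Under the cumulative rule the set shrinks when the model is confident and grows when it is not, which is one way to read the abstract's claim that top-$p$ is more flexible than a fixed $k$.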

Full Text