Log anomaly detection is a method for finding abnormal behavior and faults in systems. However, existing methods face two main challenges: the open-world problem and the cold-start problem. The open-world problem means that the test set may contain new classes that do not appear in the training set, while the cold-start problem means that the initial training data are scarce, for both normal and abnormal log sequences. Most existing methods assume a closed-world setting and rely on sufficient normal data, which limits their adaptability to new log environments. We propose LogOW, a novel log anomaly detection model that can learn from only a few normal log sequences. The model discovers emerging normal log sequences in the open-world setting through an open-world sample retrieval module, and an incremental pre-training module then fine-tunes the model parameters on these sequences in an online manner. First, we train a base model on normal log sequences using Masked Language Modeling (MLM). During the testing phase, we combine an anomaly score with an uncertainty score obtained through a novel dynamic multi-mask to separate closed-world normal log sequences from the rest of the test set. Next, we cluster the open-world log sequences based on fused sequence and count features, and identify the abnormal ones and the new normal ones. Finally, we update the model with the new normal sequences in the next time period. Experiments on three log datasets and real-world airport logs show that our model outperforms traditional models in the open-world, low-training-data setting.
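To make the scoring step concrete, below is a minimal Python sketch of how a dynamic multi-mask could yield both an anomaly score and an uncertainty score for one log-key sequence. The function names (`mlm_predict`, `score_sequence`), the number of masking rounds, the negative-log-likelihood and variance definitions, the additive fusion, and the threshold are all illustrative assumptions, not the paper's exact formulation; `mlm_predict` is a random stand-in for the trained MLM.

```python
import numpy as np

# Hypothetical stand-in for the trained MLM: given a log-key sequence with one
# position masked, return a probability distribution over the log-key
# vocabulary for that position. In the real system this would be a Transformer
# encoder pre-trained with MLM on normal log sequences.
def mlm_predict(sequence, masked_pos, vocab_size, rng):
    logits = rng.normal(size=vocab_size)
    return np.exp(logits) / np.exp(logits).sum()

def score_sequence(sequence, vocab_size, n_masks=5, seed=0):
    """Score one log-key sequence with a dynamic multi-mask.

    In each of `n_masks` rounds a random position is masked and predicted by
    the MLM. The anomaly score is the mean negative log-probability of the
    true key; the uncertainty score is the variance of those values across
    rounds. Both definitions are assumptions for illustration only.
    """
    rng = np.random.default_rng(seed)
    nlls = []
    for _ in range(n_masks):
        pos = rng.integers(len(sequence))
        probs = mlm_predict(sequence, pos, vocab_size, rng)
        nlls.append(-np.log(probs[sequence[pos]] + 1e-12))
    nlls = np.array(nlls)
    return nlls.mean(), nlls.var()

if __name__ == "__main__":
    vocab_size = 50
    seq = [3, 17, 8, 42, 7, 19]        # log-key (template ID) sequence
    anomaly, uncertainty = score_sequence(seq, vocab_size)
    combined = anomaly + uncertainty   # assumed simple additive fusion
    # Sequences above the threshold would be passed to the open-world
    # retrieval/clustering stage; the rest are treated as closed-world normal.
    print(f"anomaly={anomaly:.3f} uncertainty={uncertainty:.3f} "
          f"open_world_candidate={combined > 4.0}")
```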