SemantiLog: Log-based Anomaly Detection with Semantic Similarity
Logs produced by software applications are invaluable for spotting deviations from expected system behavior. However, automatically detecting anomalies from log data is challenging due to the volume, semi-structured nature, lack of standard formatting, and potential evolution of log records over time. In this work, we approach log-based anomaly detection as a semantic similarity problem. We generate pairwise similarity scores using a general-purpose pre-trained language model and further augment them with ground-truth binary labels. The generated similarity labels supervise an encoder trained for semantic similarity. At inference time, anomalies are detected based on the cosine similarity between the encoded query sequence and the average normal encoding. Our method outperforms contemporary techniques on multiple benchmarks without template extraction or a fixed vocabulary and achieves competitive performance even when provided with limited abnormal examples.
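The detection rule described above is simple enough to sketch directly. A minimal illustration, assuming a hypothetical `encode` function standing in for the paper's similarity-trained encoder, and an illustrative threshold value:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def fit_normal_centroid(normal_sequences, encode):
    # Average the encodings of known-normal log sequences.
    return np.mean([encode(s) for s in normal_sequences], axis=0)

def is_anomalous(query_sequence, centroid, encode, threshold=0.5):
    # Flag the query as anomalous when its encoding drifts away from
    # the average normal encoding; the threshold here is an assumption.
    return cosine(encode(query_sequence), centroid) < threshold
```

Because the decision depends only on the encoder and one centroid, no template extraction or fixed vocabulary is needed at inference time.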
- Research Article
- Cited by 17
- 10.1145/3644386
- Jun 3, 2024
- ACM Transactions on Software Engineering and Methodology
With the rapid development of deep learning (DL), the recent trend in log-based anomaly detection focuses on extracting semantic information from log events (i.e., templates of log messages) and designing more advanced DL models for anomaly detection. These DL-based techniques can indeed improve detection effectiveness, but they suffer from a heavier dependency on training data (such as data quality or data labels) and higher costs in time and resources due to the complexity and scale of DL models, which hinder their practical use. In contrast, techniques based on traditional machine learning or data mining algorithms are less dependent on training data and more efficient, but less effective than DL-based techniques, a gap our motivating study attributes mainly to unseen log events (log events in incoming log messages that do not appear in the training data). Intuitively, if the effectiveness of traditional techniques could be made comparable to that of advanced DL-based techniques, log-based anomaly detection would become more practical. Indeed, an existing study in another area (linking questions posted on Stack Overflow) has shown that traditional techniques with some optimizations can achieve effectiveness comparable to the state-of-the-art DL-based technique, indicating the feasibility of enhancing traditional log-based anomaly detection techniques. Inspired by the idea of “try-with-simpler,” we conducted the first empirical study to explore the potential of improving traditional techniques for more practical log-based anomaly detection. In this work, we optimized the traditional unsupervised PCA (Principal Component Analysis) technique by incorporating a lightweight semantic-based log representation, called SemPCA, and conducted an extensive study to investigate its potential for more practical log-based anomaly detection. By comparing seven log-based anomaly detection techniques (four DL-based techniques, two traditional techniques, and SemPCA) on both public and industrial datasets, our results show that SemPCA achieves effectiveness comparable to advanced supervised/semi-supervised DL-based techniques while being much more stable under insufficient training data and more efficient, demonstrating that traditional techniques can still excel after small but useful adaptations.
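A minimal sketch of the core idea: PCA over semantic sequence representations, with reconstruction error as the anomaly score. The averaged-word-vector representation and the three-sigma threshold are assumptions for illustration, not SemPCA's exact design:

```python
import numpy as np
from sklearn.decomposition import PCA

def embed(tokens, word_vectors, dim=50):
    # Lightweight semantic representation: average the pre-trained word
    # vectors of a log sequence's tokens (an assumption in this sketch).
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

class PCADetector:
    def __init__(self, variance=0.95):
        # Keep enough components to explain 95% of the variance.
        self.pca = PCA(n_components=variance)

    def fit(self, X):  # X: mostly-normal training vectors
        self.pca.fit(X)
        errors = self._errors(X)
        # Three-sigma rule on training reconstruction error (assumed).
        self.threshold = errors.mean() + 3 * errors.std()
        return self

    def _errors(self, X):
        # Distance between each vector and its principal-subspace projection.
        X_hat = self.pca.inverse_transform(self.pca.transform(X))
        return np.linalg.norm(X - X_hat, axis=1)

    def predict(self, X):
        return self._errors(X) > self.threshold  # True = anomaly
```

The appeal of this family of techniques is visible in the sketch: training is a single matrix factorization, so it is cheap to refit when logs evolve.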
- Research Article
- Cited by 1
- 10.1145/3729346
- Jun 19, 2025
- Proceedings of the ACM on Software Engineering
With the rapid advancement of cloud-native computing, securing cloud environments has become an important task. Log-based Anomaly Detection (LAD) is the most representative technique used in different systems for attack detection and safety assurance, and multiple LAD methods and relevant datasets have been proposed. However, even though some of these datasets are specifically prepared for cloud systems, they cover only limited cloud behaviors and lack information from a whole-system perspective. Another critical issue is normality shift: the test distribution can differ from the training distribution and strongly affect the performance of LAD. Unfortunately, existing works focus only on simple shift types such as chronological changes, while other cloud-specific shift types are ignored, e.g., different deployed cloud architectures. A dataset that captures diverse cloud system behaviors and various types of normality shift is therefore essential. To fill this gap, we construct CAShift, a dataset for evaluating the performance of LAD in cloud systems; it considers the different roles of software in cloud systems, supports three real-world normality shift types (application shift, version shift, and cloud architecture shift), and features 20 different attack scenarios across various cloud system components. Based on CAShift, we conduct a comprehensive empirical study of the effectiveness of existing LAD methods under normality shift. Additionally, to explore the feasibility of shift adaptation, we investigate three continuous learning approaches, the most common methods for mitigating the impact of distribution shift. Results demonstrate that 1) all LAD methods suffer from normality shift, with performance drops of up to 34%, and 2) existing continuous learning methods are promising for addressing shift drawbacks, but the ratio of data used for model retraining and the selection of algorithms strongly affect the adaptation, with F1-score increases of up to 27%. Based on our findings, we offer implications for future research on designing more robust LAD models and methods for LAD shift adaptation.
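The retraining-ratio finding can be made concrete with a small sketch; `model.fit` stands in for whatever training routine a given LAD detector exposes, and the blending scheme is an assumption, not CAShift's exact protocol:

```python
import numpy as np

def retrain_with_ratio(model, X_old, X_new, ratio=0.3, seed=0):
    # Continuous-learning style update: blend pre-shift training data
    # with a fraction (`ratio`) of post-shift data, then retrain. The
    # study above suggests this ratio materially affects adaptation.
    rng = np.random.default_rng(seed)
    k = int(len(X_new) * ratio)
    idx = rng.choice(len(X_new), size=k, replace=False)
    return model.fit(np.vstack([X_old, X_new[idx]]))
```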
- Conference Article
- 10.23919/cnsm55787.2022.9964935
- Oct 31, 2022
Anomaly detection is key to Quality of Service (QoS) in many modern systems. Logs, which record the runtime information of a system, are widely used for anomaly detection. The security of log-based anomaly detection itself, however, has not been well investigated. In this paper, we conduct an empirical study of black-box attacks on log-based anomaly detection. We investigate eight different log attack methods and compare their performance across various log parsing methods and log anomaly detection models. We propose a method to evaluate the imperceptibility of log attack methods. In our experiments, we evaluate the attack methods on two real log datasets. The results show that LogBug outperforms the others in almost all situations. We also compare the imperceptibility of the attack methods and find a trade-off between performance and imperceptibility: better attack performance means worse imperceptibility. To the best of our knowledge, this is the first work to investigate and compare attack models against log-based anomaly detection.
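To make the black-box setting concrete, here is a generic greedy evasion loop, an illustration of the threat model only, not LogBug or any specific method from the paper. It queries the detector solely through its predictions and keeps the first small edit that flips an anomalous verdict:

```python
def greedy_blackbox_attack(detector, sequence, max_edits=5):
    # detector.predict(seq) -> True if flagged anomalous (black box only).
    seq = list(sequence)
    for _ in range(max_edits):
        if not detector.predict(seq):
            return seq  # already evades detection
        candidates = []
        for i in range(len(seq)):
            candidates.append(seq[:i] + seq[i + 1:])                 # drop event i
            candidates.append(seq[:i + 1] + [seq[i]] + seq[i + 1:])  # duplicate it
        if not candidates:
            break
        evading = [c for c in candidates if not detector.predict(c)]
        if evading:
            return evading[0]
        seq = candidates[0]  # no single edit works; continue from one edit
    return seq
```

Capping `max_edits` is one crude proxy for the imperceptibility trade-off the paper measures: the fewer edits an attack needs, the harder it is to notice.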
- Conference Article
- Cited by 16
- 10.1109/ftxs56515.2022.00006
- Nov 1, 2022
With the increasing prevalence of scalable file systems in the context of High Performance Computing (HPC), accurate anomaly detection on runtime logs is becoming increasingly important. As it currently stands, however, many state-of-the-art methods for log-based anomaly detection, such as DeepLog, encounter numerous challenges when applied to logs from parallel file systems (PFSes), often due to the irregularity and ambiguity of their time-based log sequences. To circumvent these problems, this study proposes ClusterLog, a log pre-processing method that clusters the temporal sequence of log keys based on their semantic similarity. By grouping semantically and sentimentally similar logs, the approach aims to represent log sequences with the smallest number of unique log keys, so that a downstream sequence-based model can learn the log patterns more effectively. Preliminary results indicate not only ClusterLog's effectiveness in reducing the granularity of log sequences without losing important sequence information, but also its generalizability to logs from different file systems.
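A minimal sketch of this pre-processing idea, where the log-key embeddings are assumed to come from some sentence encoder and k-means is used as a stand-in clustering algorithm (both are assumptions, not necessarily ClusterLog's choices):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_log_keys(key_embeddings, n_clusters):
    # key_embeddings: dict mapping each log key (template) to a
    # semantic vector. Returns log key -> cluster id.
    keys = list(key_embeddings)
    X = np.stack([key_embeddings[k] for k in keys])
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(X)
    return dict(zip(keys, labels))

def reduce_sequence(sequence, key_to_cluster):
    # Replace each raw log key with its semantic cluster id, shrinking
    # the vocabulary a downstream sequence model must learn.
    return [key_to_cluster[k] for k in sequence]
```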
- Research Article
- 10.1142/s0129156424401165
- Sep 18, 2024
- International Journal of High Speed Electronics and Systems
In recent years, with the increasing complexity of software systems, logs have become crucial for system maintenance. Log-based anomaly detection plays a vital role in automatically detecting system anomalies through log analysis. However, current approaches face significant practical challenges. Supervised methods often require a large amount of manually labeled training data, which is time-consuming and costly to obtain. Unsupervised and semi-supervised approaches, on the other hand, may suffer from subpar performance, as they do not leverage historical anomalies to improve their detection capabilities. These challenges underscore the need for more efficient and effective log-based anomaly detection methods. Detecting anomalous database access, in particular, is critical for ensuring the stability and security of database systems. We present a survey of existing log anomaly detection models and propose a novel approach, the Template-Parsed Log Anomaly Detection (TPLAD) model, for automated anomaly detection in massive database log files. The proposed model combines the original log template with template parsing, using both code and text semantic representations. Experimental results demonstrate the effectiveness of the approach in detecting abnormal database access patterns, including runtime errors, unauthorized access, and data leaks. The findings indicate that the TPLAD model shows promise in enhancing database security and stability in business systems.
- Conference Article
- Cited by 7
- 10.23919/cnsm50824.2020.9269069
- Nov 2, 2020
Log-based anomaly detection is an important task for service management and system maintenance. Although anomaly labels are valuable for learning an anomaly detection model, they are difficult to collect because anomalies are rare. To tackle this problem, existing methods employ domain adaptation algorithms to transfer anomaly detectors from a labeled source domain to an unlabeled target domain. However, most of those methods focus on key performance indicator (KPI) anomaly detection, whereas the semantic information in logs plays an important role in log-based anomaly detection. Adaptation methods therefore need to consider how to transfer this semantic information. In this paper, we propose a simple and effective adaptation method that transfers a log-based anomaly detection model with pseudo-labels. We first train a detection model on labeled samples to serve as a pseudo-label annotator. We then use it to assign pseudo-labels to unlabeled samples and train anomaly detectors as if the pseudo-labels were true labels. Both models share the same feature extraction part, which helps the model transfer the semantic information in logs. We evaluated the proposed method on three log datasets, and the experimental results demonstrate that it outperforms the baseline methods.
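The two-step procedure is compact enough to sketch. Here `build_model` is a hypothetical classifier factory (e.g., `lambda: LogisticRegression()` with scikit-learn), and the sketch omits the shared feature-extraction layers the paper uses:

```python
def adapt_with_pseudo_labels(build_model, X_src, y_src, X_tgt):
    # Step 1: train a pseudo-label annotator on the labeled source domain.
    annotator = build_model().fit(X_src, y_src)
    # Step 2: label the unlabeled target domain with it, then train the
    # target detector as if the pseudo-labels were true labels.
    pseudo_y = annotator.predict(X_tgt)
    return build_model().fit(X_tgt, pseudo_y)
```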
- Research Article
- Cited by 36
- 10.1142/s0218194020500114
- Feb 1, 2020
- International Journal of Software Engineering and Knowledge Engineering
Logs play an important role in the maintenance of large-scale systems. The number of logs that indicate normal behavior (normal logs) differs greatly from the number that indicate anomalies (abnormal logs), and the two types differ in character. The K-Nearest Neighbor (KNN) algorithm, an outlier detection method with high accuracy, is an effective way to automatically detect anomalies from logs. However, log data is large-scale and highly imbalanced, which affects the results of the KNN algorithm in log-based anomaly detection. Thus, we propose an improved KNN-based method that uses the mean-shift clustering algorithm to efficiently select a training set from massive logs. We then assign different weights to samples at different distances, which reduces the negative effect of the imbalanced log distribution on KNN accuracy. Comparative experiments on log sets from five supercomputers show that the proposed method can be effectively applied to log-based anomaly detection, with higher accuracy, recall, and F-measure than the traditional keyword-search method.
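A rough sketch of both steps with scikit-learn; keeping one representative per mean-shift cluster, and using sklearn's built-in inverse-distance weighting, are simplifications of the paper's scheme rather than its exact weighting:

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.neighbors import KNeighborsClassifier

def select_training_set(X, y):
    # Shrink a massive log set: cluster it with mean shift, then keep
    # the sample nearest each cluster center as its representative.
    centers = MeanShift().fit(X).cluster_centers_
    reps = [int(np.argmin(np.linalg.norm(X - c, axis=1))) for c in centers]
    return X[reps], y[reps]

# Inverse-distance weighting: nearer neighbors cast larger votes,
# softening the effect of the imbalanced normal/abnormal distribution.
knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
# Usage: X_tr, y_tr = select_training_set(X_logs, y_logs); knn.fit(X_tr, y_tr)
# (X_logs, y_logs are hypothetical log feature vectors and labels.)
```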
- Conference Article
- Cited by 650
- 10.1145/3338906.3338931
- Aug 12, 2019
Logs are widely used by large and complex software-intensive systems for troubleshooting, and there have been many studies on log-based anomaly detection. To detect anomalies, existing methods mainly construct a detection model using log event data extracted from historical logs. However, we find that these methods do not work well in practice: they make a closed-world assumption, namely that log data is stable over time and the set of distinct log events is known. Our empirical study shows that, in practice, log data often contains previously unseen log events or log sequences. This instability comes from two sources: 1) the evolution of logging statements, and 2) processing noise in log data. In this paper, we propose a new log-based anomaly detection approach, called LogRobust. LogRobust extracts the semantic information of log events and represents them as semantic vectors. It then detects anomalies using an attention-based Bi-LSTM model, which can capture the contextual information in log sequences and automatically learn the importance of different log events. In this way, LogRobust is able to identify and handle unstable log events and sequences. We have evaluated LogRobust using logs collected from the Hadoop system and an actual online service system at Microsoft. The experimental results show that the proposed approach addresses the problem of log instability well and achieves accurate and robust results on real-world, ever-changing log data.
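The model component is easy to sketch in PyTorch; the dimensions and the two-class output head below are illustrative choices, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class AttnBiLSTM(nn.Module):
    # Attention-based Bi-LSTM over sequences of semantic log-event vectors.
    def __init__(self, emb_dim=300, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)  # scores each time step
        self.out = nn.Linear(2 * hidden, 2)   # normal vs. anomalous logits

    def forward(self, x):                       # x: (batch, seq_len, emb_dim)
        h, _ = self.lstm(x)                     # (batch, seq_len, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # importance of each log event
        ctx = (w * h).sum(dim=1)                # attention-weighted summary
        return self.out(ctx)
```

Because inputs are semantic vectors rather than event IDs, a previously unseen but similarly worded log event still lands near known events in embedding space, which is what gives this design its robustness to log evolution.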
- Book Chapter
- Cited by 6
- 10.1007/978-3-030-91431-8_50
- Jan 1, 2021
With the development of software systems, logs have become increasingly important in system maintenance. Over the past few years, log-based anomaly detection has attracted much attention. We propose a novel log-based anomaly detection model, called Sprelog, which captures “inconsistent” information during the evolution of log messages by exploring word-word interaction features. First, we compute the interaction information of each word-word pair in the input log sequence, constructing self-matching attention vectors. Next, we use these self-matching attention vectors to process the log sequence and construct the representation vectors. Hence, the log sequence can be matched word by word, adapting to the evolution of log messages. In addition, we incorporate pre-trained models into the proposed network to generate higher-level semantic component information. More importantly, we use a low-rank bilinear pooling approach to connect the inconsistent and compositional information, so the model can reduce potential information redundancy without weakening its discriminative ability. Experimental results on publicly available datasets demonstrate that our model significantly outperforms existing baselines on standard evaluation metrics, including precision, recall, F1 score, and accuracy.
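Two of the building blocks named above, self-matching attention over word-word interactions and low-rank bilinear pooling, can be sketched in a few lines of PyTorch. These are generic readings of the terms, not Sprelog's exact layers:

```python
import torch

def self_matching_attention(X):
    # X: (seq_len, dim) word vectors of a log sequence.
    scores = X @ X.T                   # pairwise word-word interactions
    A = torch.softmax(scores, dim=-1)  # self-matching attention weights
    return A @ X                       # attended representation vectors

def low_rank_bilinear_pool(u, v, P, Q):
    # Combine two feature vectors bilinearly through rank-k projections
    # P, Q: (dim, k), capturing their interaction at reduced redundancy.
    return (u @ P) * (v @ Q)
```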
- Research Article
- Cited by 3
- 10.1007/s10664-025-10669-3
- Jun 23, 2025
- Empirical Software Engineering
Growth in system complexity increases the need for automated techniques dedicated to different log analysis tasks, such as Log-based Anomaly Detection (LAD). The latter has been widely addressed in the literature, mostly by means of a variety of deep learning techniques. However, despite their many advantages, that focus on deep learning is somewhat arbitrary, as traditional Machine Learning (ML) techniques may perform well in many cases, depending on the context and datasets. In the same vein, semi-supervised techniques deserve the same attention as supervised techniques, since the former have clear practical advantages. Further, current evaluations mostly assess detection accuracy, which is not enough to decide whether a specific ML technique is suitable for the LAD problem in a given context. Other aspects to consider include training and prediction times, as well as sensitivity to hyperparameter tuning, which matters to engineers in practice. In this paper, we present a comprehensive empirical study in which we evaluate a wide array of supervised and semi-supervised, traditional and deep ML techniques against four evaluation criteria: detection accuracy, time performance, and the sensitivity of each to hyperparameter tuning. Our goal is to provide much stronger and more comprehensive evidence regarding the relative advantages and drawbacks of alternative techniques for LAD. The experimental results show that supervised traditional and deep ML techniques fare similarly in detection accuracy and prediction time on most of the benchmark datasets considered in our study. Moreover, the sensitivity analysis of detection accuracy to hyperparameter tuning shows that supervised traditional ML techniques are overall less sensitive than deep learning techniques. Further, semi-supervised techniques yield significantly worse detection accuracy than supervised techniques.
- Research Article
- Cited by 25
- 10.1007/s10664-024-10533-w
- Aug 17, 2024
- Empirical Software Engineering
Software systems log massive amounts of data, recording important runtime information. Such logs are used, for example, for log-based anomaly detection, which aims to automatically detect abnormal system behaviors by processing the information recorded in logs. Many log-based anomaly detection techniques based on deep learning models include a pre-processing step called log parsing. However, the impact of log parsing on the accuracy of anomaly detection techniques has received surprisingly little attention so far. Investigating which key properties log parsing techniques should ideally have to support anomaly detection is therefore warranted. In this paper, we report on a comprehensive empirical study of the impact of log parsing on anomaly detection accuracy, using 13 log parsing techniques and seven anomaly detection techniques (five based on deep learning and two based on traditional machine learning) on three publicly available log datasets. Our empirical results show that, despite what is widely assumed, there is no strong correlation between log parsing accuracy and anomaly detection accuracy, regardless of the metric used to measure log parsing accuracy. Moreover, we experimentally confirm existing theoretical results showing that it is a property we refer to as distinguishability in log parsing results, rather than their accuracy, that plays an essential role in achieving accurate anomaly detection.
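The distinguishability property has a simple toy illustration (the event and template names below are made up): if a parser maps an anomaly-revealing event and a normal event to the same template, the parsed sequences become identical, and no downstream detector, however accurate, can separate them:

```python
# Two log event sequences that differ only in their last event.
seq_normal  = ["E1", "E2", "E7"]
seq_anomaly = ["E1", "E2", "E9"]

# A parser that collapses E7 and E9 into one template destroys
# distinguishability; one that keeps them apart preserves it.
collapsing = {"E1": "T1", "E2": "T2", "E7": "T3", "E9": "T3"}
distinct   = {"E1": "T1", "E2": "T2", "E7": "T3", "E9": "T4"}

def parse(seq, template_map):
    return [template_map[e] for e in seq]

assert parse(seq_normal, collapsing) == parse(seq_anomaly, collapsing)
assert parse(seq_normal, distinct)   != parse(seq_anomaly, distinct)
```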
- Conference Article
- Cited by 5
- 10.1109/scc55611.2022.00053
- Jul 1, 2022
Artificial Intelligence for IT Operations (AIOps) describes the process of maintaining and operating large IT systems using diverse AI-enabled methods and tools for, e.g., anomaly detection and root cause analysis, to support the remediation, optimization, and automatic initiation of self-stabilizing IT activities. The core step of any AIOps workflow is anomaly detection, typically performed on high-volume heterogeneous data such as log messages (logs), metrics (e.g., CPU utilization), and distributed traces. In this paper, we propose a method for reliable and practical anomaly detection from system logs. It overcomes a common disadvantage of related works, the need for a large amount of manually labeled training data, by building an anomaly detection model from the log instructions in the source code of 1,000+ GitHub projects. The instructions from diverse systems contain rich and heterogeneous information about many different normal and abnormal IT events and serve as a foundation for anomaly detection. The proposed method, named ADLILog, combines the log instructions with data from the system of interest (the target system) to learn a deep neural network model through a two-phase learning procedure. The experimental results show that ADLILog outperforms the related approaches by up to 60% on the F1 score while satisfying core non-functional requirements for industrial deployments, such as unsupervised design, efficient model updates, and small model sizes.
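One way to picture the "log instructions as training data" idea is a weak-labeling pass over source code; the regex and the severity split below are illustrative assumptions, not ADLILog's actual extraction pipeline:

```python
import re

# Match calls like logger.error("Disk failure on %s", node) and capture
# the severity level plus the static log text.
LOG_CALL = re.compile(
    r'log(?:ger)?\.(debug|info|warn|warning|error|fatal|critical)\(\s*"([^"]*)"',
    re.IGNORECASE,
)
# Assumption: warning-and-above levels weakly signal abnormal events.
ABNORMAL_LEVELS = {"warn", "warning", "error", "fatal", "critical"}

def label_instructions(source_code):
    # Yield (static log text, weak label) pairs; label 1 = abnormal.
    for level, text in LOG_CALL.findall(source_code):
        yield text, int(level.lower() in ABNORMAL_LEVELS)
```

Mining such pairs across many repositories yields a large, label-free training corpus, which is what removes the manual-annotation bottleneck described above.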
- Conference Article
- Cited by 3
- 10.1109/qrs51102.2020.00022
- Dec 1, 2020
Log analysis can be used for software system anomaly detection, and ensemble learning can handle log data with imbalanced characteristics, making log-based anomaly detection with ensemble learning a natural choice. However, the data balancing methods currently used in ensemble learning may destroy the distribution of the original log data and affect the accuracy of the anomaly detection results. Moreover, existing ensemble rules do not take into account the relationship between the samples to be detected and the historical log data. We therefore propose a log-based anomaly detection method with NW (Neighbor Weighting) ensemble rules. It uses a data balancing method based on spectral clustering, so that the balanced log data preserves the distribution of the original data while achieving quantitative balance. A new group of ensemble rules is then proposed and used for anomaly detection with higher accuracy. We performed experiments on six large log datasets from different types of systems and verified the feasibility and generality of the proposed method.
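A minimal sketch of distribution-preserving balancing in the spirit described above: cluster the majority (normal) class with spectral clustering and undersample each cluster in proportion to its size, so the kept subset mirrors the original distribution. This is a simplified reading, not the paper's exact procedure:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def balance_majority_class(X_major, n_keep, n_clusters=10, seed=0):
    # Cluster the majority class to expose its internal structure.
    labels = SpectralClustering(n_clusters=n_clusters,
                                random_state=seed).fit_predict(X_major)
    rng = np.random.default_rng(seed)
    kept = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        # Sample from each cluster in proportion to its share of the data.
        k = max(1, round(n_keep * len(idx) / len(X_major)))
        kept.extend(rng.choice(idx, size=min(k, len(idx)), replace=False))
    return X_major[np.array(kept)]
```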
- Conference Article
- Cited by 2
- 10.1109/trustcom53373.2021.00096
- Oct 1, 2021
Logs, prevalent in nearly all computer systems, contain rich information that helps with troubleshooting and root cause analysis; they are therefore excellent information sources for anomaly detection. Since logs are diverse and heterogeneous, dealing with all of them requires maintenance personnel to check detection results one by one, which is burdensome. This paper applies an ensemble method that combines the results of different log anomaly detectors and generates a unified output to reduce the burden on maintenance personnel. Since logs are recorded according to user behavior, they often have non-fixed intervals. We apply a trust computational model to transform the unevenly distributed data into a regularly spaced time series. To the best of our knowledge, this is the first work to employ trust in this data processing step. Furthermore, the trust computational model introduces several parameters; we use a parametric ensemble technique to address parameter choice and further improve detection accuracy. In this way, our method makes it easy to track a user from multiple views through logs. Experiments show that our method achieves good performance in detecting user-behavior anomalies from multiple kinds of logs.
- Research Article
- Cited by 76
- 10.1109/tifs.2021.3053371
- Jan 1, 2021
- IEEE Transactions on Information Forensics and Security
Cloud technology has brought great convenience to enterprises as well as customers. System logs record notable events and are becoming valuable resources for tracking and investigating system status. Detecting anomalies from logs as fast as possible can significantly improve the quality of service. Although many machine learning algorithms (e.g., SVM, Logistic Regression) have high detection accuracy, we find that they assume the data are clean and can have high training times. Facing these challenges, we propose the Robust Online Evolving Anomaly Detection (ROEAD) framework, which adopts a Robust Feature Extractor (RFE) to remove the effects of noise and Online Evolving Anomaly Detection (OEAD) to dynamically update parameters. We propose the Online Evolving SVM (OES) algorithm as an example of an online anomaly detection method. We analyze the performance of OES theoretically and prove that the performance difference between OES and the best hypothesis tends to zero as time goes to infinity. We compare the performance of ROEAD against state-of-the-art anomaly detection algorithms on public log datasets. The results demonstrate that ROEAD is able to remove the effects of noise and that OES improves detection accuracy by more than 40%.
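An online linear SVM makes the "evolving" idea concrete: hinge loss with per-batch `partial_fit` updates lets the detector keep adapting as log feature vectors stream in. This is a generic stand-in built on scikit-learn, not the paper's OES algorithm:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# loss="hinge" makes SGDClassifier a linear SVM trained incrementally.
clf = SGDClassifier(loss="hinge", alpha=1e-4)

def stream_update(clf, X_batch, y_batch, first=False):
    # Update model parameters from each incoming batch of labeled
    # log feature vectors; classes must be declared on the first call.
    if first:
        clf.partial_fit(X_batch, y_batch, classes=np.array([0, 1]))
    else:
        clf.partial_fit(X_batch, y_batch)
    return clf
```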