Try with Simpler – An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection

Lin Yang, Hongyu Zhang, Yue Kang, Shutao Gao, Junjie Chen, Zhihao Gong, Huaan Li

doi:10.1145/3644386

Open Access

https://doi.org/10.1145/3644386

Abstract

With the rapid development of deep learning (DL), recent work on log-based anomaly detection has focused on extracting semantic information from log events (i.e., templates of log messages) and designing more advanced DL models for anomaly detection. These DL-based techniques do improve effectiveness, but they depend more heavily on training data (e.g., data quality and data labels) and incur higher time and resource costs due to the complexity and scale of DL models, which hinders their practical use. In contrast, techniques based on traditional machine learning or data mining algorithms are less dependent on training data and more efficient, but they are less effective than DL-based techniques; our motivating study confirms that this gap is mainly caused by the problem of unseen log events (i.e., log events in incoming log messages that do not appear in the training data). Intuitively, if the effectiveness of traditional techniques can be raised to a level comparable with advanced DL-based techniques, log-based anomaly detection can become more practical. Indeed, an existing study in another area (i.e., linking questions posted on Stack Overflow) has shown that traditional techniques with some optimizations can achieve effectiveness comparable to the state-of-the-art DL-based technique, suggesting that traditional log-based anomaly detection techniques can also be enhanced to some degree. Inspired by this idea of “try-with-simpler”, we conducted the first empirical study to explore the potential of improving traditional techniques for more practical log-based anomaly detection.
In this work, we optimized the traditional unsupervised PCA (Principal Component Analysis) technique by incorporating a lightweight semantic-based log representation into it, yielding SemPCA, and conducted an extensive study to investigate the potential of SemPCA for more practical log-based anomaly detection. We compared seven log-based anomaly detection techniques (four DL-based techniques, two traditional techniques, and SemPCA) on both public and industrial datasets. Our results show that SemPCA achieves effectiveness comparable to advanced supervised/semi-supervised DL-based techniques while being more efficient and much more stable under insufficient training data, demonstrating that a traditional technique can still excel after a small but useful adaptation.
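
The abstract does not specify how SemPCA builds its semantic representation or sets its detection threshold, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. Each log event (template) is mapped to the average of the word vectors of its tokens, a session is represented as the sum of its event vectors, and a classic PCA detector flags a session when its squared prediction error (SPE), i.e., the squared length of its residual outside the top-k principal subspace, exceeds a threshold. The toy word vectors in `WORD_VECS`, the summation of event vectors, the helper names (`event_vector`, `sequence_vector`, `fit_pca`, `spe`, `is_anomalous`), and the fixed `THRESHOLD` are all hypothetical. A real system would use pretrained word embeddings, and classic PCA detectors typically derive the SPE threshold statistically (for example from the Q-statistic) rather than fixing it by hand. Power iteration is used instead of a library SVD only to keep the sketch dependency-free.

```python
import math

# Hypothetical toy word vectors standing in for pretrained word embeddings.
# "closing" deliberately shares the vector of "close" to mimic a synonym.
WORD_VECS = {
    "open": [1.0, 0.0, 0.0, 0.0],
    "write": [0.0, 1.0, 0.0, 0.0],
    "block": [0.0, 1.0, 0.0, 0.0],
    "close": [0.0, 0.0, 1.0, 0.0],
    "closing": [0.0, 0.0, 1.0, 0.0],
    "exception": [0.0, 0.0, 0.0, 1.0],
}
DIM = 4


def event_vector(template):
    """Average the vectors of known words; unknown words and <*> are skipped."""
    vecs = [WORD_VECS[tok] for tok in template.lower().split() if tok in WORD_VECS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sequence_vector(session):
    """Represent a session as the sum of its event vectors (an assumption)."""
    total = [0.0] * DIM
    for template in session:
        total = [a + b for a, b in zip(total, event_vector(template))]
    return total


def fit_pca(rows, k, iters=200):
    """Return the mean and top-k principal directions (power iteration + deflation)."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(DIM)]
           for i in range(DIM)]
    comps = []
    for _ in range(k):
        v = [1.0] * DIM
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(DIM)) for i in range(DIM)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left to explain
                return mean, comps
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(DIM) for j in range(DIM))
        comps.append(v)
        # Deflate: remove the variance explained by v before finding the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(DIM)] for i in range(DIM)]
    return mean, comps


def spe(vec, mean, comps):
    """Squared prediction error: squared norm of the part of vec outside the
    principal subspace (comps are assumed to be orthonormal)."""
    r = [x - m for x, m in zip(vec, mean)]
    for c in comps:
        dot = sum(a * b for a, b in zip(r, c))
        r = [a - dot * b for a, b in zip(r, c)]
    return sum(x * x for x in r)


# Unlabeled training sessions: open, n writes, close (normal behavior only).
normal = [["open <*>"] + ["write block <*>"] * n + ["close <*>"] for n in range(1, 6)]
MEAN, COMPS = fit_pca([sequence_vector(s) for s in normal], k=1)
THRESHOLD = 0.5  # hypothetical fixed SPE threshold


def is_anomalous(session):
    return spe(sequence_vector(session), MEAN, COMPS) > THRESHOLD


# "closing <*>" never occurs in training; "exception ..." signals a failure.
unseen = ["open <*>", "write block <*>", "write block <*>", "closing <*>"]
failed = ["open <*>", "write block <*>", "write block <*>",
          "exception while writing block <*>"]
print(is_anomalous(unseen), is_anomalous(failed))  # False True
```

In this toy setup, the unseen template "closing <*>" has no column in an event-count matrix built from the training data, but its word-based vector coincides with that of "close <*>", so the session stays inside the normal subspace (SPE of 0), while the failing session leaves a residual (SPE of 1.25). This illustrates why a semantic representation can help a traditional detector cope with unseen log events.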

Journal: ACM Transactions on Software Engineering and Methodology
Publication Date: Feb 7, 2024
Citations: 1
License type: MIT

Similar Papers

Robust log-based anomaly detection on unstable log data
Xu Zhang ... Furao Shen
12 Aug 2019

Migrating From Data Mining to Big Data Mining
Gourav Bathla ... Rinkle Rani
International Journal of Engineering & Technology | VOL. 7
25 Jun 2018

Impact of log parsing on deep learning-based anomaly detection
Zanis Ali Khan ... Lionel C Briand
Empirical Software Engineering | VOL. 29
17 Aug 2024

TPLAD: Template-Parsed Log Anomaly Detection for Electrical Database Systems
Hailong Li ... Chentao Zhang
International Journal of High Speed Electronics and Systems | VOL. -
18 Sep 2024
