Abstract

Security devices produce huge numbers of logs, far more than human analysts can process. This paper introduces an unsupervised approach to detecting anomalous behavior in large-scale security logs. We propose a novel feature extraction mechanism that can precisely characterize the features of malicious behavior. We design an LSTM-based anomaly detection approach that successfully identifies attacks on two widely used datasets. Our approach outperforms three popular anomaly detection algorithms, one-class SVM, Gaussian Mixture Model (GMM), and Principal Component Analysis (PCA), in terms of both accuracy and efficiency.
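
PCA is one of the three baselines named above. This summary does not give the baseline configuration, so the following is only a minimal sketch of the usual PCA reconstruction-error detector. It assumes a single principal component, pure-Python power iteration, and toy 2-D vectors, none of which come from the paper. The component is fitted on normal vectors only, and each new vector is scored by its squared distance from the principal subspace.

```python
import math
from typing import List, Tuple

Vector = List[float]

# Illustrative PCA baseline (hypothetical setup; not the paper's configuration).


def mean_vector(rows: List[Vector]) -> Vector:
    """Column-wise mean of the training vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows: List[Vector], mu: Vector) -> List[Vector]:
    """Sample covariance matrix of the rows (requires at least two rows)."""
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    d, n = len(mu), len(rows)
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov: List[Vector], iters: int = 200) -> Vector:
    """Leading eigenvector of a symmetric matrix, found by power iteration."""
    v = [1.0] * len(cov)  # assumes the start vector is not orthogonal to the answer
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v


def fit_pca(normal_rows: List[Vector]) -> Tuple[Vector, Vector]:
    """Fit the mean and first principal component on normal data only."""
    mu = mean_vector(normal_rows)
    return mu, top_component(covariance(normal_rows, mu))


def reconstruction_error(x: Vector, mu: Vector, pc: Vector) -> float:
    """Squared distance from x to the 1-D principal subspace through mu."""
    centered = [a - m for a, m in zip(x, mu)]
    proj = sum(a * p for a, p in zip(centered, pc))
    return sum((a - proj * p) ** 2 for a, p in zip(centered, pc))


if __name__ == "__main__":
    # Toy 2-D feature vectors lying close to the line y = x.
    normal = [[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.0], [5.0, 5.1]]
    mu, pc = fit_pca(normal)
    for point in ([2.0, 2.0], [3.0, -3.0]):
        print(point, round(reconstruction_error(point, mu, pc), 3))
```

In this toy run, [3.0, -3.0] lies far from the line the normal data follows and receives a much larger error than [2.0, 2.0]. A practical baseline would typically keep several components and calibrate the alert threshold on held-out normal data.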

Highlights

  • The running state of a system is usually recorded in log files, which are used for debugging and fault detection; this makes log data a valuable resource for anomaly detection

  • The feature vectors used by the baselines correspond to user-day-level detection; the results show that the Bidirectional Event Mode (BEM) model outperforms the baselines

  • Based on an analysis of the log content in the datasets, we build an LSTM-based anomaly detection model
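
The second highlight says the baselines' feature vectors amount to user-day-level detection. One common way to build such vectors is to count each event type per user per calendar day, so that every (user, day) pair yields one fixed-length vector for a detector such as one-class SVM, GMM, or PCA. The sketch below assumes hypothetical records with `user`, `timestamp`, and `event` fields and a hypothetical event vocabulary; the paper's actual features are not given in this summary.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Mapping, Tuple

UserDay = Tuple[str, str]


def user_day_vectors(records: Iterable[Mapping[str, str]],
                     vocab: List[str]) -> Dict[UserDay, List[int]]:
    """Count each event type per (user, calendar day).

    Hypothetical schema: each record carries 'user', 'timestamp' (ISO 8601)
    and 'event' fields. Event types missing from vocab are left out of the vector.
    """
    counts: Dict[UserDay, Counter] = defaultdict(Counter)
    for rec in records:
        day = datetime.fromisoformat(rec["timestamp"]).date().isoformat()
        counts[(rec["user"], day)][rec["event"]] += 1
    # One fixed-length vector per (user, day), ordered by vocab; keys sorted.
    return {key: [c[event] for event in vocab] for key, c in sorted(counts.items())}


if __name__ == "__main__":
    records = [
        {"user": "U1", "timestamp": "2024-01-01T08:00:00", "event": "logon"},
        {"user": "U1", "timestamp": "2024-01-01T09:30:00", "event": "logon"},
        {"user": "U1", "timestamp": "2024-01-01T17:00:00", "event": "logoff"},
        {"user": "U2", "timestamp": "2024-01-02T10:00:00", "event": "process_start"},
    ]
    vocab = ["logon", "logoff", "process_start"]
    for key, vec in user_day_vectors(records, vocab).items():
        print(key, vec)
```

With these sample records the output is ('U1', '2024-01-01') [2, 1, 0] followed by ('U2', '2024-01-02') [0, 0, 1]. Because each vector summarizes a whole day, a detector working on these inputs can only flag a user-day, not the specific log line responsible.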


Summary

Introduction

The running state of a system is usually recorded in log files, which are used for debugging and fault detection, so log data is a valuable resource for anomaly detection. Traditional methods rely on administrators to analyze log text manually. Existing research shows a strong correlation between logs and their character composition. The proposed model is based on LSTM sequence mining: as a data-driven anomaly detection method, it learns the sequence patterns of normal logs, detects unknown malicious behavior, and identifies red team attacks within large numbers of log sequences. Because no log correlation matching is performed, the model can only raise alerts on individual abnormal log lines; it cannot directly determine the anomaly level of each user. Section 4 presents the experiments, including a detailed description of the datasets, the evaluation metrics, and comparisons with other methods such as one-class SVM, GMM, and PCA.
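
The introduction describes a model that learns the sequence patterns of normal logs, relies on the character composition of log lines, and raises alerts on individual lines. The sketch below illustrates that line-level scoring loop under a plainly stated substitution: a character bigram model with add-one smoothing stands in for the paper's LSTM, which keeps the example dependency-free but is not the authors' model. A line's score is its mean per-character negative log-likelihood under the model trained on normal lines. The names `BigramLineScorer` and `flag_lines`, the boundary markers, the toy log format, and the threshold of 2.0 are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

BOS, EOS = "\x02", "\x03"  # hypothetical start/end-of-line marker characters


def tokenize(line: str) -> List[str]:
    """Character-level tokenization, with markers framing the line."""
    return [BOS, *line, EOS]


class BigramLineScorer:
    """Hypothetical stand-in for the paper's LSTM: estimates P(next char | previous char)."""

    def __init__(self, normal_lines: Iterable[str]) -> None:
        self.pair_counts: Dict[str, Counter] = defaultdict(Counter)
        vocab = set()
        for line in normal_lines:
            tokens = tokenize(line)
            vocab.update(tokens)
            for prev, nxt in zip(tokens, tokens[1:]):
                self.pair_counts[prev][nxt] += 1
        # One extra slot for characters never seen during training.
        self.vocab_size = len(vocab) + 1

    def score(self, line: str) -> float:
        """Mean negative log-likelihood per transition; higher means more anomalous."""
        tokens = tokenize(line)
        total = 0.0
        for prev, nxt in zip(tokens, tokens[1:]):
            # .get avoids inserting unseen characters into the trained counts.
            counts = self.pair_counts.get(prev, Counter())
            # Add-one smoothing gives unseen transitions a small, nonzero probability.
            p = (counts[nxt] + 1) / (sum(counts.values()) + self.vocab_size)
            total -= math.log(p)
        return total / (len(tokens) - 1)


def flag_lines(scorer: BigramLineScorer, lines: Iterable[str], threshold: float) -> List[str]:
    """Line-level alerting: return every line whose score exceeds the threshold."""
    return [line for line in lines if scorer.score(line) > threshold]


if __name__ == "__main__":
    normal = ["user U1 logon host H1", "user U2 logon host H2",
              "user U1 logoff host H1"] * 20
    scorer = BigramLineScorer(normal)
    print(flag_lines(scorer, ["user U2 logoff host H2", "rm -rf /tmp/x"], threshold=2.0))
```

In this toy run only "rm -rf /tmp/x" is flagged, because almost none of its character transitions occur in the normal lines, while the unseen but well-formed line "user U2 logoff host H2" is not. An LSTM would condition each prediction on the whole preceding prefix rather than on a single character, but the scoring and thresholding steps would look the same.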

Anomaly Detection Approach
Log-Line Tokenization
LSTM Model
Related Work
Experimental Results
Metric
Baselines
Gaussian Mixture Model
Results and Analysis
Conclusion