Abstract

Log data analysis is essential for understanding the behavior of computer and network systems, and it enables security analysis, fault diagnosis, performance analysis, and intrusion detection. An established technique for log analysis is log line clustering, which groups similar events and makes it possible to detect outliers, malicious clusters, or changes in system behavior. However, log line clusters usually lack the meaningful descriptions required to understand the information carried by the log lines within a cluster. Template generators produce such descriptions in the form of patterns that match all log lines within a cluster and therefore describe the features the lines have in common. Current approaches only generate token-based templates (e.g., built from space-separated words). These are often inaccurate, because they do not recognize differently spelled variants of a word as similar, and they require additional information on the structure and syntax of the data, such as predefined delimiters. Consequently, novel character-based template generators are required that provide robust templates for any type of computer log data and that can be applied in security information and event management (SIEM) solutions, for continuous auditing, and for quality inspection and control. In this paper, we propose a novel approach for computing character-based templates that combines comparison-based methods and heuristics. To achieve this goal, we solve the problem of efficiently calculating a multi-line alignment for a group of log lines and compute an accurate approximation of the optimal character-based template, while reducing the runtime from $O(n^m)$ to $O(mn^2)$. We demonstrate the accuracy of our approach in a detailed evaluation using the established F-Score and a newly introduced accuracy measure, the Sim-Score, which can be computed without a ground truth. Furthermore, we assess the robustness of the algorithm and the influence of different log data properties on the quality of the resulting templates.
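The runtime reduction can be illustrated with a minimal sketch of progressive pairwise alignment. It assumes that $m$ is the number of log lines and $n$ the maximum line length. It also swaps in plain longest-common-subsequence (LCS) alignment in place of the comparison-based methods and heuristics of the proposed approach, so it is not the authors' algorithm. Each new line is aligned against the current template with an $O(n^2)$ dynamic program, so $m$ lines cost $O(mn^2)$ rather than the $O(n^m)$ of a full $m$-dimensional alignment table. Unaligned characters collapse into a wildcard. The `§` symbol and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical template representation: each item is either a fixed character
# shared by all lines seen so far, or None for a wildcard (any, possibly empty, text).
Template = List[Optional[str]]
WILDCARD = "§"  # hypothetical display symbol for a wildcard


def lcs_pairs(template: Template, line: str) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of one longest common subsequence.

    Wildcards never match a character. Runs in O(len(template) * len(line)).
    """
    n, m = len(template), len(line)
    # dp[i][j] = LCS length of template[i:] and line[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if template[i] is not None and template[i] == line[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace back one optimal alignment from the top-left corner.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if template[i] is not None and template[i] == line[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def merge(template: Template, line: str) -> Template:
    """Keep the aligned characters; replace each unaligned stretch by one wildcard."""
    out: Template = []
    prev_i, prev_j = 0, 0
    # A sentinel pair at the end flushes any trailing unaligned stretch.
    for i, j in lcs_pairs(template, line) + [(len(template), len(line))]:
        gap = i > prev_i or j > prev_j
        if gap and (not out or out[-1] is not None):
            out.append(None)  # never emit two wildcards in a row
        if i < len(template):
            out.append(template[i])
        prev_i, prev_j = i + 1, j + 1
    return out


def generate_template(lines: List[str]) -> str:
    """Progressively align m lines: one O(n^2) merge per line, O(m n^2) in total."""
    if not lines:
        return ""
    template: Template = list(lines[0])
    for line in lines[1:]:
        template = merge(template, line)
    return "".join(WILDCARD if c is None else c for c in template)
```

For example, `generate_template(["conn from 10.0.0.1 port 22", "conn from 10.0.0.7 port 22", "conn from 10.0.1.9 port 443"])` returns `conn from 10.0.§.§ port §`. It keeps the shared prefix of the IP address, which a whitespace-token template would replace entirely with a wildcard. Because the result depends on the order of the lines and on LCS tie-breaking, this greedy merge only approximates the optimal multi-line alignment.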
