XDrain: Effective log parsing in log streams using fixed-depth forest

Changjian Liu,Yang Tian,Siyu Yu,Donghui Gao,Yifan Wu,Suqun Huang,Xiaochun Hu,Ningjiang Chen

doi:10.1016/j.infsof.2024.107546

Abstract

Logs record rich information that can help operators diagnose system failure [1]. Analyzing logs in log streams can expedite the diagnostic process and effectively mitigate the impact of failures. Log parsing is a prerequisite for automated log analysis, which transforms semi-structured logs into structured logs. However, the effectiveness of existing parsers has only been evaluated on a limited set of logs, which lack sufficient log types. After conducting a more comprehensive evaluation of the existing log parser, we identified the following deficiencies: (1) Variable-starting logs can make some log parsers error-prone. (2) The order of logs in a log stream can have a great impact on the effectiveness. We proposes XDrain to satisfy these challenges by using fixed-depth forest. XDrain first shuffles the order of logs and the order of words within each log a few times. Secondly, XDrain will generate parsing forest for all the logs generated after the shuffling. Finally, the final log template is generated by voting. Evaluation results show that XDrain outperforms existing log parsers on two widely-used accuracy metrics and is immune to inappropriate log order. XDrain only takes about 97.89 s to parse one million logs on average.

Full Text