Abstract

Logs can help developers promptly diagnose software system failures. Log parsers, which convert semi-structured logs into structured log templates, are the first component of automated log analysis. However, almost all existing log parsers generalize poorly and work well only for specific systems. In addition, some parsers perform poorly when trained on only partial data and cannot handle out-of-vocabulary (OOV) words. These limitations can lead to erroneous log parsing results. We observe that logs are written as semi-structured natural language, so log parsing can be treated as a natural language processing task. We therefore propose Semlog, a novel log parser that requires no domain knowledge about specific systems. Within a log, constant and variable words contribute differently to its semantics. We pretrain a self-attention based model to capture this difference in semantic contribution, and then extract log templates based on the pretrained model. We have conducted extensive experiments on 16 benchmark datasets, and the results show that Semlog outperforms state-of-the-art parsers in terms of average parsing accuracy, reaching 0.987.
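To make the parsing task concrete, the sketch below shows how a per-token "constantness" score can be turned into a log template. The `score_tokens` heuristic here is only a regex stub standing in for Semlog's pretrained self-attention scorer (the actual model and its scores are not described in this abstract); the template-extraction step simply keeps high-scoring tokens as constants and masks low-scoring tokens as `<*>` variables.

```python
import re

# Hypothetical stand-in for Semlog's pretrained self-attention scorer:
# tokens containing digits, hex literals, paths, or IP addresses get a low
# "constantness" score (likely variables); everything else scores high.
VARIABLE_PATTERN = re.compile(r"\w*\d\w*|\d+\.\d+\.\d+\.\d+|0x[0-9a-fA-F]+|/\S+")

def score_tokens(tokens):
    return [0.1 if VARIABLE_PATTERN.fullmatch(t) else 0.9 for t in tokens]

def extract_template(log_line, threshold=0.5):
    # Keep constant words, replace variable words with the <*> placeholder.
    tokens = log_line.split()
    scores = score_tokens(tokens)
    return " ".join(t if s >= threshold else "<*>" for t, s in zip(tokens, scores))

print(extract_template("Received block blk_3587 of size 67108864 from 10.251.42.84"))
# → Received block <*> of size <*> from <*>
```

In Semlog the scores would come from the pretrained model rather than hand-written rules, which is what lets the parser work across systems without per-system regexes.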
