Abstract

Logs can help developers promptly diagnose software system failures. Log parsers, which convert semi-structured logs into structured log templates, are the first component of automated log analysis. However, almost all existing log parsers generalize poorly and work well only for specific systems. In addition, some parsers perform poorly when trained on only partial data and cannot handle out-of-vocabulary (OOV) words. These limitations can lead to erroneous log parsing results. We observe that logs are written as semi-structured natural language, so log parsing can be treated as a natural language processing task. We therefore propose Semlog, a novel log parser that requires no domain knowledge about specific systems. Within a log, constant and variable words contribute differently to its semantics. We pretrain a self-attention based model to capture this difference in semantic contribution, and then extract log templates based on the pretrained model. We have conducted extensive experiments on 16 benchmark datasets, and the results show that Semlog outperforms state-of-the-art parsers in terms of average parsing accuracy, reaching 0.987.
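To make the parsing task concrete, the sketch below shows how a per-token "constantness" score can be turned into a log template. The `score_tokens` heuristic here is only a regex stub standing in for Semlog's pretrained self-attention scorer (the actual model and its scores are not described in this abstract); the template-extraction step simply keeps high-scoring tokens as constants and masks low-scoring tokens as `<*>` variables.

```python
import re

# Hypothetical stand-in for Semlog's pretrained self-attention scorer:
# tokens containing digits, hex literals, paths, or IP addresses get a low
# "constantness" score (likely variables); everything else scores high.
VARIABLE_PATTERN = re.compile(r"\w*\d\w*|\d+\.\d+\.\d+\.\d+|0x[0-9a-fA-F]+|/\S+")

def score_tokens(tokens):
    return [0.1 if VARIABLE_PATTERN.fullmatch(t) else 0.9 for t in tokens]

def extract_template(log_line, threshold=0.5):
    # Keep constant words, replace variable words with the <*> placeholder.
    tokens = log_line.split()
    scores = score_tokens(tokens)
    return " ".join(t if s >= threshold else "<*>" for t, s in zip(tokens, scores))

print(extract_template("Received block blk_3587 of size 67108864 from 10.251.42.84"))
# → Received block <*> of size <*> from <*>
```

In Semlog the scores would come from the pretrained model rather than hand-written rules, which is what lets the parser work across systems without per-system regexes.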
