Abstract
Logs faithfully record application behavior and system state. Log parsing converts unstructured log messages into structured event templates by extracting the constant portion of raw logs. It is a prerequisite for further log analysis such as usage analysis, anomaly detection, performance modeling, and failure diagnosis. When processing logs of varied length, log parsing suffers from decreased accuracy or over-fitting. In addition, traditional probability-based accuracy assessment methods are ineffective at assessing the inner quality of log parsing, especially at explaining why accuracy declines on variable-length logs. In this paper, we present a p-value-guided inner quality assessment of multiple log parsing algorithms. The method uses conformal evaluation to gain deeper insight into log parser quality, with string edit distance as the underlying non-conformity measure. We introduce two quality indicators for evaluating log parsers: credibility and confidence. Credibility reflects how well a log message conforms to the event template that a log parser generates for it, whereas confidence reflects how strongly the message fails to conform to all other event templates. To demonstrate the inherent differences among log parsers, we display the distribution of the credibility and confidence of each prediction in a 2D t-SNE space. In the experiments, we evaluate 13 log parsers on different datasets. The results show that our approach can demonstrate the inherent quality of log parsers and recognize the variable-length problem more effectively than traditional confusion-matrix-based metrics.
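To make the two indicators concrete, the following is a minimal sketch of how credibility and confidence could be computed, not the paper's implementation. It rests on several assumptions that the abstract does not state: edit distance is taken over whitespace-separated tokens; the p-value of a message for a template is the fraction of messages assigned to that template whose distance to it is at least as large; credibility is the p-value for the predicted template; and confidence is one minus the largest p-value over all other templates. The function names and sample logs are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: lists of tokens)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def nonconformity(message, template):
    """Assumed non-conformity measure: token-level edit distance."""
    return edit_distance(message.split(), template.split())


def p_value(message, template, members):
    """Share of the template's messages at least as non-conformal as `message`."""
    score = nonconformity(message, template)
    reference = [nonconformity(m, template) for m in members]
    return sum(r >= score for r in reference) / len(reference)


def assess(message, predicted, groups):
    """Return (credibility, confidence) for one parser prediction.

    `groups` maps each event template to the messages the parser assigned to it.
    """
    credibility = p_value(message, predicted, groups[predicted])
    others = [p_value(message, t, ms)
              for t, ms in groups.items() if t != predicted]
    confidence = 1.0 - max(others, default=0.0)
    return credibility, confidence


# Hypothetical parser output: two templates and the messages grouped under them.
groups = {
    "Connection from <*> closed": [
        "Connection from 10.0.0.1 closed",
        "Connection from 10.0.0.2 closed",
        "Connection from 10.0.0.3 port 22 closed",  # longer variant
    ],
    "User <*> logged in": [
        "User alice logged in",
        "User bob logged in",
    ],
}

short = assess("Connection from 10.0.0.1 closed",
               "Connection from <*> closed", groups)
longer = assess("Connection from 10.0.0.3 port 22 closed",
                "Connection from <*> closed", groups)
```

Under these assumptions, the longer message receives a lower credibility than the short one although both carry the same predicted template, which is the kind of variable-length effect the abstract says a confusion matrix would not reveal.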