Towards Automated Log Parsing for Large-Scale Log Data Analysis

Pinjia He, Shilin He, Jieming Zhu, Michael R Lyu, Jian Li

doi:10.1109/tdsc.2017.2762673

Abstract

Logs are widely used in system management for dependability assurance because they are often the only data available that record detailed system runtime behaviors in production. Because the size of logs is constantly increasing, developers (and operators) intend to automate their analysis by applying data mining methods, which require structured input data (e.g., matrices). This has triggered a number of studies on log parsing, which aims to transform free-text log messages into structured events. However, due to the lack of open-source implementations of these log parsers and of benchmarks for performance comparison, developers are unlikely to be aware of the effectiveness of existing log parsers and their limitations when applying them in practice. They must often reimplement or redesign one, which is time-consuming and redundant. In this paper, we first present a characterization study of the current state-of-the-art log parsers and evaluate their efficacy on five real-world datasets with over ten million log messages. We determine that, although the overall accuracy of these parsers is high, they are not robust across all datasets. When logs grow to a large scale (e.g., 200 million log messages), which is common in practice, these parsers are not efficient enough to handle such data on a single computer. To address the above limitations, we design and implement a parallel log parser (namely POP) on top of Spark, a large-scale data processing platform. Comprehensive experiments have been conducted to evaluate POP on both synthetic and real-world datasets. The evaluation results demonstrate the capability of POP in terms of accuracy, efficiency, and effectiveness on subsequent log mining tasks.
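
To make the idea of turning free-text log messages into structured events concrete, the following is a minimal illustrative sketch in Python. It is not the POP algorithm and not any parser evaluated in the paper: it simply masks variable fields with hand-written regular expressions, assigns each resulting template an event ID, and builds the kind of event count matrix that data mining methods take as input. The log lines, the regular expressions, and the function names (`to_template`, `parse`, `event_count_matrix`) are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for variable fields. Real log parsers typically
# learn templates from the data rather than rely on a fixed list like this.
VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4 address, optional port
    re.compile(r"\bblk_-?\d+\b"),                          # block identifier
    re.compile(r"\b\d+\b"),                                # plain integer
]


def to_template(message):
    """Replace variable fields in a raw message with the wildcard <*>."""
    for pattern in VARIABLE_PATTERNS:
        message = pattern.sub("<*>", message)
    return message


def parse(messages):
    """Map each message to an event ID; return (template -> ID, list of IDs)."""
    template_ids = {}
    event_ids = []
    for message in messages:
        template = to_template(message)
        # A template seen for the first time gets the next free ID.
        event_id = template_ids.setdefault(template, len(template_ids))
        event_ids.append(event_id)
    return template_ids, event_ids


def event_count_matrix(sessions, n_events):
    """Build a matrix with one row per session and one column per event ID."""
    matrix = []
    for session in sessions:
        row = [0] * n_events
        for event_id in session:
            row[event_id] += 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Made-up log lines, used only to exercise the sketch.
    raw = [
        "Received block blk_123 of size 67108864 from 10.0.0.1",
        "Received block blk_456 of size 1024 from 10.0.0.2",
        "Deleting block blk_123",
    ]
    templates, events = parse(raw)
    for template, event_id in templates.items():
        print(event_id, template)
    print(event_count_matrix([events], len(templates)))
```

A fixed regular-expression list like this breaks down as soon as log formats vary, which is one reason dedicated log parsers exist; the sketch only shows the input and output shapes involved.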

Journal: IEEE Transactions on Dependable and Secure Computing
Publication Date: Nov 1, 2018
Citations: 187

