Execution anomaly detection in large-scale systems through console log analysis

Liang Bao,Qian Li,Peiyao Lu,Jie Lu,Tongxiao Ruan,Ke Zhang

doi:10.1016/j.jss.2018.05.016

Abstract

Execution anomaly detection is important for development, maintenance and performance tuning in large-scale systems. System console logs are the significant source of troubleshooting and problem diagnosis. However, manually inspecting logs to detect anomalies is unfeasible due to the increasing volume and complexity of log files. Therefore, this is a substantial demand for automatic anomaly detection based on log analysis. In this paper, we propose a general method to mine console logs to detect system problems. We first give some formal definitions of the problem, and then extract the set of log statements in the source code and generate the reachability graph to reveal the reachable relations of log statements. After that, we parse the log files to create log messages by combining information about log statements with information retrieval techniques. These messages are grouped into execution traces according to their execution units. We propose a novel anomaly detection algorithm that considers traces as sequence data and uses a probabilistic suffix tree based method to organize and differentiate significant statistical properties possessed by the sequences. Experiments on a CloudStack testbed and a Hadoop production system show that our method can effectively detect running anomalies in comparison with existing four detection algorithms.

Full Text