Markov Chain Modeling for Anomaly Detection in High Performance Computing System Logs

Abida Haque, Alexandra Delucia, Elisabeth Baseman

doi:10.1145/3152493.3152559

Abstract

As high performance computing approaches the exascale era, analyzing the massive amount of monitoring data generated by supercomputers is quickly becoming intractable for human analysts. In particular, system logs, which are a crucial source of information regarding machine health and root cause analysis of problems and failures, are becoming far too large for a human to review by hand. We take a step toward mitigating this problem through mathematical modeling of textual system log data in order to automatically capture normal behavior and identify anomalous and potentially interesting log messages. We learn a Markov chain model from average case system logs and use it to generate synthetic system log data. We present a variety of evaluation metrics for scoring similarity between the synthetic logs and the real logs, thus defining and quantifying normal behavior. Then, we explore the abilities of this learned model to identify anomalous behavior by evaluating its ability to catch inserted and missing log messages. We evaluate our model and its performance on the anomaly detection task using a large set of system log files from two institutional computing clusters at Los Alamos National Laboratory. We find that while our model seems to pick up on key features of normal behavior, its ability to detect anomalies varies greatly by anomaly type and the training and test data used. Overall, we find mathematical modeling of system logs to be a promising area for further work, particularly with the goal of aiding human operators in troubleshooting tasks.
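
The abstract does not say how log messages are mapped to states, what order the chain has, or how anomalies are scored, so the following is only a minimal sketch under assumptions rather than the authors' method. It assumes each log line has already been reduced to a message-type label, fits a first-order Markov chain by counting transitions, samples a synthetic sequence from it, and scores each transition in a test sequence by its negative log-probability. The function names, the toy labels, the probability floor for unseen transitions, and the threshold of 5.0 are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def fit_markov_chain(sequences):
    """Estimate first-order transition probabilities from label sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            counts[prev][curr] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        prev: {curr: n / sum(nexts.values()) for curr, n in nexts.items()}
        for prev, nexts in counts.items()
    }


def generate(chain, start, length, seed=0):
    """Sample a synthetic sequence; stops early at a state with no successors."""
    rng = random.Random(seed)
    state, out = start, [start]
    while len(out) < length and state in chain:
        states, probs = zip(*chain[state].items())
        state = rng.choices(states, weights=probs)[0]
        out.append(state)
    return out


def surprise(chain, seq, floor=1e-6):
    """Negative log-probability per transition; unseen transitions use `floor` (an assumption)."""
    return [
        -math.log(chain.get(prev, {}).get(curr, floor))
        for prev, curr in zip(seq, seq[1:])
    ]


# Toy "normal" log, already reduced to hypothetical message-type labels.
normal = [["boot", "mount", "login", "job_start", "job_end", "logout"]] * 20
chain = fit_markov_chain(normal)

# Synthetic log sampled from the learned chain.
synthetic = generate(chain, "boot", 6)

# Inserted-message case: "oom_kill" never appears in the training data.
inserted = ["boot", "mount", "login", "job_start", "oom_kill", "job_end", "logout"]
flagged_inserted = [i + 1 for i, s in enumerate(surprise(chain, inserted)) if s > 5.0]

# Missing-message case: "login" has been dropped.
missing = ["boot", "mount", "job_start", "job_end", "logout"]
flagged_missing = [i + 1 for i, s in enumerate(surprise(chain, missing)) if s > 5.0]
```

In this toy setup an inserted message produces two surprising transitions (into and out of the unseen label), while a missing message produces one (the jump across the gap). Real logs are far noisier, which is consistent with the abstract's remark that detection ability varies by anomaly type and by the training and test data used.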

Similar Papers

The Class Overlap Model for System Log Anomaly Detection Based on Ensemble Learning
Yitong Ren ... Zhaojun Gu
01 Jul 2020

Converting Unstructured System Logs into Structured Event List for Anomaly Detection
Zongze Li ... Song Fu
27 Aug 2018

Enhancing HPC System Log Analysis by Identifying Message Origin in Source Code
Megan Hickman ... Sean Blanchard
01 Oct 2018

Event Block Identification and Analysis for Effective Anomaly Detection to Build Reliable HPC Systems
Zongze Li ... Sean Blanchard
01 Jun 2018
