Abstract

Accurate and timely root cause analysis (RCA) is essential to successful IT operations, as it is a primary step toward incident remediation. Automating RCA with data mining techniques is challenging in large heterogeneous systems, however, because it requires correlating multimodal information across many data sources. A growing number of services are migrating to structured logging to enable automated monitoring and debugging of complex large-scale systems. In this paper, we leverage structured logs and association rule mining (ARM) to automate RCA. We propose the LogRule algorithm, which automatically analyzes structured logs to generate a list of explanations for an event of interest. LogRule achieves an F1-score of 0.921 on the diagnosis task while computing results 37x faster than the state-of-the-art solution based on FP-growth, making it a time-efficient, accurate, and interpretable ARM-based RCA algorithm. Evaluation results show that LogRule enables RCA on complex multidimensional datasets for which the execution time of the current state-of-the-art algorithm is prohibitively long.
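To make the idea of ARM-based explanations concrete, the following is a minimal sketch of how attribute-value combinations in structured logs can be scored against an event of interest. It is not the LogRule algorithm and not FP-growth: it enumerates small itemsets by brute force, which is only practical for toy data. The function name `mine_explanations`, its parameters, and the sample log fields (`service`, `region`, `build`, `status`) are all hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_explanations(records, target_key, target_value,
                      min_support=0.5, max_items=2):
    """Rank attribute-value itemsets associated with an event of interest.

    Illustrative brute-force sketch (hypothetical helper, not LogRule):
      support    = fraction of event records that contain the itemset
      confidence = fraction of records containing the itemset that are events
      lift       = confidence / overall event rate
    """
    all_counts = Counter()    # itemset -> number of records containing it
    event_counts = Counter()  # itemset -> number of event records containing it
    n_events = 0

    for rec in records:
        is_event = rec.get(target_key) == target_value
        if is_event:
            n_events += 1
        # Exclude the target field so it cannot "explain" itself.
        items = sorted((k, str(v)) for k, v in rec.items() if k != target_key)
        for size in range(1, max_items + 1):
            for itemset in combinations(items, size):
                all_counts[itemset] += 1
                if is_event:
                    event_counts[itemset] += 1

    if n_events == 0:
        return []

    base_rate = n_events / len(records)
    rules = []
    for itemset, hits in event_counts.items():
        support = hits / n_events
        if support < min_support:
            continue
        confidence = hits / all_counts[itemset]
        rules.append({
            "itemset": itemset,
            "support": support,
            "confidence": confidence,
            "lift": confidence / base_rate,
        })

    # Strongest association first; prefer higher support, then shorter itemsets.
    rules.sort(key=lambda r: (-r["lift"], -r["support"], len(r["itemset"])))
    return rules


# Hypothetical structured log records; the event of interest is status == "error".
records = [
    {"service": "api", "region": "us-east", "build": "v2", "status": "error"},
    {"service": "api", "region": "us-east", "build": "v2", "status": "error"},
    {"service": "api", "region": "eu-west", "build": "v2", "status": "error"},
    {"service": "api", "region": "us-east", "build": "v1", "status": "ok"},
    {"service": "web", "region": "eu-west", "build": "v1", "status": "ok"},
    {"service": "web", "region": "us-east", "build": "v1", "status": "ok"},
]

rules = mine_explanations(records, "status", "error")
for rule in rules[:3]:
    print(rule["itemset"], rule["support"], rule["confidence"], rule["lift"])
```

In this toy data, every error record carries `build=v2` and no healthy record does, so that single attribute ranks first. The number of candidate itemsets grows combinatorially with the number of fields, which is why execution time matters for this class of algorithm on multidimensional datasets.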

