Abstract

Log messages provide a valuable source of runtime information for ensuring the safety and consistency of systems. Recently, many machine learning and deep learning methods have been proposed to automatically detect anomalous log messages, obviating the need for manual detection by experts. However, we find that in practice, the effectiveness of existing learning-based methods is severely affected by incomplete information and distribution shift. Specifically, each log message can be parsed into a fixed number of key information fields, yet existing methods analyze log messages using only the log event information and ignore the other fields, which can be critical to anomaly detection. Further, the distribution of real-world log messages changes continuously due to the dynamic nature of the runtime environment, and thus a detection model conventionally trained under the unrealistic i.i.d. assumption may not deliver the expected, consistent performance. In this paper, we present RT-Log, a robust and transferable anomaly detection framework, to address the above problems. To perform a comprehensive analysis of log messages, we introduce an adaptive relation modeling technique, which captures feature interactions among log information fields selectively and dynamically for effective and interpretable log representations. To establish robustness and transferability, we propose a general environment generalization technique for learning environment-invariant representations that generalize across different runtime environments. We evaluate the anomaly detection performance of RT-Log on large real-world datasets. Extensive experimental results demonstrate that RT-Log consistently outperforms state-of-the-art methods by a significant margin under different settings.
