Abstract
Abstract Context A large amount of information about system behavior is stored in logs that record system changes. Such information can be exploited to discover anomalies of a system and the operations that cause them. Given their large size, manual inspection of logs is hard and infeasible in a desired timeframe (e.g., real-time), especially for critical systems. Objective This study proposes a semi-automated method for reconstructing sequences of tasks of a system, revealing system anomalies, and associating tasks and anomalies to code components. Method The proposed approach uses unsupervised machine learning (Latent Dirichlet Allocation) to discover latent topics in messages of log events and introduces a novel technique based on pattern recognition to derive the semantic of such topics (topic labelling). The approach has been applied to the big data generated by the ALMA telescope system consisting of more than 2000 log events collected in about five hours of telescope operation. Results With the application of our approach to such data, we were able to model the behavior of the telescope over 16 different observations. We found five different behavior models and three different types of errors. We use the models to interpret each error and discuss its cause. Conclusions With this work, we have also been able to discuss some of the known challenges in log mining. The experience we gather has been then summarized in lessons learned.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.