Abstract

Cloud is getting ubiquitous and scales up rapidly. It is critical to effectively detect and efficiently repair system anomalies for a robust cloud. Many efforts have been made to facilitate analysis of system problems with the readily-available and massive cloud logs. However, most tools can still not automatically recognize failures related to a specific cloud operating system task. To diagnose execution failures of a cloud, it is inevitable to monitor corresponding system tasks. In this paper, we propose a lightweight approach to identify cloud behaviors related to failed executions of the cloud operating system for failure diagnosis, by exploiting logs of ERROR logging level in a cloud. Instead of working on execution sequences extracted from logs for all system tasks, we focus on automated recognition of exception logs generated by a system task. These logs are critical snippets of execution traces for failure diagnosis of a cloud. In our work, exception logs are extracted and associated with the respective system task. Efforts can be reduced by comparing patterns of new error cloud behaviors with cloud behaviors met before. With experiments on OpenStack, a popular open source cloud operating system, we demonstrate that our work is effective and efficient for execution failure diagnosis of a cloud. Our approach can also be used as a complementary method for log-based troubleshooting tools concentrating on execution sequences.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.