Abstract

Clustered enterprise middleware systems employing dynamic workload scheduling are susceptible to a variety of application malfunctions that can manifest themselves in a counterintuitive fashion and cause debilitating damage. Until now, diagnosing problems in that domain involves investigating log files and configuration settings and requires in-depth knowledge of the middleware architecture and application design. This paper presents a method for problem determination using change point detection techniques and problem signatures consisting of a combination of changes (or absence of changes) in different metrics. We implemented this approach on a clustered middleware system and applied it to the detection of the storm drain condition: a debilitating problem encountered in clustered systems with counterintuitive symptoms. Our experimental results show that the system detects 93% of storm drain faults with no false positives.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call