Problem Determination in Enterprise Middleware Systems using Change Point Correlation of Time Series Data

M.K. Agarwal, N. Anerousis, V. Mann, M. Gupta, N. Sachindran, L. Mummert

doi:10.1109/noms.2006.1687576

Abstract

Clustered enterprise middleware systems employing dynamic workload scheduling are susceptible to a variety of application malfunctions that can manifest themselves in a counterintuitive fashion and cause debilitating damage. Until now, diagnosing problems in this domain has involved investigating log files and configuration settings, and has required in-depth knowledge of the middleware architecture and application design. This paper presents a method for problem determination using change point detection techniques and problem signatures, where a signature is a combination of changes (or absence of changes) in different metrics. We implemented this approach on a clustered middleware system and applied it to the detection of the storm drain condition: a debilitating problem encountered in clustered systems with counterintuitive symptoms. Our experimental results show that the system detects 93% of storm drain faults with no false positives.
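The idea of matching a problem signature against per-metric change points can be illustrated with a minimal Python sketch. This is not the paper's implementation: the mean-shift detector, the `threshold` and `max_lag` values, the metric names, the sample data, and the signature itself (response time drops, throughput rises, CPU stays flat) are all assumptions chosen only to show the mechanics of correlating changes, and absence of changes, across metrics.

```python
from statistics import mean


def detect_change(series, min_seg=3, threshold=2.0):
    """Find the strongest single mean shift in a series.

    Returns (index, direction): direction is +1 for an upward shift,
    -1 for a downward shift, and (None, 0) when no split scores above
    the (arbitrary, assumed) threshold.
    """
    n = len(series)
    best_idx, best_score, best_dir = None, 0.0, 0
    for k in range(min_seg, n - min_seg + 1):
        left, right = series[:k], series[k:]
        ml, mr = mean(left), mean(right)
        # Spread of the samples around their own segment means.
        resid = [x - ml for x in left] + [x - mr for x in right]
        spread = (sum(r * r for r in resid) / n) ** 0.5
        score = abs(mr - ml) / (spread + 1e-9)
        if score > best_score:
            best_idx, best_score = k, score
            best_dir = 1 if mr > ml else -1
    if best_score < threshold:
        return None, 0
    return best_idx, best_dir


def matches_signature(metrics, signature, max_lag=2):
    """Check whether every metric changed (or stayed flat) as the
    signature expects, with all detected changes within max_lag samples."""
    change_points = []
    for name, expected in signature.items():
        idx, direction = detect_change(metrics[name])
        if direction != expected:
            return False
        if idx is not None:
            change_points.append(idx)
    return not change_points or max(change_points) - min(change_points) <= max_lag


# Hypothetical signature and sample data, for illustration only.
SIGNATURE = {"response_time": -1, "throughput": +1, "cpu": 0}
faulty = {
    "response_time": [100, 102, 98, 101, 99, 20, 22, 19, 21, 20],
    "throughput": [50, 52, 49, 51, 50, 90, 92, 88, 91, 89],
    "cpu": [40, 41, 39, 40, 42, 41, 40, 39, 41, 40],
}
healthy = {name: faulty["cpu"] for name in SIGNATURE}

print(matches_signature(faulty, SIGNATURE))
print(matches_signature(healthy, SIGNATURE))
```

Requiring a metric to show no change (direction `0`) is what lets a signature express "absence of changes"; a production detector would likely need a more robust change point test and per-metric thresholds.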

Full Text