Abstract

Complex software systems experience failures at runtime despite the considerable effort invested in their development and operation. Reactive approaches detect these failures only after they have occurred and have already caused serious consequences. Online failure prediction aims to detect these failures in advance by monitoring the quality of service or the system events, so that proactive actions can be taken. Current failure prediction approaches treat the system or its individual components as a monolith, without considering the architecture of the system. They disregard the fact that a failure in one component can propagate through the system and cause problems in other components. In this paper, we propose a hierarchical online failure prediction approach, called Hora, which combines component failure predictors with architectural knowledge. The failure propagation is modeled using Bayesian networks, which incorporate both the prediction results and the component dependencies extracted from the architectural models. Our approach is evaluated using Netflix’s server-side distributed RSS reader application to predict failures caused by three representative types of faults: memory leak, system overload, and sudden node crash. We compare Hora to a monolithic approach, and the results show that our approach can improve the area under the ROC curve by 9.9%.
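
To make the idea of propagating component-level predictions along architectural dependencies more concrete, the following is a minimal sketch and not the Hora implementation. It assumes an acyclic dependency graph and approximates the Bayesian network with a noisy-OR combination that treats the failures of a component's dependencies as independent. The function name, the component names, and all probabilities are hypothetical and chosen only for illustration.

```python
def propagate_failure_probabilities(own_prob, dependencies):
    """Combine per-component failure predictions with dependency knowledge.

    own_prob: dict mapping component -> failure probability reported by
        that component's own predictor (hypothetical input).
    dependencies: dict mapping component -> {dependency: weight}, where
        weight is the assumed probability that the component fails given
        that the dependency fails.
    Returns a dict mapping component -> propagated failure probability.

    Assumption: noisy-OR with independent causes, which is a
    simplification of exact Bayesian network inference.
    """
    result = {}
    in_progress = set()

    def visit(name):
        if name in result:
            return result[name]
        if name in in_progress:
            raise ValueError(f"dependency cycle involving {name!r}")
        in_progress.add(name)
        # Probability that the component does not fail on its own.
        survive = 1.0 - own_prob[name]
        # Each dependency adds an independent chance of causing a failure.
        for dep, weight in dependencies.get(name, {}).items():
            survive *= 1.0 - weight * visit(dep)
        in_progress.discard(name)
        result[name] = 1.0 - survive
        return result[name]

    for name in own_prob:
        visit(name)
    return result


if __name__ == "__main__":
    # Hypothetical three-tier system: edge -> middletier -> database.
    own = {"database": 0.20, "middletier": 0.05, "edge": 0.01}
    deps = {
        "middletier": {"database": 1.0},
        "edge": {"middletier": 0.5},
    }
    for component, p in propagate_failure_probabilities(own, deps).items():
        print(f"{component}: {p:.4f}")
```

In this sketch, a component with a low failure probability of its own can still end up with a high propagated probability when something it depends on is likely to fail, which is the effect a monolithic predictor cannot capture.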
