Abstract

Failures in software systems during operation are inevitable. They cause system downtime, which needs to be minimized to reduce or avoid unnecessary costs and customer dissatisfaction. Online failure prediction aims at identifying upcoming failures at runtime to enable proactive maintenance actions. Existing online failure prediction approaches focus on predicting failures of either individual components or the system as a whole. They do not take into account software architectural dependencies, which determine the propagation of failures. In this paper, we propose a hierarchical online failure prediction approach, HORA, which employs a combination of both failure predictors and architectural models. We evaluate our approach using a distributed RSS reader application by Netflix and investigate the prediction quality for two representative types of failures, namely memory leak and system overload. The results show that, overall, our approach improves the area under the ROC curve by 10.7% compared to a monolithic approach.
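To illustrate how architectural dependencies could carry component-level predictions up to the services that rely on them, the following is a minimal sketch rather than HORA's actual model. The abstract does not specify the propagation mechanism, so the noisy-OR combination, the component names, the dependency weights, and the function `propagate` are all assumptions made for illustration. The sketch also assumes that the dependency graph is acyclic.

```python
from typing import Dict, List, Tuple

# Hypothetical architecture: each component maps to the components it depends on,
# with an assumed weight in [0, 1] for how strongly a failure propagates.
Dependencies = Dict[str, List[Tuple[str, float]]]

DEPENDENCIES: Dependencies = {
    "frontend": [("middletier", 1.0)],
    "middletier": [("database", 0.5)],
    "database": [],
}


def propagate(local: Dict[str, float], deps: Dependencies) -> Dict[str, float]:
    """Combine per-component failure probabilities along dependencies.

    `local` holds the output of an individual failure predictor for each
    component. A component is taken to fail if it fails on its own or if a
    failure propagates from one of its dependencies (noisy-OR, an assumption).
    """
    memo: Dict[str, float] = {}

    def visit(component: str) -> float:
        if component in memo:
            return memo[component]
        # Probability that the component survives its own local failure causes.
        survive = 1.0 - local[component]
        for dependency, weight in deps.get(component, []):
            # Each dependency independently fails to bring the component down.
            survive *= 1.0 - weight * visit(dependency)
        memo[component] = 1.0 - survive
        return memo[component]

    return {component: visit(component) for component in local}


if __name__ == "__main__":
    # Only the database predictor raises an alarm; the others report no risk.
    local_predictions = {"frontend": 0.0, "middletier": 0.0, "database": 0.8}
    for name, probability in propagate(local_predictions, DEPENDENCIES).items():
        print(f"{name}: {probability:.2f}")
```

In this sketch, a predictor that looks only at the frontend would report no risk, while the combined view reflects the pending database failure. This is the kind of gap between component-level and architecture-aware prediction that the abstract describes.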
