Abstract

The availability of enterprise IT services is a key concern for many enterprises. Although there is a plethora of literature on service availability, there has been no previous systematic empirical study of IT service time to recovery following outages. The existing literature typically assumes a distribution or builds on analogies to related areas such as software engineering. Our objective is therefore to find the statistical distribution of IT service time to recovery. The investigation is based on logs of more than 1,800 incidents in a large Nordic bank, corresponding to more than 11,000 hours of recorded downtime. Five candidate distributions of time to recovery, drawn from the literature, were compared using the Akaike Information Criterion (AIC) to find the one offering the best fit. The results show that the log-normal distribution outperformed the others for all tested service channels (collections of IT services). We conclude that, of the distributions tested, the log-normal offers the best fit to IT service time to recovery. Using this distribution in simulation and decision-support tools offers practitioners the prospect of better predictions of downtime and downtime costs.
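The AIC comparison described above can be illustrated with a minimal sketch. It is not the study's own code or data: the recovery times are synthetic, only two candidate distributions (log-normal and exponential) are compared rather than the five examined in the study, and the function names are hypothetical. Each candidate is fitted by maximum likelihood, and the candidate with the lowest AIC = 2k - 2 ln L is preferred, where k is the number of fitted parameters and L the maximized likelihood.

```python
import math
import random


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def lognormal_aic(durations):
    """Fit a log-normal by maximum likelihood and return its AIC.

    Durations must be strictly positive (hypothetical helper).
    """
    logs = [math.log(t) for t in durations]
    n = len(logs)
    mu = sum(logs) / n
    sigma2 = sum((x - mu) ** 2 for x in logs) / n  # MLE variance of log-durations
    # Maximized log-likelihood of the log-normal at (mu, sigma2).
    ll = -sum(logs) - 0.5 * n * math.log(2 * math.pi * sigma2) - 0.5 * n
    return aic(ll, n_params=2)


def exponential_aic(durations):
    """Fit an exponential by maximum likelihood and return its AIC."""
    n = len(durations)
    total = sum(durations)
    rate = n / total  # MLE of the rate parameter
    ll = n * math.log(rate) - rate * total
    return aic(ll, n_params=1)


# Synthetic recovery times in hours (an assumption, not the bank's incident logs).
rng = random.Random(0)
sample = [rng.lognormvariate(1.0, 1.2) for _ in range(2000)]

scores = {
    "log-normal": lognormal_aic(sample),
    "exponential": exponential_aic(sample),
}
best = min(scores, key=scores.get)
print(scores, "best fit:", best)
```

Because the synthetic sample is drawn from a log-normal, that candidate is expected to obtain the lower AIC here; with real incident logs, the ranking would depend on the data.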
