Abstract

This article aims to improve the understanding of software system failures and of how to assure, maintain, and support software systems that are in production. It reports the results of our study, a qualitative analysis of thirty cases: fifteen from public incident reports and fifteen from in-depth interviews with engineers. The main goals of the study were to understand and classify failures and to examine how they are identified, investigated, and mitigated. We also obtained analytical insights into the current state of practice and its associated problems. Engineers are often unaware of the scaling limits of the systems they support until those limits are exceeded, and failures can cascade across a system and cause catastrophic outages. We argue that the difficulties we identified may lead to changes in how systems are designed and supported.

