Abstract

With redundant hardware, a disk failure rarely results in system-level downtime. System failures do still occur, typically as a sequence of very rare events that culminates in a catastrophic failure. This case describes how a combination of hardware and firmware failures, compounded by human error, led to the failure of a redundant disk storage unit, which in turn affected several enterprise systems at a major public university. During the subsequent restoration from backups, a small number of conservative and seemingly “good” decisions led to negative outcomes, chiefly additional downtime spread over several days. The case illustrates how even well-considered, conservative decisions may appear flawed in hindsight. An important lesson from the case is that it is difficult to justify to management the cost of backup resources sufficient to protect against very low-probability failure events.
