Abstract
In past issues we have discussed various system-related disasters and their causes, both accidental and intentional. In almost all cases, difficulties allegedly caused by “computer problems” can be attributed, directly or indirectly, to people. But too much effort seems to be directed at placing blame and identifying scapegoats, and too little at learning from experience and avoiding such problems [1,2,5,6,7]. Besides, the real causes may implicitly or explicitly involve many parties: developers, customers, users, operators, administrators, others involved with computer and communication systems, and sometimes even unsuspecting bystanders. In a few cases the physical environment also contributes, e.g., through power outages, floods, extreme weather, lightning, and earthquakes. Even in those cases, system people may have failed to anticipate the possible effects. In principle, we can design redundantly distributed systems that are able to withstand certain hardware faults, component unavailability, extreme delays, human errors, malicious misuse, and even “acts of God,” at least within limits. Nevertheless, in surprisingly many systems (including systems designed to provide continuous availability), a simple event can bring an entire system to a screeching halt just as readily as a complex one can [4].