Abstract
Software errors have occurred since computers were first used in spacecraft and aircraft. Their consequences range from loss of life to far less catastrophic outcomes. As the demand for automation increases, software in mission- or safety-critical systems should be designed to tolerate the most likely software faults. This paper categorizes historic aerospace software errors to identify trends in how and where automation is most likely to fail. A distinction is introduced between software that produces wrong (erroneous) output and software that produces no output (fail-silent). Of the historical incidents analyzed, 85% involved software producing erroneous output rather than stopping. Rebooting was found to be ineffective at clearing erroneous behavior and unreliable for recovering from fail-silent software. Errors originated within the code/logic itself in 58% of cases, in configurable data in 16%, and in input sources (command or sensor) in 25%. Forty percent of unexpected software behavior was caused by the absence of software, and 16% was subjectively attributed to “unknown-unknowns.” These findings indicate that achieving software fault tolerance requires backup strategies that detect and respond to erroneous software behavior, not only fail-silent cases, and robust off-nominal testing to uncover unanticipated situations.