Abstract
Owing to the continuous growth in transistor counts in processors, the cloud computing paradigm, and increasingly complex operating environments, soft errors, a typical category of transient errors, have become an urgent challenge in ground-level systems. Understanding how these errors propagate is the key step in handling them. As programs grow large, traditional fault-injection campaigns can no longer cover every possible error. This paper proposes a method to study and model the propagation of soft errors within a program. Based on the dynamic instructions traced in an error-free program execution, ACE (architecturally correct execution) analysis classifies soft errors in architectural registers as benign or non-benign. Benign errors are considered to be derated (masked) during propagation. We then build a crash model and an improved DDG to analyze the propagation of each non-benign error and to predict its consequence (crash or silent data corruption). If an error is predicted to cause a crash, its crash latency and propagation path are also predicted. The method can be used to predict the outcome of a program under soft errors, as well as how often correct outputs, silent data corruptions, or crashes occur. Extensive fault-injection experiments validate the proposed method from multiple perspectives.
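The benign/non-benign classification step can be illustrated with a minimal sketch. This is not the paper's ACE analysis: it assumes a simplified rule in which a corrupted register is benign if it is overwritten before it is next read (or is never read again) and non-benign otherwise. The `Instr` record, the `classify_register_fault` function, and the toy trace are hypothetical names introduced only for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """One dynamic instruction from an error-free trace (hypothetical format)."""
    dest: Optional[str]     # register written by the instruction, or None
    srcs: Tuple[str, ...]   # registers read by the instruction


def classify_register_fault(trace: List[Instr], t: int, reg: str) -> str:
    """Classify a bit flip in `reg` injected just before trace[t] executes.

    Simplified assumption: the error is non-benign as soon as the corrupted
    value is read, and benign if the register is overwritten first or is
    never accessed again.
    """
    for instr in trace[t:]:
        # Reads happen before the write within one instruction,
        # so check the sources first.
        if reg in instr.srcs:
            return "non-benign"
        if instr.dest == reg:
            return "benign"
    return "benign"


# Toy trace: r1 is defined, used, redefined, then used again.
trace = [
    Instr("r1", ()),            # r1 = constant
    Instr("r2", ("r1",)),       # r2 = f(r1)
    Instr("r1", ()),            # r1 = constant (overwrites r1)
    Instr("r3", ("r1", "r2")),  # r3 = r1 + r2
]

print(classify_register_fault(trace, 1, "r1"))  # corrupted r1 is read next
print(classify_register_fault(trace, 2, "r1"))  # r1 is overwritten before any read
```

In this sketch only the errors labeled non-benign would be passed on to a later stage, such as the crash model and DDG analysis described above, for prediction of their consequence.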