Abstract

Hardware faults (i.e., soft errors) are projected to increase in modern HPC systems. These faults often propagate through programs and result in silent data corruptions (SDCs), seriously compromising system reliability. Selective instruction duplication, a widely used software-based error detection technique, has been shown to detect SDCs effectively with low performance overhead. Prior work has relied on the compiler intermediate representation (IR) for program reliability analysis and code transformation in selective instruction duplication, assuming that IR-based analysis and protection remain representative under realistic fault models (i.e., faults originating at lower hardware layers). Unfortunately, this assumption has not been fully validated, which raises questions about the accuracy and efficiency of the protection, since IR is a higher level of abstraction that is far removed from the hardware layers. In this paper, we test the assumption by injecting realistic hardware faults into programs whose protection is guided and applied by IR-based selective instruction duplication. We find that the protection yields high SDC coverage with low performance overhead even under realistic fault models, although a small number of such faults escape the detector. Our observations confirm that IR-based selective instruction duplication is a cost-effective method for protecting programs from soft errors.
