Abstract

As technology scales ever further, device unreliability is creating excessive complexity for hardware to maintain the illusion of perfect operation. In this paper, we consider whether exposing hardware fault information to software and allowing software to control fault recovery simplifies hardware design and helps technology scaling. The combination of emerging applications and emerging many-core architectures makes software recovery a viable alternative to hardware-based fault recovery. Emerging applications tend to have few I/O and memory side-effects , which limits the amount of information that needs checkpointing, and they allow discarding individual sub-computations with small qualitative impact. Software recovery can harness these properties in ways that hardware recovery cannot. We describe Relax, an architectural framework for software recovery of hardware faults. Relax includes three core components: (1) an ISA extension that allows software to mark regions of code for software recovery, (2) a hardware organization that simplifies reliability considerations and provides energy efficiency with hardware recovery support removed, and (3) software support for compilers and programmers to utilize the Relax ISA. Applying Relax to counter the effects of process variation, our results show a 20% energy efficiency improvement for PARSEC applications with only minimal source code changes and simpler hardware.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.