Abstract

Reliability has emerged as a key topic of interest for researchers around the world, who seek to detect and/or mitigate the side effects of decreasing transistor sizes, such as soft errors. Traditional solutions, like DMR and TMR, incur significant area and power overheads and might not always be applicable due to power restrictions. Therefore, we investigate alternative heterogeneous reliability modes that can be activated at run-time based on the system requirements, while reducing the power and area overheads of the processor. Our heterogeneous reliability modes reduce the processor vulnerability by 87% on average, with area and power overheads of 10% and 43%, respectively. To further enhance the design space of heterogeneous reliability, we investigate combinations of efficient compression techniques, namely Distributed Multi-threaded Checkpointing, Hash-based Incremental Checkpointing, and GNU zip, to reduce the storage requirements of data that are backed up at an application checkpoint. Combining these state compression techniques reduces checkpoint sizes by a factor of ~6×. We use gem5 to implement and simulate the state compression techniques and the heterogeneous reliability modes discussed in this paper.
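The idea of combining hash-based incremental checkpointing with gzip can be illustrated with a minimal sketch. It is not the paper's implementation and does not use DMTCP. The block size, the delta layout, and the function names `block_hashes`, `make_delta`, and `apply_delta` are all assumptions made for illustration. The sketch hashes fixed-size blocks of a state image, stores only the blocks whose hash changed since the previous checkpoint, and compresses the result with gzip.

```python
import gzip
import hashlib
import struct

BLOCK_SIZE = 4096  # hypothetical block granularity, not taken from the paper


def block_hashes(state: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash every fixed-size block of the state image."""
    return [hashlib.sha256(state[i:i + block_size]).digest()
            for i in range(0, len(state), block_size)]


def make_delta(state: bytes, prev_hashes: list, block_size: int = BLOCK_SIZE):
    """Return (gzip-compressed delta, hashes of the new state).

    Assumed delta layout: 8-byte total length, then for each changed
    block a 4-byte index, a 4-byte length, and the block bytes.
    """
    new_hashes = block_hashes(state, block_size)
    parts = [struct.pack("<Q", len(state))]
    for idx, digest in enumerate(new_hashes):
        # A block is stored only if it is new or its hash differs.
        if idx >= len(prev_hashes) or prev_hashes[idx] != digest:
            block = state[idx * block_size:(idx + 1) * block_size]
            parts.append(struct.pack("<II", idx, len(block)))
            parts.append(block)
    return gzip.compress(b"".join(parts)), new_hashes


def apply_delta(prev_state: bytes, delta: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new state from the previous state and a delta."""
    raw = gzip.decompress(delta)
    (total_len,) = struct.unpack_from("<Q", raw, 0)
    # Start from the previous state, truncated or zero-padded to the new size.
    buf = bytearray(prev_state[:total_len].ljust(total_len, b"\0"))
    pos = 8
    while pos < len(raw):
        idx, length = struct.unpack_from("<II", raw, pos)
        pos += 8
        buf[idx * block_size:idx * block_size + length] = raw[pos:pos + length]
        pos += length
    return bytes(buf)
```

In this sketch the first checkpoint is taken with an empty hash list, so every block is stored. Later checkpoints pass the hashes returned by the previous call and store only the modified blocks.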

Highlights

  • Aggressive transistor scaling has led to an increased susceptibility towards several reliability problems, such as soft errors, at the hardware layer [1]

  • We extend the concept of Architectural Vulnerability Factor (AVF) towards the Full-Processor Vulnerability Factor (FPVF) metric to evaluate the impact of component hardening on the reliability mode, for a given application workload

  • In this work, we presented a novel architectural-space generation and exploration methodology that is used to develop a wide range of heterogeneous reliability modes for out-of-order superscalar processors

Summary

INTRODUCTION

Aggressive transistor scaling has led to an increased susceptibility towards several reliability problems, such as soft errors, at the hardware layer [1]. We propose to harden a combination of the key pipeline components in out-of-order superscalar processors, instead of employing full-scale TMR across the complete pipeline, to increase core reliability while reducing the area and power overheads of full-scale TMR. This generates a design space of multiple heterogeneous reliability modes (RM), nine of which are illustrated in this work, alongside the unprotected core. Hardening key components like the Rename Map and the Reorder Buffer (ROB) effectively reduces the FPVF for all applications, as shown by the heterogeneous reliability modes RM4, RM7 and RM8. Utilizing these hardening modes incurs significant area and power overheads. In this sub-section, we present a brief overview of a run-time system for our proposed heterogeneous multicore processor that aims at selecting the set of Pareto-optimal modes for cores such that the vulnerability of their respective applications can be minimized while satisfying their power constraints. Because the size of HBICT+DMTCP is 1.03× larger than the file size of DMTCP, the effectiveness of the combined state compression technique (DMTCP+HBICT+gzip), with respect to DMTCP, is reduced.
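The run-time selection described above can be sketched for a single core as follows. This is a hypothetical illustration, not the paper's run-time system. The `Mode` type, the function names, and all numeric values are placeholders invented for the example, and the `vulnerability` field only stands in for a metric such as FPVF. The sketch filters a set of candidate modes down to the Pareto-optimal ones and then picks the least vulnerable mode that fits a power budget.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mode:
    name: str
    vulnerability: float  # stand-in for FPVF, lower is better
    power: float          # relative power cost, lower is better


def pareto_front(modes: List[Mode]) -> List[Mode]:
    """Keep only modes that no other mode dominates in both objectives."""
    front = []
    for m in modes:
        dominated = any(
            o.vulnerability <= m.vulnerability and o.power <= m.power
            and (o.vulnerability < m.vulnerability or o.power < m.power)
            for o in modes
        )
        if not dominated:
            front.append(m)
    return front


def select_mode(modes: List[Mode], power_budget: float) -> Optional[Mode]:
    """Pick the least vulnerable Pareto-optimal mode within the budget."""
    feasible = [m for m in pareto_front(modes) if m.power <= power_budget]
    if not feasible:
        return None  # no mode satisfies the power constraint
    return min(feasible, key=lambda m: m.vulnerability)


# Placeholder values for illustration only; they are not results from the paper.
candidates = [
    Mode("unprotected", vulnerability=1.0, power=1.0),
    Mode("modeA", vulnerability=0.6, power=1.2),
    Mode("modeB", vulnerability=0.7, power=1.3),  # dominated by modeA
    Mode("modeC", vulnerability=0.2, power=1.5),
]
chosen = select_mode(candidates, power_budget=1.25)
```

A multicore version would repeat this choice per core under a shared power budget. How that budget is divided among cores is not specified in the text above, so it is left out of the sketch.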

