Architectural-Space Exploration of Heterogeneous Reliability and Checkpointing Modes for Out-of-Order Superscalar Processors

Bharath Srinivas Prabakaran, Semeen Rehman, Mihika Dave, Florian Kriebel, Muhammad Shafique

doi:10.1109/access.2019.2945622


Open Access

https://doi.org/10.1109/access.2019.2945622


Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 43
License type: CC BY 4.0

Affiliation: TU Wien, University of Illinois Urbana-Champaign

Abstract

Reliability has emerged as a key topic of interest for researchers around the world to detect and/or mitigate the side effects of decreasing transistor sizes, such as soft errors. Traditional solutions, like DMR and TMR, incur significant area and power overheads, which might not always be applicable due to power restrictions. Therefore, we investigate alternative heterogeneous reliability modes that can be activated at run-time based on the system requirements, while reducing the power and area overheads of the processor. Our heterogeneous reliability modes are successful in reducing the processor vulnerability by 87% on average, with area and power overheads of 10% and 43%, respectively. To further enhance the design space of heterogeneous reliability, we investigate combinations of efficient compression techniques like Distributed Multi-threaded Checkpointing, Hash-based Incremental Checkpointing, and GNU zip, to reduce the storage requirements of data that are backed up at an application checkpoint. We have successfully reduced checkpoint sizes by a factor of ~6x by combining various state compression techniques. We use gem5 to implement and simulate the state compression techniques and the heterogeneous reliability modes discussed in this paper.
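The idea of combining hash-based incremental checkpointing with gzip can be illustrated with a minimal sketch. This is not the paper's implementation and does not use DMTCP: the block size, the function names (`block_hashes`, `incremental_checkpoint`, `restore`), and the use of SHA-256 and `pickle` are all assumptions made for illustration. The state is split into fixed-size blocks, only blocks whose hash changed since the previous checkpoint are stored, and the resulting delta is compressed with gzip.

```python
import gzip
import hashlib
import pickle

BLOCK_SIZE = 4096  # hypothetical block granularity, not taken from the paper


def block_hashes(state: bytes) -> list:
    """Hash every fixed-size block of the state."""
    return [hashlib.sha256(state[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(state), BLOCK_SIZE)]


def incremental_checkpoint(state: bytes, prev_hashes: list):
    """Return (gzip-compressed delta, hashes of the current state).

    Only blocks that are new or whose hash differs from the previous
    checkpoint are stored in the delta.
    """
    hashes = block_hashes(state)
    changed = {}
    for idx, h in enumerate(hashes):
        if idx >= len(prev_hashes) or prev_hashes[idx] != h:
            changed[idx] = state[idx * BLOCK_SIZE:(idx + 1) * BLOCK_SIZE]
    payload = pickle.dumps({"length": len(state), "blocks": changed})
    return gzip.compress(payload), hashes


def restore(base: bytes, checkpoint: bytes) -> bytes:
    """Rebuild the state from the previous state plus a compressed delta."""
    delta = pickle.loads(gzip.decompress(checkpoint))
    n_blocks = -(-delta["length"] // BLOCK_SIZE)  # ceiling division
    blocks = [base[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in range(n_blocks)]
    for idx, data in delta["blocks"].items():
        blocks[idx] = data
    return b"".join(blocks)[:delta["length"]]


# Usage: a full checkpoint first, then a delta after one byte changes.
old_state = bytes(16 * BLOCK_SIZE)
full_ckpt, old_hashes = incremental_checkpoint(old_state, [])
new_state = bytearray(old_state)
new_state[5000] = 1
delta_ckpt, _ = incremental_checkpoint(bytes(new_state), old_hashes)
restored = restore(old_state, delta_ckpt)
```

In this toy setting the delta holds a single block instead of all sixteen, which is the effect incremental checkpointing aims for; how much gzip adds on top depends on how compressible the changed blocks are.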

Highlights

Aggressive transistor scaling has led to an increased susceptibility towards several reliability problems, such as soft errors, at the hardware layer [1].
We extend the concept of Architectural Vulnerability Factor (AVF) towards the Full-Processor Vulnerability Factor (FPVF) metric to evaluate the impact of component hardening on the reliability mode, for a given application workload.
In this work, we presented a novel architectural-space generation and exploration methodology that is used to develop a wide range of heterogeneous reliability modes for out-of-order superscalar processors.

Summary

INTRODUCTION

Aggressive transistor scaling has led to an increased susceptibility towards several reliability problems, such as soft errors, at the hardware layer [1]. We propose to harden a combination of the key pipeline components in out-of-order superscalar processors, instead of employing full-scale TMR across the complete pipeline, to increase core reliability while reducing the area and power overheads of full-scale TMR. This generates a design space of multiple heterogeneous reliability modes (RM), nine of which are illustrated in this work alongside the unprotected core. Hardening key components like the Rename Map and the Reorder Buffer (ROB) effectively reduces the Full-Processor Vulnerability Factor (FPVF) for all applications, as shown by the heterogeneous reliability modes RM4, RM7 and RM8. Utilizing these hardening modes incurs significant area and power overheads. We present a brief overview of a run-time system for our proposed heterogeneous multicore processor that aims at selecting the set of Pareto-optimal modes for cores such that the vulnerability of their respective applications can be minimized while satisfying their power constraints. Because the size of HBICT+DMTCP is 1.03× larger than the file size of DMTCP, the effectiveness of the combined state compression technique (DMTCP+HBICT+gzip), with respect to DMTCP, is reduced.
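The run-time selection described above, choosing a Pareto-optimal reliability mode that minimizes vulnerability under a power constraint, could be sketched as follows. This is a simplified single-core illustration under stated assumptions, not the authors' run-time system: the `Mode` class, the function names, and all numeric values are hypothetical placeholders and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    vulnerability: float  # an FPVF-like score; lower is better (assumed scale)
    power: float          # power overhead relative to the unprotected core


def pareto_front(modes):
    """Keep modes that no other mode beats on both vulnerability and power."""
    front = []
    for m in modes:
        dominated = any(
            o.vulnerability <= m.vulnerability and o.power <= m.power
            and (o.vulnerability < m.vulnerability or o.power < m.power)
            for o in modes
        )
        if not dominated:
            front.append(m)
    return sorted(front, key=lambda m: m.power)


def select_mode(modes, power_budget):
    """Pick the least vulnerable Pareto-optimal mode within the power budget."""
    feasible = [m for m in pareto_front(modes) if m.power <= power_budget]
    return min(feasible, key=lambda m: m.vulnerability) if feasible else None


# Placeholder values for illustration only; they are NOT taken from the paper.
MODES = [
    Mode("unprotected", 1.00, 0.00),
    Mode("mode_a", 0.60, 0.20),
    Mode("mode_b", 0.70, 0.30),  # dominated by mode_a
    Mode("mode_c", 0.20, 0.50),
    Mode("mode_d", 0.05, 0.90),
]

chosen = select_mode(MODES, power_budget=0.60)
```

With these placeholder numbers, `mode_b` is filtered out because `mode_a` is both less vulnerable and cheaper, and a budget of 0.60 selects `mode_c`. A multicore version would repeat the selection per core against each application's own constraint.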



Similar Papers

Fault-Tolerant Computing with Heterogeneous Hardening Modes
Florian Kriebel ... Bharath Srinivas Prabakaran
10 Dec 2020

A Low-Cost, Systematic Methodology for Soft Error Robustness of Logic Circuits
Kai-Chiang Wu ... Diana Marculescu
IEEE Transactions on Very Large Scale Integration (VLSI) Systems | VOL. 21
01 Feb 2013

A New Protection Technique for Finite Impulse Response (FIR) Filters in the Presence of Soft Errors
P Reyes ... P Reviriego
01 Jun 2007

A Knapsack Methodology for Hardware-based DMR Protection against Soft Errors in Superscalar Out-of-Order Processors
Rafael Billig Tonetto ... Gabriel L Nazar
01 Oct 2019
