Redundancy Mining for Soft Error Detection in Multicore Processors

Ransford Hyman, Nagarajan Ranganathan, Koustav Bhattacharya

doi:10.1109/tc.2010.168

Abstract

Technology scaling and reduced supply voltages have significantly improved the performance and energy consumption of modern microprocessors. Microprocessors are being built with higher degrees of spatial parallelism and deeper pipelines to improve performance, which, however, makes them more susceptible to transient faults. Radiation causes "transient faults" or "single-event transients" in logic, which, once propagated and latched, become full cycle errors or soft errors. A radiation strike on a memory element is usually called a "single-event upset" or "soft error", as it can further propagate as a full cycle error. The problem of soft errors is exacerbated in the large multiprocessors employed in servers, where reliability is a key concern. In the past, lockstep execution of the original and duplicate instructions has been used for error detection in multiprocessors. However, executing redundant threads on a chip multiprocessor (CMP) provides error detection at lower overhead, both because the branch outcomes of the leading thread can be exploited during the execution of the trailing thread and because interprocessor communication latency is a key concern for lockstepping. In this paper, we show that by mining various redundancies inherent within a single core, interprocessor communication can be brought down to a minimum. To this end, we propose techniques based on 1) temporal redundancy, 2) data value redundancy, and 3) information redundancy for error detection in multicore designs. We exploit temporal redundancy by using the "latency slack cycles" (LSC) of an instruction, which we define as the number of cycles before the computed result of the instruction becomes the source operand of a subsequent instruction. Value-based detection exploits the width of operands with small data values, and information redundancy is exploited by generating residue code check bits for the source operands. We show that with a clustered core multiprocessor, the interprocessor communication overhead can be significantly reduced. In our proposed multicore design, when a soft error is detected, error correction is achieved by rolling back execution to a previous checkpoint state and re-executing the instructions. The proposed techniques have been implemented on the RSIM simulation framework and validated using the SPLASH benchmarks. Experimental results indicate that the proposed soft error detection schemes can be implemented on modern multicore designs with, on average, less than a 10 percent increase in CPI.
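
Two of the ideas in the abstract, latency slack cycles and residue code checking, can be illustrated with a small sketch. This is not the paper's implementation: the `Instr` type, the function names, the one-instruction-per-cycle timing model with unit latency, and the choice of modulus 3 are all assumptions made here for illustration only.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination, several sources."""
    dest: Optional[str]
    srcs: Tuple[str, ...]


def latency_slack_cycles(program: Sequence[Instr], i: int) -> Optional[int]:
    """Cycles between instruction i producing its result and the first
    later instruction that reads it.

    Assumed timing model (not from the paper): one instruction issues per
    cycle and every result is ready one cycle after issue.
    Returns None if the result is never read before being overwritten.
    """
    dest = program[i].dest
    if dest is None:
        return None
    for j in range(i + 1, len(program)):
        if dest in program[j].srcs:
            return j - i - 1          # idle cycles usable for a re-check
        if program[j].dest == dest:
            return None               # overwritten before any read
    return None


MOD = 3  # assumed check modulus; no power of two is divisible by 3


def residue(x: int) -> int:
    """Residue check bits of a non-negative operand."""
    return x % MOD


def add_with_residue_check(a: int, b: int, fault_mask: int = 0) -> Tuple[int, bool]:
    """Add two non-negative integers and verify the sum with residue codes.

    fault_mask XORs bit flips into the sum to emulate a soft error.
    Overflow is ignored here because Python integers are unbounded.
    """
    result = (a + b) ^ fault_mask
    expected = (residue(a) + residue(b)) % MOD
    return result, residue(result) == expected


program = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r5", "r6")),
    Instr("r7", ("r8",)),
    Instr("r9", ("r1", "r4")),
]
slack_r1 = latency_slack_cycles(program, 0)                # 2 under the assumed model
clean = add_with_residue_check(20, 22)                     # (42, True)
faulty = add_with_residue_check(20, 22, fault_mask=1 << 4) # check fails
```

Under these assumptions, a single flipped bit changes the sum by plus or minus a power of two, which is never a multiple of 3, so the residue comparison always catches it. A fixed-width hardware adder would additionally need to account for the carry-out, which this sketch leaves out.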

Journal: IEEE Transactions on Computers
Publication Date: Aug 1, 2011
Citations: 58

Similar Papers

A Soft Error Detection and Recovery Flip-Flop for Aggressive Designs With High-Performance
Jie Li ... Chenxu Wang
IEEE Transactions on Device and Materials Reliability | VOL. 22
01 Jun 2022

A strategy for soft error reduction in multi core designs
Ransford Hyman ... Nagarajan Ranganathan
01 May 2009

Scan-Architecture-Based Evaluation Technique of SET and SEU Soft-Error Rates at Each Flip-Flop in Logic VLSI Systems
Yoshimitsu Yanagawa ... Hirokazu Ikeda
IEEE Transactions on Nuclear Science | VOL. 55
01 Aug 2008

AUDITOR: A Stage-Wise Soft-Error Detection Scheme for Flip-flop Based Pipelines
Hong Zhang ... Hongfeng Sun
01 Jan 2017
