A strategy for soft error reduction in multi core designs

Ransford Hyman, Nagarajan Ranganathan, Koustav Bhattacharya

doi:10.1109/iscas.2009.5118238

Abstract

With the continuous decrease in minimum feature size and increase in chip density, modern processors are becoming increasingly susceptible to soft errors. In the past, lockstep execution with redundant threads on duplicated pipelines has been used for soft error rate reduction; it can achieve high error coverage, but at the cost of large area and performance overheads. In this paper, we propose techniques for protection against soft errors in multi-core designs using (i) the properties of spatial and temporal redundancy and (ii) value-based detection. We utilize temporal redundancy through the “latency use slack” (LSC) of an instruction, which we define as the number of cycles before the result computed by the instruction becomes a source operand of a subsequent instruction, while spatial redundancy is exploited by duplicating the instruction on a nearby idle processor core. Further, value-based detection is explored by exploiting the width of operands with small data values and by generating residue code check bits for the source operands. When a soft error is detected, it is corrected by rolling back execution to a previous checkpoint state and re-executing the instructions. The proposed techniques have been implemented on the RSIM simulation framework and validated using the SPLASH benchmarks. Our results indicate that the proposed soft error detection schemes can be implemented with, on average, less than a 10% increase in CPI on modern multi-core designs.
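As an illustration of the residue code check mentioned above, the following minimal sketch shows how check bits computed from the source operands can expose a corrupted result of an addition. It is not the authors' implementation: the modulus of 3, the function names `residue` and `checked_add`, and the `flip_bit` fault-injection parameter are all assumptions made for this example, and the sketch uses unbounded integers so that overflow does not need to be handled.

```python
# Hypothetical sketch of a residue-code check for an addition.
# MODULUS = 3 is an assumption; the abstract does not state a modulus.
MODULUS = 3


def residue(value: int, modulus: int = MODULUS) -> int:
    """Return the residue check bits of a non-negative integer."""
    return value % modulus


def checked_add(a: int, b: int, flip_bit=None):
    """Add a and b, then verify the sum against the operands' residues.

    flip_bit (hypothetical parameter) injects a single-bit soft error
    into the result before the check runs. Returns (result, check_passed).
    """
    # Check bits are generated from the source operands.
    predicted = (residue(a) + residue(b)) % MODULUS

    result = a + b
    if flip_bit is not None:
        result ^= 1 << flip_bit  # simulate a single-bit upset

    # For addition, the residue of the sum must equal the sum of the
    # operand residues (mod MODULUS); a mismatch signals an error.
    return result, residue(result) == predicted


if __name__ == "__main__":
    print(checked_add(1234, 5678))              # fault-free case
    print(checked_add(1234, 5678, flip_bit=4))  # injected fault
```

With a modulus of 3, any single-bit flip changes the result by a power of two, which is never a multiple of 3, so the check catches every single-bit error in the sum. In a scheme like the one described, a failed check would be the trigger for rolling back to the previous checkpoint.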

Full Text