As General-Purpose Graphics Processing Units (GPGPUs) become pervasive in High-Performance Computing (HPC), protecting programs from soft errors has become increasingly important. Soft errors may cause Silent Data Corruptions (SDCs), which silently produce erroneous execution results. Due to the massive parallelism of GPGPUs, fully protecting them against soft errors introduces nontrivial overhead. Fortunately, some HPC programs inherently tolerate imprecise execution outcomes due to the nature of these applications. Leveraging this property, selective soft-error protection can be applied to reduce energy consumption. In this work, we first propose a GPGPU-based Soft-Error aware APproximation analysis framework (G-SEAP) to characterize the approximation characteristics of soft errors. Based on G-SEAP, we perform an exhaustive analysis of 17 representative HPC benchmarks and observe that, on average, 72.7% of SDCs are approximable. We also observe that the application's dataflow, the reliability requirements of kernel functions, the instruction type, and the data bit location are all important factors for program correctness. Lastly, based on these observations, we design an approximate Error Correction Code (ECC) mechanism and an approximate instruction duplication technique to illustrate how G-SEAP provides useful guidance for energy-efficient soft-error elimination in GPGPUs.