Voltage Guardbands Research Articles

On-chip memories (usually based on static RAMs, or SRAMs) are crucial for achieving high performance in many computing devices, including heterogeneous devices such as GPUs, FPGAs, and ASICs. Modern workloads such as Deep Neural Networks (DNNs) running on these heterogeneous fabrics depend heavily on the on-chip memory architecture for efficient acceleration. Hence, improving the energy efficiency of such memories directly improves the efficiency of the whole system. One common method to save energy is undervolting, i.e., scaling the supply voltage below the nominal level. Such systems can be safely undervolted without incurring faults down to a certain voltage limit; this safe range is called the voltage guardband. However, reducing the voltage below the guardband level without decreasing the frequency causes timing-based faults. In this paper, we propose MoRS, a framework that generates the first approximate undervolting fault model, built from real faults extracted from experimental undervolting studies on SRAMs. We inject the faults generated by MoRS into the on-chip memory of a DNN accelerator to evaluate the resilience of the system under test. MoRS has the advantage of simplicity, requiring no time-consuming experiments, while being more accurate than a fully random fault injection approach. We evaluate MoRS on popular DNN workloads by mapping weights to SRAMs and measuring the accuracy difference between the output of MoRS and the real data. Our results show that the maximum difference between real fault data and the output fault model of MoRS is 6.21%, whereas the maximum difference between real data and the random fault injection model is 23.2%. In terms of average proximity to the real data, the output of MoRS outperforms the random fault injection approach by 3.21x.
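The random fault injection baseline that the abstract compares against can be sketched roughly as follows. This is a minimal illustration, not the MoRS framework or the authors' code: the function name `inject_random_bit_flips`, the 8-bit word width, the fault rate, and the sample weight values are all assumptions made for the example.

```python
import random


def inject_random_bit_flips(words, bit_width, fault_rate, seed=0):
    """Flip each stored bit independently with probability `fault_rate`.

    `words` is a list of non-negative integers standing in for weight
    words held in SRAM (hypothetical layout). Returns the faulty words
    and the number of bits that were flipped.
    """
    rng = random.Random(seed)  # seeded so a run can be reproduced
    faulty = []
    flipped = 0
    for word in words:
        for bit in range(bit_width):
            if rng.random() < fault_rate:
                word ^= 1 << bit  # toggle this bit
                flipped += 1
        faulty.append(word)
    return faulty, flipped


# Hypothetical 8-bit quantized weights; the values are made up.
weights = [0x12, 0x7F, 0x80, 0xFF]
faulty_weights, n_flipped = inject_random_bit_flips(
    weights, bit_width=8, fault_rate=0.1, seed=42
)
print(faulty_weights, n_flipped)
```

A uniform per-bit flip probability is exactly what makes this baseline "fully random": it carries no information about where real undervolting faults occur, which is the gap a model built from measured faults is meant to close.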

The energy efficiency of GPU architectures has emerged as an essential aspect of computer system design. In this article, we explore the energy benefits of reducing the GPU chip's voltage to the safe limit, i.e., the Vmin point, using predictive software techniques. We perform this study on several commercial off-the-shelf GPU cards. We find that a voltage guardband of about 20% exists on those GPUs, which span two architectural generations; if "eliminated" entirely, it can yield up to 25% energy savings on one of the studied GPU cards. Our measurement results reveal program-dependent Vmin behavior across the studied applications, and the exact magnitude of the improvement depends on the program's available guardband. We make fundamental observations about this program-dependent Vmin behavior. We experimentally determine that voltage noise has a more substantial impact on Vmin than process and temperature variation, and that the activities during kernel execution cause large voltage droops. From these findings, we show how to use kernels' microarchitectural performance counters to predict their Vmin values accurately. The average and maximum prediction errors are 0.5% and 3%, respectively. Accurate Vmin prediction opens up new possibilities for a cross-layer dynamic guardbanding scheme for GPUs, in which software predicts and manages the voltage guardband while functional correctness is ensured by a hardware safety-net mechanism.
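To make the idea of predicting Vmin from performance counters concrete, here is a minimal sketch that fits a one-variable least-squares line and reports average and maximum percentage errors. It is not the article's predictor: the abstract does not state which model or counters are used, so the single "activity" counter, the function names `fit_line` and `prediction_errors`, and every data point below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def prediction_errors(xs, ys, slope, intercept):
    """Return (average, maximum) absolute prediction error in percent."""
    errors = [
        abs(slope * x + intercept - y) / y * 100.0
        for x, y in zip(xs, ys)
    ]
    return sum(errors) / len(errors), max(errors)


# Made-up data: one normalized activity counter per kernel and the
# Vmin (in volts) measured for that kernel. Not from the article.
activity = [0.1, 0.3, 0.5, 0.7, 0.9]
vmin = [0.80, 0.82, 0.85, 0.86, 0.89]

slope, intercept = fit_line(activity, vmin)
avg_err, max_err = prediction_errors(activity, vmin, slope, intercept)
print(f"avg error {avg_err:.2f}%, max error {max_err:.2f}%")
```

In a software-managed guardbanding scheme of the kind the abstract describes, a predicted Vmin would still need a safety margin or hardware backstop, since an underestimate risks incorrect execution.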

Articles published on Voltage Guardbands

MoRS: An Approximate Fault Modeling Framework for Reduced-Voltage SRAMs

Exceeding Conservative Limits: A Consolidated Analysis on Modern Hardware Margins

Predictive Guardbanding: Program-Driven Timing Margin Reduction for GPUs

A Unified Clock and Switched-Capacitor-Based Power Delivery Architecture for Variation Tolerance in Low-Voltage SoC Domains

DNN Engine: A 28-nm Timing-Error Tolerant Sparse Deep Neural Network Processor for IoT Applications

Aggressive Voltage and Temperature Control for Power Saving in Mobile Application Processors

Edge-TM

Postsilicon Voltage Guard-Band Reduction in a 22 nm Graphics Execution Core Using Adaptive Voltage Scaling and Dynamic Power Gating

A Low-Power 1-GHz Razor FIR Accelerator With Time-Borrow Tracking Pipeline and Approximate Error Correction in 65-nm CMOS

Dynamic reduction of voltage margins by leveraging on-chip ECC in Itanium II processors

Circuit-Level Timing Error Tolerance for Low-Power DSP Filters and Transforms

The Next Generation 64b SPARC Core in a T4 SoC Processor

VRSync

Pushing Adaptive Voltage Scaling Fully on Chip
