Abstract

The ever-increasing parallelism demand of General-Purpose Graphics Processing Unit (GPGPU) applications pushes toward larger and more energy-hungry register files in successive GPU generations. Reducing the supply voltage beyond its safe limit is an effective way to improve the energy efficiency of register files. However, at these operating voltages, the reliability of the circuit is compromised. This work aims to tolerate permanent faults from process variations in large GPU register files operating below the safe supply voltage limit. To do so, this paper proposes a microarchitectural patching technique, DC-Patch, exploiting the inherent data redundancy of applications to compress registers at run-time with neither compiler assistance nor instruction set modifications. Instead of disabling an entire faulty register file entry, DC-Patch leverages the reliable cells within a faulty entry to store compressed register values. Experimental results show that, with more than a third of faulty register entries, DC-Patch ensures reliable operation of the register file and reduces energy consumption by 47% with respect to a conventional register file working at nominal supply voltage. The energy savings are 21% compared to a voltage noise smoothing scheme operating at the safe supply voltage limit. These benefits are obtained with less than 2% and 6% impact on system performance and area, respectively.
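The abstract does not detail the compression scheme itself; as a hedged illustration of the general idea (exploiting data redundancy so a compressed entry fits in the reliable cells of a faulty entry), the sketch below uses simple narrow-value compression. The function names and the 16-bit threshold are assumptions for illustration, not the paper's actual algorithm.

```python
# Illustrative sketch (NOT the paper's exact scheme): narrow-value
# compression of a register entry of 64 x 32-bit components. If every
# component fits in 16 bits, the entry shrinks to half its size, so it
# can be stored using only the reliable cells of a faulty entry.

ENTRY_COMPONENTS = 64  # components per register entry (from the text)


def compress_entry(components):
    """Try to compress a 64-component entry to 16 bits per component.

    Returns (width_in_bits, values) on success, or None when the entry
    is not compressible under this simple scheme and must be stored
    uncompressed (e.g., in a fully functional entry).
    """
    assert len(components) == ENTRY_COMPONENTS
    if all(0 <= v < (1 << 16) for v in components):
        return 16, list(components)  # compressed: 128 B instead of 256 B
    return None                      # not compressible with this scheme


def decompress_entry(width, values):
    # Narrow values are stored verbatim in fewer bits; decompression
    # simply restores the full 32-bit components.
    return [v & 0xFFFFFFFF for v in values]
```

For example, an entry holding small loop indices compresses to half its size, while an entry of large random values would be stored uncompressed.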

Highlights

  • For more than a decade, Graphics Processing Units (GPUs) have established themselves as massively parallel computing devices and have been adopted in multiple computing areas, from embedded systems to high-performance data centers

  • Experimental results show that DC-Patch guarantees a reliable operation of a register file with 39% of faulty register entries

  • In GPUs, supply voltage reductions can bring substantial energy savings, especially in SRAM memory arrays occupying a large percentage of the on-die area


Introduction

For more than a decade, Graphics Processing Units (GPUs) have established themselves as massively parallel computing devices and have been adopted in multiple computing areas, from embedded systems to high-performance data centers. This success has resulted in a plethora of applications coded and optimized to run on GPU devices (GPGPU applications). In the GPU register file, every entry consists of 64 components of 4 bytes each, totalling 256 B. To access these register entries, threads (or work-items) are organized into groups of up to 64 threads called wavefronts. This way, each thread works with a different component of the same entry, so the ISA only names the register, and referring to each individual component is avoided.
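The entry layout described above can be modeled with a short sketch. The class and method names below are assumptions for illustration; only the sizes (64 components of 4 bytes, 64-thread wavefronts) come from the text.

```python
# Minimal model of the register entry layout described in the text:
# each entry holds 64 components of 4 bytes (256 B in total), and each
# thread of a 64-thread wavefront accesses only its own component, so
# the ISA never needs to name an individual component.

WAVEFRONT_SIZE = 64                              # threads per wavefront
COMPONENT_BYTES = 4                              # 32-bit component per thread
ENTRY_BYTES = WAVEFRONT_SIZE * COMPONENT_BYTES   # 256 B per entry


class VectorRegisterEntry:
    def __init__(self):
        self.components = [0] * WAVEFRONT_SIZE

    def write(self, lane, value):
        # Thread `lane` of the wavefront updates its own 32-bit
        # component; the register name is shared by the whole wavefront.
        self.components[lane] = value & 0xFFFFFFFF

    def read(self, lane):
        return self.components[lane]
```

Under this model, a single register identifier in an instruction implicitly addresses 64 per-thread values, which is why per-component addressing is unnecessary in the ISA.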

