Abstract

As technology scales further into the nanometer regime, increasing process variations (PV) induce significant delay variations and limit the maximum clock frequency of GPGPUs (general-purpose graphics processing units). Each computing core (i.e., streaming multiprocessor) in a GPGPU supports thousands of simultaneously active threads and therefore requires a large register file. Such a sizeable register file is highly sensitive to process variations and becomes one of the major units determining the core frequency. In this study, we first develop a novel mechanism that classifies registers in the highly banked register architecture into fast and slow categories to maximize the frequency improvement. We then leverage the unique features of GPGPU applications to effectively tolerate the extra access delay of the slow registers. Our experimental results show that the proposed techniques significantly improve GPGPU performance under process variations.
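
To make the fast/slow idea concrete, the following is a minimal illustrative sketch, not the mechanism developed in the paper. It assumes that each register has a known access delay and that a register whose delay exceeds the chosen clock period is simply given extra access cycles. The delay values, the target clock period, and the function names `classify_registers` and `access_cycles` are all hypothetical.

```python
def classify_registers(delays_ps, cycle_ps):
    """Split register indices into fast (fit in one cycle) and slow (do not)."""
    fast = [i for i, d in enumerate(delays_ps) if d <= cycle_ps]
    slow = [i for i, d in enumerate(delays_ps) if d > cycle_ps]
    return fast, slow


def access_cycles(delay_ps, cycle_ps):
    """Whole clock cycles needed to complete one register access (integer ceiling)."""
    return max(1, -(-delay_ps // cycle_ps))


# Hypothetical per-register access delays under process variation, in picoseconds.
delays_ps = [800, 850, 900, 1200]

# Without classification, the slowest register dictates the clock period.
baseline_cycle_ps = max(delays_ps)

# With classification, a shorter (assumed) clock period can be chosen;
# registers that miss it are marked slow and take additional cycles.
target_cycle_ps = 900
fast, slow = classify_registers(delays_ps, target_cycle_ps)
slow_latency = {i: access_cycles(delays_ps[i], target_cycle_ps) for i in slow}
```

In this toy setting the clock period drops from the worst-case delay to the target period, at the cost of multi-cycle accesses to the slow registers; hiding that extra latency is the part the abstract attributes to GPGPU application features.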
