Abstract

Register files are a key data storage unit that affects instruction throughput in graphics processing units (GPUs). GPU register files are typically large, to accommodate many concurrent threads, and are implemented using the same SRAM technology as the on-chip cache. We propose contextrf, a new register file architecture that efficiently leverages memories with nonuniform access characteristics, including hybrid SRAM/DRAM (S/D) and spintronic domain-wall memories (DWMs). Contextrf allows greater-capacity register files to be implemented in the same area within the GPU, with reduced power consumption. We also propose contextPreRF, a hardware preswitch scheme that hides switching delays: as soon as a register request is queued, the nonuniform access memories containing the corresponding register are sent a preemptive switch request. Our scheme thus transparently hides the penalties of switching between register contexts. Replacing the register file SRAM with S/D reduces energy by 37%, with a 1.4% average performance drop. Employing DWM reduces register file energy by 74%, with a 0.4% average performance penalty. For the denser DWM, we also model converting the saved area into additional registers, cache, and shared memory; this improves performance by 13.5% over the baseline SRAM register file.
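The abstract does not give contextPreRF's hardware details. The Python sketch below is a toy cycle model under assumed parameters, meant only to illustrate why issuing the switch at enqueue time hides latency. Each register bank exposes one context at a time (for example, the SRAM or DRAM half of an S/D cell, or a DWM domain position), and reaching a register in another context costs a fixed switch latency. The names `Request` and `total_stall_cycles`, the bank and context indices, and the latency value are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    enqueue_cycle: int  # cycle the register read enters the operand queue (hypothetical)
    issue_cycle: int    # cycle the read is dispatched to its bank (hypothetical)
    bank: int           # hypothetical bank index holding the register
    context: int        # hypothetical context that holds the register (e.g. SRAM vs DRAM)


def total_stall_cycles(requests: List[Request], num_banks: int,
                       switch_latency: int, preswitch: bool) -> int:
    """Toy model: total cycles requests wait on context switches at their bank."""
    active = [0] * num_banks  # context each bank exposes (or is switching to)
    ready = [0] * num_banks   # cycle at which the bank's active context is usable
    stall = 0
    for r in sorted(requests, key=lambda q: q.issue_cycle):
        if active[r.bank] != r.context:
            # With preswitch, the switch is triggered when the request is queued;
            # without it, only when the request is issued to the bank.
            trigger = r.enqueue_cycle if preswitch else r.issue_cycle
            # Never start switching before the bank has served earlier requests.
            start = max(trigger, ready[r.bank])
            active[r.bank] = r.context
            ready[r.bank] = start + switch_latency
        access = max(r.issue_cycle, ready[r.bank])
        stall += access - r.issue_cycle
        ready[r.bank] = access  # the bank is occupied until this access is served
    return stall
```

For a single request queued at cycle 0, issued at cycle 3, and needing a 4-cycle switch, the model charges 4 stall cycles without preswitch but only 1 with it, because 3 of the 4 switch cycles overlap the queueing delay. In this model, preswitch hides up to min(queue wait, switch latency) cycles per switch.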
