Abstract

Modern Graphics Processing Units (GPUs) widely adopt large SRAM based register file (RF) to enable fast context-switch. A large SRAM RF may consume 20% to 40% GPU power, which has become one of the major design challenges for GPUs. Recent studies mitigate the issue through hybrid RF designs that architect a large STT-RAM (Spin Transfer Torque Magnetic memory) RF and a small SRAM buffer. However, the long STT-RAM write latency throttles the data exchange between STT-RAM and SRAM, which deprecates warp scheduler with frequent context switches, e.g., round robin scheduler. In this paper, we propose HC-RF, a warp-scheduler friendly hybrid RF design using novel SRAM/STT-RAM hybrid cell (HC) structure. HC-RF exploits cell level integration to improve the effective bandwidth between STT-RAM and SRAM. By enabling silent data transfer from SRAM to STT-RAM without blocking RF banks, HC-RF supports concurrent context-switching and decouples its dependency on warp scheduler. Our experimental results show that, on average, HC-RF achieves 50% performance improvement and 44% energy consumption reduction over the coarse-grained hybrid design when adopting LRR(Loose Round Robin) warp scheduler.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.