Abstract

Modern graphics processing units (GPUs) demand register files (RFs) with ever larger capacity and bank sizes, which strains traditional SRAM-based RF designs because of their large die area and long access latency. Recent hybrid RF designs, e.g., RFs that combine SRAM with spin-transfer torque random access memory (STT-RAM), mitigate the issue by exploiting the density advantage of STT-RAM and the performance advantage of SRAM. However, existing hybrid RF designs adopt coarse-grained integration with limited write bandwidth between SRAM and STT-RAM, which restricts the adoption of different warp schedulers at runtime. In this article, we propose FRF, a warp-scheduler-friendly, fine-grained hybrid RF design built on SRAM/STT-RAM hybrid cell (HC) structures. By integrating one SRAM cell and $N$ STT-RAM cells into one HC, FRF exploits internal write paths to enlarge the access bandwidth between SRAM and STT-RAM, and thus greatly improves area and performance. FRF enables concurrent context switching so that different warp schedulers can be adopted at runtime. FRF also adopts interleaved register mapping (IRM) and on-demand register remapping to further improve the utilization of the SRAM in each HC. Our experimental results show that, on average, FRF achieves a 50% performance improvement and a 40% reduction in energy consumption over the coarse-grained hybrid design when adopting the loose round-robin (LRR) warp scheduler, and a 159% efficiency improvement over a pure STT-RAM-based RF.

