Architecting the Last-Level Cache for GPUs using STT-RAM Technology

Mohammad Hossein Samavatian,Ramin Bashizade,Mohammad Arjomand,Hamid Sarbazi-Azad

doi:10.1145/2764905

Abstract

Future GPUs should have larger L2 caches based on the current trends in VLSI technology and GPU architectures toward increase of processing core count. Larger L2 caches inevitably have proportionally larger power consumption. In this article, having investigated the behavior of GPGPU applications, we present an efficient L2 cache architecture for GPUs based on STT-RAM technology. Due to its high-density and low-power characteristics, STT-RAM technology can be utilized in GPUs where numerous cores leave a limited area for on-chip memory banks. They have, however, two important issues, high energy and latency of write operations, that have to be addressed. Low retention time STT-RAMs can reduce the energy and delay of write operations. Nevertheless, employing STT-RAMs with low retention time in GPUs requires a thorough study on the behavior of GPGPU applications. Based on this investigation, we have architectured a two-part STT-RAM-based L2 cache with low-retention (LR) and high-retention (HR) parts. The proposed two-part L2 cache exploits a dynamic threshold regulator (DTR) to efficiently regulate the write threshold for migration of the data blocks from HR to LR, based on the behavior of the applications. Also, a Data and Access type Aware Cache Search mechanism (DAACS) is hired for handling the search of the requested data blocks in two parts of the cache. The STT-RAM L2 cache architecture proposed in this article can improve IPC by up to 171% (20% on average), and reduce the average consumed power by 28.9% compared to a conventional L2 cache architecture with equal on-chip area.

Full Text