High-bandwidth memory (HBM) offers breakthrough memory bandwidth through its vertically stacked memory architecture and through-silicon via (TSV)-based fast interconnect. However, the stacked architecture leads to high power density, causing thermal issues when running modern memory-hungry workloads such as deep neural networks (DNNs). Prior works on dynamic thermal management (DTM) of 3-D DRAM do not consider the physical structure of HBM and often incur heavy DTM-induced performance penalties. We propose NeuroMap, an application-aware, efficient task mapping and migration-based DTM policy that maps DNN instances to cores by exploiting the channel layout of HBM and leveraging the significant temperature gradient across DRAM dies when making thermal decisions. We utilize the variation in the memory access behavior of DNN layers to minimize stalling due to thermal hotspots in the HBM stack. We also use application-aware dynamic voltage and frequency scaling (DVFS) and DRAM low-power states to further improve performance. Experimental results on workloads comprising seven popular DNNs show that NeuroMap reduces average execution time and memory energy by 39% and 40%, respectively, over state-of-the-art DTM mechanisms.
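To illustrate the general idea of thermally guided, application-aware mapping described above, the following is a minimal Python sketch of a greedy policy that steers memory-intensive DNN layers toward cooler, less-loaded HBM channels. All names, thresholds, and data structures here are illustrative assumptions, not the paper's actual NeuroMap implementation.

```python
# Hypothetical sketch: greedily map DNN layers to HBM channels using per-channel
# temperature and load, so memory-bound layers avoid hotspots and DTM stalls.
from dataclasses import dataclass
from typing import List


@dataclass
class HBMChannel:
    channel_id: int
    temperature_c: float       # assumed die temperature near this channel
    allocated_bw: float = 0.0  # fraction of channel bandwidth already claimed


@dataclass
class DNNLayer:
    name: str
    mem_intensity: float       # e.g., bytes accessed per MAC (higher = more memory-bound)


THERMAL_LIMIT_C = 85.0         # assumed throttling threshold for the HBM stack


def map_layer(layer: DNNLayer, channels: List[HBMChannel]) -> HBMChannel:
    """Place a layer on the coolest, least-loaded channel, weighting thermal
    headroom more heavily for memory-intensive layers."""
    candidates = [c for c in channels if c.temperature_c < THERMAL_LIMIT_C]
    if not candidates:
        candidates = channels  # all channels hot: fall back to the coolest overall
    best = min(
        candidates,
        key=lambda c: c.temperature_c * layer.mem_intensity + c.allocated_bw,
    )
    best.allocated_bw += layer.mem_intensity
    return best


# Example: three channels with a temperature gradient across the stack.
channels = [HBMChannel(0, 78.0), HBMChannel(1, 83.0), HBMChannel(2, 71.0)]
for layer in [DNNLayer("conv1", 0.2), DNNLayer("fc7", 0.9)]:
    target = map_layer(layer, channels)
    print(f"{layer.name} -> channel {target.channel_id}")
```

In this sketch, the scoring function simply trades off channel temperature against existing load; the paper's policy additionally considers migration and DVFS decisions, which are omitted here.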