Deep neural networks (DNNs) have been widely adopted owing to their breakthrough performance and high accuracy. DNNs exhibit varying memory behavior, with specific and recognizable memory access patterns and access intensities that depend on the data reuse selected in different layers. Such applications have high memory bandwidth demands due to aggressive computations, performing several billion floating-point operations per second (BFLOPs). 3D DRAMs, which provide very high memory access bandwidth, are extensively employed to break the memory wall, bridging the gap between compute and memory while running DNNs. However, the vertical integration in 3D DRAM introduces serious thermal issues, resulting from high power density and the close proximity of memory cells, and requires dynamic thermal management (DTM). To unleash the true potential of 3D DRAM and exploit its enormous bandwidth under thermal constraints, the DNN application’s data must be intelligently mapped across memory channels, pseudo-channels, and banks, minimizing the effective memory latency and reducing thermally induced application slowdown. The specific memory access patterns exhibited during a DNN layer’s execution are crucial for determining a favourable data mapping for the 3D DRAM dies, one that causes minimal thermal impact and also maximizes DRAM bandwidth utilization. In this work, we propose an application-aware and thermal-sensitive data mapping that intelligently assigns portions of the 3D DRAM to DNN layers, leveraging knowledge of each layer’s memory access patterns and minimizing DTM-induced performance overheads. Additionally, we deploy a DTM mechanism based on DRAM low-power states to keep the 3D DRAM within safe thermal limits. With our proposal, we observe a performance improvement of 1% to 61% and memory energy savings of 1% to 55% over state-of-the-art DTM strategies for popular deep neural networks running DNN inference.