Abstract

Convolutional Neural Network (CNN) accelerators are rapidly growing in popularity as a promising solution for deep learning based applications. Although optimizations of the computation have been studied intensively, the energy efficiency of such accelerators remains limited by off-chip memory accesses, whose energy cost is orders of magnitude higher than that of other operations. Minimizing off-chip memory access volume is therefore the key to further improving energy efficiency. However, we observe that minimizing the accesses of a single data type, as much prior work does, cannot fit the varying shapes of the convolutional layers in a CNN; hence, there is a dilemma over which data type's accesses to minimize. To overcome this problem, this paper proposes an adaptive layer partitioning and scheduling scheme, called SmartShuttle, to minimize off-chip memory accesses for CNN accelerators. SmartShuttle can adaptively switch among different data reuse schemes and the corresponding tiling factor settings to dynamically match different convolutional layers. Moreover, SmartShuttle thoroughly investigates the impact of data reusability and sparsity on the memory access volume. The experimental results show that SmartShuttle processes the convolutional layers at 434.8 multiply-and-accumulate operations (MACs) per DRAM access for VGG16 (batch size = 3) and 526.3 MACs per DRAM access for AlexNet (batch size = 4), outperforming the state-of-the-art approach (Eyeriss) by 52.2% and 52.6%, respectively.
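To make the dilemma concrete, the sketch below estimates DRAM traffic for two simple per-layer reuse schemes and picks the cheaper one. This is a minimal illustration, not the paper's actual cost model: all names (ConvShape, traffic_ifmap_reuse, traffic_weight_reuse, pick_scheme) are hypothetical, and the formulas assume stride-1 convolutions, on-chip accumulation of partial sums, and a single flat on-chip buffer.

```python
"""Toy per-layer reuse-scheme selection under a simplified DRAM-traffic model.
Hypothetical sketch; not SmartShuttle's actual partitioning/scheduling model."""
from dataclasses import dataclass
from math import ceil

@dataclass
class ConvShape:
    n: int  # batch size
    c: int  # input channels
    m: int  # output channels
    h: int  # output feature map height
    w: int  # output feature map width
    r: int  # kernel height
    s: int  # kernel width

def sizes(L: ConvShape):
    """Element counts for inputs, weights, and outputs (stride 1, no padding)."""
    ifmap = L.n * L.c * (L.h + L.r - 1) * (L.w + L.s - 1)
    weight = L.m * L.c * L.r * L.s
    ofmap = L.n * L.m * L.h * L.w
    return ifmap, weight, ofmap

def traffic_ifmap_reuse(L: ConvShape, buf: int) -> int:
    """Keep input tiles resident; all weights are streamed once per input pass."""
    ifmap, weight, ofmap = sizes(L)
    passes = max(1, ceil(ifmap / buf))
    return ifmap + weight * passes + ofmap

def traffic_weight_reuse(L: ConvShape, buf: int) -> int:
    """Keep weight tiles resident; all inputs are streamed once per weight pass."""
    ifmap, weight, ofmap = sizes(L)
    passes = max(1, ceil(weight / buf))
    return weight + ifmap * passes + ofmap

def pick_scheme(L: ConvShape, buf: int):
    """Return (scheme name, estimated DRAM accesses) with the lowest traffic."""
    candidates = {
        "ifmap-reuse": traffic_ifmap_reuse(L, buf),
        "weight-reuse": traffic_weight_reuse(L, buf),
    }
    return min(candidates.items(), key=lambda kv: kv[1])

# An early VGG16-like layer (large feature maps, few weights)
# versus a late one (small feature maps, many weights):
early = ConvShape(n=1, c=64, m=64, h=224, w=224, r=3, s=3)
late = ConvShape(n=1, c=512, m=512, h=14, w=14, r=3, s=3)
for layer in (early, late):
    print(pick_scheme(layer, buf=100_000))  # buffer capacity in elements
```

Even this toy model reproduces the layer-dependent behavior that motivates adaptive switching: the early layer favors keeping its small weight set on chip, while the late layer favors holding its small input feature maps resident, so a single fixed reuse scheme is suboptimal across the network.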
