Abstract

Convolutional neural network (CNN) accelerators are commonly used to boost the performance of CNN applications. The energy efficiency of CNN accelerators is of paramount importance for battery-operated devices such as smartphones. A substantial fraction of their energy consumption is due to off-chip memory accesses. These accelerators connect to the off-chip memory through a wide bus to improve throughput. However, accessing data at an unaligned address, or of a size that is not a multiple of the bus width, leads to low bus utilization and wasted energy. Memory accesses can be reduced considerably by partitioning the data in a way that increases the number of aligned accesses and optimally utilizes the bus width. We propose an approach that factors in the architectural parameters to evaluate memory accesses. Our tool determines the optimal partitioning and data-reuse scheme for convolutional and fully connected layers to minimize the off-chip memory accesses of these accelerators. Compared to the state of the art, our approach reduces the off-chip memory accesses of AlexNet, VGG16, and ResNet-50 by 9%, 16%, and 28% on a 64-bit data bus and by 16%, 29%, and 46% on a 128-bit data bus, respectively.
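The alignment effect described above can be made concrete with a small arithmetic sketch (not from the paper; the function names and the byte-addressed bus model are illustrative assumptions): a read of `size` bytes starting at address `addr` on a bus of `bus_bytes` bytes needs `ceil((addr mod bus_bytes + size) / bus_bytes)` bus transfers, so a misaligned start can double the traffic for small accesses.

```python
def bus_transactions(addr: int, size: int, bus_bytes: int) -> int:
    """Number of full-width bus transfers needed to read `size` bytes
    starting at byte address `addr` (hypothetical aligned-burst bus model)."""
    offset = addr % bus_bytes              # misalignment within the first beat
    return -(-(offset + size) // bus_bytes)  # ceiling division

def bus_utilization(addr: int, size: int, bus_bytes: int) -> float:
    """Fraction of the transferred bytes that are actually useful data."""
    return size / (bus_transactions(addr, size, bus_bytes) * bus_bytes)

# Reading 8 bytes on a 64-bit (8-byte) bus:
# aligned start -> 1 transfer at 100% utilization;
# start at offset 4 -> 2 transfers at 50% utilization.
print(bus_transactions(0, 8, 8), bus_utilization(0, 8, 8))
print(bus_transactions(4, 8, 8), bus_utilization(4, 8, 8))
```

Under this model, a partitioning that places tile boundaries on bus-width multiples keeps every transfer fully utilized, which is the intuition behind the access-reduction numbers reported in the abstract.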
