Abstract

Convolutional neural network (CNN) accelerators are commonly used to boost the performance of CNN applications. The energy efficiency of CNN accelerators is of paramount importance for battery-operated devices such as smartphones. A substantial fraction of their energy consumption is due to off-chip memory accesses. These accelerators connect to the off-chip memory through a wide bus to improve throughput. However, accessing data at an unaligned address, or of a size that is not a multiple of the bus width, leads to low bus utilization and wasted energy. Memory accesses can be reduced considerably by partitioning the data in a way that increases the number of aligned accesses and optimally utilizes the bus width. We propose an approach that factors in the architectural parameters to evaluate memory accesses. Our tool determines the optimal partitioning and data-reuse scheme for convolutional and fully connected layers to minimize the off-chip memory accesses of these accelerators. Compared to the state of the art, our approach reduces the off-chip memory accesses of AlexNet, VGG16, and ResNet-50 by 9%, 16%, and 28% on a 64-bit data bus and by 16%, 29%, and 46% on a 128-bit data bus, respectively.
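The alignment effect described above can be made concrete with a small arithmetic sketch (not from the paper; the function names and the byte-addressed bus model are illustrative assumptions): a read of `size` bytes starting at address `addr` on a bus of `bus_bytes` bytes needs `ceil((addr mod bus_bytes + size) / bus_bytes)` bus transfers, so a misaligned start can double the traffic for small accesses.

```python
def bus_transactions(addr: int, size: int, bus_bytes: int) -> int:
    """Number of full-width bus transfers needed to read `size` bytes
    starting at byte address `addr` (hypothetical aligned-burst bus model)."""
    offset = addr % bus_bytes              # misalignment within the first beat
    return -(-(offset + size) // bus_bytes)  # ceiling division

def bus_utilization(addr: int, size: int, bus_bytes: int) -> float:
    """Fraction of the transferred bytes that are actually useful data."""
    return size / (bus_transactions(addr, size, bus_bytes) * bus_bytes)

# Reading 8 bytes on a 64-bit (8-byte) bus:
# aligned start -> 1 transfer at 100% utilization;
# start at offset 4 -> 2 transfers at 50% utilization.
print(bus_transactions(0, 8, 8), bus_utilization(0, 8, 8))
print(bus_transactions(4, 8, 8), bus_utilization(4, 8, 8))
```

Under this model, a partitioning that places tile boundaries on bus-width multiples keeps every transfer fully utilized, which is the intuition behind the access-reduction numbers reported in the abstract.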
