Abstract

In recent years, Convolutional Neural Networks (CNNs) have evolved rapidly, and real-time CNN implementations in embedded systems are in high demand. High-performance, real-time CNN inference must therefore be realized on local processors. Conventional approaches to designing CNN accelerators focus on reducing the computational workload of CNNs. However, limited external memory bandwidth has become the main bottleneck of CNN acceleration in embedded systems: in deep and large CNN models, the feature-map pixels and weights are too numerous to fit on chip and must reside in external memory, so they are exchanged between off-chip and on-chip memories frequently, and performance is constrained by the available external memory bandwidth. In this paper, bandwidth-efficient architectures for CNN implementation are proposed. The intermediate pixel data are stored on chip, and the kernel weights are transferred in an efficient way. Compared to mainstream CNN implementation methods, the proposed architectures utilize external memory bandwidth efficiently while preserving the original throughput.
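To illustrate the data-movement idea behind the abstract, the following is a minimal C sketch of keeping intermediate feature maps in on-chip buffers while streaming only kernel weights from external memory. All dimensions, buffer names, and the ping-pong scheme are illustrative assumptions for a toy direct convolution, not the architecture actually proposed in the paper.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy dimensions, chosen only for illustration. */
#define H 16      /* feature-map height   */
#define W 16      /* feature-map width    */
#define C 8       /* channels per layer   */
#define K 3       /* kernel size          */
#define LAYERS 4

/* "On-chip" buffers: intermediate feature maps never leave these arrays,
   emulating on-chip pixel storage. Two buffers are ping-ponged between
   layers so each layer's output becomes the next layer's input. */
static float buf_a[C][H][W];
static float buf_b[C][H][W];

/* "External memory": only the kernel weights are fetched from here, once
   per layer, so per-layer off-chip traffic is limited to weights. */
static float dram_weights[LAYERS][C][C][K][K];

/* Direct 3x3 same-padding convolution followed by ReLU. */
static void conv_layer(float in[C][H][W], float out[C][H][W],
                       float w[C][C][K][K])
{
    for (int oc = 0; oc < C; oc++)
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++) {
                float acc = 0.0f;
                for (int ic = 0; ic < C; ic++)
                    for (int ky = 0; ky < K; ky++)
                        for (int kx = 0; kx < K; kx++) {
                            int iy = y + ky - K / 2;
                            int ix = x + kx - K / 2;
                            if (iy >= 0 && iy < H && ix >= 0 && ix < W)
                                acc += in[ic][iy][ix] * w[oc][ic][ky][kx];
                        }
                out[oc][y][x] = acc > 0.0f ? acc : 0.0f;
            }
}

int main(void)
{
    float (*in)[H][W]  = buf_a;
    float (*out)[H][W] = buf_b;

    /* Load the input image into the on-chip buffer once (a unit impulse
       here, purely as placeholder data). */
    memset(buf_a, 0, sizeof buf_a);
    buf_a[0][H / 2][W / 2] = 1.0f;

    for (int l = 0; l < LAYERS; l++) {
        /* Only weights cross the off-chip boundary for this layer:
           C*C*K*K floats instead of a full intermediate feature map. */
        conv_layer(in, out, dram_weights[l]);
        float (*tmp)[H][W] = in; in = out; out = tmp; /* ping-pong */
    }

    printf("weight bytes moved off-chip per layer: %zu\n",
           sizeof dram_weights[0]);
    printf("feature-map bytes kept on chip:        %zu\n", sizeof buf_a);
    return 0;
}
```

In this sketch the per-layer off-chip traffic is the weight volume alone, while the repeated producer/consumer traffic of intermediate pixels stays inside the two static buffers; this is the trade-off the abstract describes, reducing external bandwidth pressure without changing the computation performed.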
