In this paper, we propose a novel Convolutional Neural Network (CNN) hardware accelerator called CoNNA, capable of accelerating pruned, quantized CNNs. In contrast to most existing solutions, CoNNA offers a complete solution for compressed CNN acceleration and can accelerate all layer types commonly found in contemporary CNNs. CoNNA is designed as a coarse-grained reconfigurable architecture that uses rapid, dynamic reconfiguration during CNN layer processing. The architecture enables on-the-fly selection of the CNN to be accelerated and also supports the acceleration of CNNs with dynamic topology. Furthermore, by directly processing compressed feature and kernel maps and skipping all ineffectual computations during CNN layer processing, CoNNA achieves higher CNN processing rates than some previously proposed solutions. The CoNNA architecture has been implemented on the Xilinx Zynq UltraScale+ FPGA family and compared with seven previously proposed CNN hardware accelerators. The experimental results indicate that the CoNNA architecture is up to 14.10, 6.05, 4.91, 2.67, 11.30, 3.08, and 3.58 times faster than the previously proposed MIT's Eyeriss, NullHop, NVIDIA's Deep Learning Accelerator (NVDLA), NEURAghe, CNN_A1, fpgaConvNet, and Deephi's Aristotle CNN accelerators, respectively, while using an identical number of computing units and operating at the same clock frequency.