Efficient hardware implementation of cellular neural networks with powers-of-two based incremental quantization

Xiaowei Xu, Tianchen Wang, Yiyu Shi, Yu Hu, Qing Lu, Jinglan Liu

doi:10.1145/3183584.3183611

Abstract

Cellular neural networks (CeNNs) have been widely adopted in image processing tasks. Recently, various hardware implementations of CeNNs have emerged in the literature, with Field Programmable Gate Arrays (FPGAs) being one of the most popular choices due to their high flexibility and short time-to-market. However, existing FPGA implementations of CeNNs are typically bounded by the limited number of embedded multipliers available on the device, while the large number of Logic Elements (LEs) and registers remain largely unused. Such unbalanced resource utilization leads to sub-optimal CeNN performance and speed. To address this issue, in this paper we propose an incremental quantization based approach for the FPGA implementation of CeNNs. It quantizes the numbers in CeNN templates to powers of two, so that complex and expensive multiplications can be converted to simple and cheap shift operations, which require only a small number of registers and LEs. While a similar concept has been explored in hardware implementations of Convolutional Neural Networks (CNNs), CeNNs have completely different computation patterns that require different quantization and implementation strategies. Experimental results on FPGAs show that our approach can significantly improve resource utilization, and as a direct consequence a speedup of up to 7.8x can be achieved with no performance loss compared with state-of-the-art implementations. We also find that, unlike for CNNs, the optimal quantization strategies for CeNNs depend heavily on the application. We hope that our work can serve as a pioneering effort in the hardware optimization of CeNNs.
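The core idea of replacing multipliers with shifts can be illustrated with a small example. In the standard Chua–Yang formulation, each CeNN cell update sums template-weighted values over its neighborhood (feedback template A on outputs, control template B on inputs), and those weighted sums are where the multiplications occur. The following is a minimal Python sketch, not the authors' implementation: it rounds each template coefficient to the nearest signed power of two (or zero) and then evaluates one weighted sum on fixed-point integers using only shifts and negation. The exponent range `MIN_EXP`/`MAX_EXP`, the 8 fractional bits, the nearest-value rounding rule, and the helper names `quantize_pow2`, `shift_mul`, and `apply_template` are hypothetical choices for illustration. The incremental (retraining) part of the approach is not shown.

```python
import math

# Hypothetical exponent range for power-of-two coefficients (an assumption,
# not a value taken from the paper).
MIN_EXP = -6
MAX_EXP = 1


def quantize_pow2(w, min_exp=MIN_EXP, max_exp=MAX_EXP):
    """Map w to (sign, exp) with w ~ sign * 2**exp; sign == 0 encodes zero."""
    mag = abs(w)
    # Magnitudes below half the smallest representable power round to zero.
    if mag < 2.0 ** min_exp / 2:
        return 0, 0
    exp = math.floor(math.log2(mag))
    # Choose whichever of 2**exp and 2**(exp + 1) is closer on a linear scale.
    if mag >= 1.5 * 2.0 ** exp:
        exp += 1
    exp = max(min_exp, min(max_exp, exp))  # clamp to the representable range
    return (1 if w > 0 else -1), exp


def shift_mul(x, sign, exp):
    """Multiply integer x by sign * 2**exp using only shifts and negation."""
    if sign == 0:
        return 0
    # Python's >> on ints floors, matching an arithmetic right shift.
    y = x << exp if exp >= 0 else x >> -exp
    return y if sign > 0 else -y


def apply_template(template, patch, frac_bits=8):
    """Weighted sum of a neighborhood, with every product done by a shift.

    Patch values are converted to fixed point with frac_bits fractional bits;
    template coefficients are quantized to powers of two before use.
    """
    scale = 1 << frac_bits
    acc = 0
    for t_row, p_row in zip(template, patch):
        for t, p in zip(t_row, p_row):
            sign, exp = quantize_pow2(t)
            acc += shift_mul(round(p * scale), sign, exp)
    return acc / scale  # convert the fixed-point accumulator back to a float


if __name__ == "__main__":
    # A made-up 3x3 template whose entries are already powers of two.
    A = [[0, 0.25, 0], [0.25, 2, 0.25], [0, 0.25, 0]]
    patch = [[0, 1, 0], [1, 0.5, 1], [0, 1, 0]]
    print(apply_template(A, patch))  # 4 * (0.25 * 1) + 2 * 0.5 = 2.0
```

Because each quantized coefficient is stored as a sign and a small exponent, the hardware analogue of `shift_mul` needs only a barrel shifter (or fixed wiring) and an adder instead of an embedded multiplier, which is the resource trade-off the abstract describes. Coefficients that are not exact powers of two, such as 0.24, are approximated (here by 0.25), so the accuracy impact depends on the template and the application.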
