Abstract

Binary convolutional neural networks (BCNNs) have shown good accuracy for small to medium-sized models. Their extreme quantization of weights and activations reduces off-chip data transfer and greatly lowers the computational complexity of convolutions. The complexity of a BCNN can be reduced further by shrinking the model, but at the cost of accuracy. In this paper, a multi-model inference technique is proposed that reduces the execution time of binarized inference without loss of accuracy. The technique uses a cascade of neural network models with different computation/accuracy ratios, obtained from a parameterizable binarized neural network whose configuration trades complexity against accuracy. We also propose a hardware accelerator to run multi-model inference in embedded systems. The multi-model inference accelerator is demonstrated on low-density Zynq-7010 and Zynq-7020 FPGA devices, classifying images from the CIFAR-10 dataset. The proposed accelerator improves the frame rate per LUT by 7.2× over previous solutions on a Zynq-7020 FPGA with similar accuracy. This shows the effectiveness of the multi-model inference technique and the efficiency of the proposed hardware accelerator.
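The abstract does not give the cascade's exact exit criterion, so the following Python sketch only illustrates the general idea of multi-model cascade inference with a confidence-based early exit. The names small_model, large_model, and the threshold value are hypothetical placeholders for illustration, not the paper's binarized models or hardware implementation.

import numpy as np

# Placeholder model wrappers: each returns raw class scores for one image.
# In the paper these would be binarized CNNs of different sizes; here they
# are dummies standing in for a small (fast) and a large (accurate) model.
def small_model(image: np.ndarray) -> np.ndarray:
    return np.random.rand(10)   # 10 CIFAR-10 class scores (dummy)

def large_model(image: np.ndarray) -> np.ndarray:
    return np.random.rand(10)   # dummy scores from the bigger model

def softmax(scores: np.ndarray) -> np.ndarray:
    e = np.exp(scores - scores.max())
    return e / e.sum()

def cascade_predict(image: np.ndarray, threshold: float = 0.9) -> int:
    """Run the small model first; fall back to the large model only when
    the small model's top-class probability is below the threshold."""
    probs = softmax(small_model(image))
    if probs.max() >= threshold:
        return int(probs.argmax())                      # cheap path: early exit
    return int(softmax(large_model(image)).argmax())    # slow, more accurate path

# Example: classify one dummy 32x32 RGB image
prediction = cascade_predict(np.zeros((32, 32, 3)), threshold=0.9)
print(prediction)

Most images exit at the small model, so the average per-image cost approaches that of the small model while hard images still receive the accuracy of the larger one.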
