Abstract

Large deep neural network (DNN) models are computation- and memory-intensive, which limits their deployment, especially on edge devices. Pruning, quantization, data sparsity, and data reuse have therefore been applied to DNNs to reduce memory and computation complexity at the expense of some accuracy loss. Reducing bit precision discards information, and aggressive bit-width reduction can cause a noticeable drop in accuracy. This paper introduces the Scaling-Weight-based Convolution (SWC) technique to reduce DNN model size as well as the complexity and number of arithmetic operations. This is achieved by using a small set of high-precision weights (the maximum absolute weight, "MAW") together with a large set of low-precision weights (scaling weights, "SWs"). The result is a smaller model with minimal accuracy loss compared to simply reducing the precision. In addition, a scaling and quantized network-acceleration processor (SQNAP) based on the SWC method is proposed to achieve high speed and low power with reduced memory accesses. The proposed SWC eliminates more than 90% of the multiplications in the network. Furthermore, the less important SWs, those that amount to only a small fraction of the MAW, are pruned, and retraining is applied to maintain accuracy. A full analysis on the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 image-recognition datasets is presented using several DNN models, including LeNet, ResNet, AlexNet, and VGG-16.
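
The abstract does not spell out the exact SWC encoding or the SQNAP datapath, so the following is a minimal sketch of one plausible reading: each filter keeps a single high-precision MAW and stores its remaining weights as low-precision scaling weights relative to that MAW, so the accumulation uses only cheap low-precision products and a single high-precision multiply per output. The function names (swc_quantize_filter, swc_dot), the 4-bit SW width, and the per-filter grouping are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def swc_quantize_filter(w, sw_bits=4):
    """Hypothetical SWC encoding of one filter (flattened).

    Keeps one high-precision value, the maximum absolute weight (MAW),
    and stores the remaining weights as low-precision scaling weights
    (SWs) relative to it.
    """
    maw = np.max(np.abs(w))                    # single high-precision weight per filter
    levels = 2 ** (sw_bits - 1) - 1            # signed SW range, e.g. [-7, 7] for 4 bits
    sw = np.round(w / maw * levels).astype(np.int8)  # low-precision scaling weights
    return maw, sw

def swc_dot(x, maw, sw, sw_bits=4):
    """Approximate dot product x.w from the SWC encoding.

    The accumulation over low-precision SWs avoids high-precision
    multiplies; one multiplication by the MAW rescales the result.
    """
    levels = 2 ** (sw_bits - 1) - 1
    acc = np.dot(x, sw.astype(np.float32))     # cheap low-precision products
    return acc * (maw / levels)                # one high-precision multiply per output

# Usage: compare the SWC approximation with the exact dot product.
rng = np.random.default_rng(0)
w = rng.normal(size=9).astype(np.float32)      # e.g. a 3x3 filter, flattened
x = rng.normal(size=9).astype(np.float32)
maw, sw = swc_quantize_filter(w)
print("exact:", np.dot(x, w), " SWC approx:", swc_dot(x, maw, sw))
```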
