Abstract

Deep Neural Networks (DNNs) are computationally and memory intensive, which presents a major challenge for hardware, especially for resource-constrained devices such as Internet-of-Things (IoT) nodes. This paper introduces a new method that improves DNN performance by fusing approximate computing with data reuse techniques for image recognition applications. Starting from a pre-trained network, the DNN weights are approximated using linear and quadratic approximation methods during a retraining phase to reduce the model size and the number of arithmetic operations. The weights are then replaced with the linear/quadratic coefficients to execute inference, so that different DNN weights can be computed from the same coefficients. This leads to repetition among the weights, which enables reuse of DNN sub-computations (computational reuse) and of the same data (data reuse), thereby reducing DNN computations and memory accesses and improving energy efficiency, albeit at the cost of increased training time. A complete analysis is presented for the MNIST, Fashion MNIST, CIFAR-10, CIFAR-100, and Tiny ImageNet image recognition datasets, using several DNN models including LeNet, ResNet, AlexNet, and VGG16.
Our results show that the linear approximation achieves <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$1211.3\times $ </tex-math></inline-formula>, <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$21.8\times $ </tex-math></inline-formula>, <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$700\times $ </tex-math></inline-formula>, and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$19.3\times $ </tex-math></inline-formula> on LeNet-5 (MNIST), LeNet (Fashion MNIST), VGG16, and ResNet-20, respectively, with a small accuracy loss. Compared to the state-of-the-art Row Stationary (RS) method, the proposed architecture saves 54% of the total number of adders and multipliers required. Overall, the proposed approach is suitable for IoT edge devices, as it reduces computational complexity, memory size, and memory accesses with a small impact on accuracy.
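To make the idea concrete, the following is a minimal sketch of a linear weight approximation, under the assumption (not detailed in the abstract) that each layer's weights, once sorted by value, are fitted with a low-degree polynomial so that the whole weight set can be regenerated from a handful of shared coefficients. The function name `approximate_weights` and the fit-over-sorted-indices scheme are illustrative choices, not the paper's exact method.

```python
import numpy as np

def approximate_weights(weights, degree=1):
    """Fit the sorted weights with a degree-`degree` polynomial and
    rebuild them from its coefficients.

    Returns (coeffs, approx): `coeffs` holds the degree+1 polynomial
    coefficients that replace the weights; `approx` holds the weights
    reconstructed from those coefficients, in the original order.
    Illustrative sketch only -- not the paper's exact scheme.
    """
    weights = np.asarray(weights, dtype=np.float64)
    order = np.argsort(weights)              # sort weights by value
    x = np.arange(len(weights))              # index positions along the fit
    coeffs = np.polyfit(x, weights[order], degree)
    approx_sorted = np.polyval(coeffs, x)    # regenerate weights from coeffs
    approx = np.empty_like(weights)
    approx[order] = approx_sorted            # restore original ordering
    return coeffs, approx

rng = np.random.default_rng(0)
w = rng.normal(size=256)                     # stand-in for one layer's weights
coeffs, w_approx = approximate_weights(w, degree=1)
# Only degree+1 coefficients are stored in place of 256 distinct weights,
# and many reconstructed weights repeat, enabling computational reuse.
print(len(coeffs), w_approx.shape)
```

Because every reconstructed weight is a polynomial in a small integer index, products such as `input * w_approx[i]` share sub-expressions across weights, which is the kind of sub-computation reuse the abstract refers to.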
