Abstract

Deep learning on edge devices is becoming increasingly important, especially with the explosion of IoT devices; the total number of connected IoT devices reached 29 billion in 2022. Convolutional neural networks (CNNs), as common deep learning representatives, are among the most popular neural networks in knowledge and data engineering. However, CNNs are computationally intensive, and compared with the training phase, inference is more often performed on low-power computing equipment such as edge devices. Limited computing resources and high computation pressure restrict the effective use of CNN algorithms at the edge. Fortunately, a minimal filtering algorithm called Winograd can reduce the cost of convolution by minimizing the number of multiplication operations. We find that Winograd convolution can be accelerated further by the deep reuse technique, which reuses similar data and computation processes. In this paper, we propose a new inference method, called DREW, which combines deep reuse with Winograd convolution to further accelerate CNNs. DREW handles three difficulties. First, it detects similarities among the complex minimal filtering patterns by clustering. Second, it keeps the online clustering cost within a reasonable range. Third, it provides adjustable clustering granularity to balance performance and accuracy. We evaluate DREW on Raspberry Pi and NVIDIA Jetson AGX Xavier edge devices, and experiments on five popular networks show that 1) DREW accelerates Winograd convolution by an average speedup of 8.27×; even over a highly parallel Winograd implementation, DREW still provides a 2.21× speedup. 2) Applied to end-to-end Winograd CNN inference, DREW achieves an average speedup of 5.94× with negligible (< 0.4%) accuracy loss. 3) Energy consumption is an important factor for inference in practice; DREW reduces the number of convolution operations to 10% of the original, achieving up to 60% energy-efficiency gains over the original Winograd inference.
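To make the two ingredients of the abstract concrete, the minimal Python sketch below implements the classic F(2,3) Winograd transform (four multiplications in place of the naive six, using the standard transform matrices of Lavin and Gray), plus a toy stand-in for deep reuse that buckets transformed input tiles by rounding and computes the element-wise product once per bucket. The rounding-based "clustering", the function names, and the parameters are illustrative assumptions, not DREW's actual implementation.

    import numpy as np

    # Transform matrices for Winograd F(2,3) (Lavin & Gray, CVPR 2016):
    # two outputs of a 3-tap 1-D convolution using 4 multiplications
    # instead of the naive 6.
    BT = np.array([[1,  0, -1,  0],
                   [0,  1,  1,  0],
                   [0, -1,  1,  0],
                   [0,  1,  0, -1]], dtype=float)
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]])
    AT = np.array([[1, 1,  1,  0],
                   [0, 1, -1, -1]], dtype=float)

    def winograd_f23(d, g):
        """y = A^T [(G g) * (B^T d)] for a 4-element input tile d and a
        3-element filter g; the element-wise product is the only place
        where data-by-weight multiplications occur."""
        return AT @ ((G @ g) * (BT @ d))

    def winograd_f23_deep_reuse(tiles, g, decimals=1):
        """Toy deep-reuse variant (hypothetical, for illustration only):
        tiles whose transformed inputs round to the same key share one
        element-wise product, trading a small approximation error for
        fewer multiplications -- the spirit of DREW's clustering."""
        U = G @ g                    # filter transform, reused for all tiles
        cache, out = {}, []
        for d in tiles:
            V = BT @ d               # input transform
            key = tuple(np.round(V, decimals))  # crude similarity bucket
            if key not in cache:     # compute once per "cluster"
                cache[key] = AT @ (U * V)
            out.append(cache[key])
        return np.array(out), len(cache)

    d = np.array([1.0, 2.0, 3.0, 4.0])
    g = np.array([0.5, 1.0, -1.0])
    print(winograd_f23(d, g))        # [-0.5  0. ], matches direct convolution
    tiles = np.array([d, d + 0.01, d * 2])   # two near-duplicate tiles
    ys, n_products = winograd_f23_deep_reuse(tiles, g)
    print(n_products)                # 2: the near-duplicates share one product

In this sketch the two near-identical tiles land in the same bucket, so only two element-wise products are computed for three tiles; DREW generalizes this idea with proper clustering, controlled granularity, and bounded online cost.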
