Abstract

Low-precision computation has emerged as one of the most effective techniques for accelerating convolutional neural networks and enjoys widespread support on modern hardware. Despite this effectiveness, however, it has rarely been applied to fast convolution algorithms such as Winograd, owing to numerical issues. In this article, we propose an effective quantized Winograd convolution, named LoWino, which employs an in-side quantization method in the Winograd domain to reduce the precision loss caused by the transformations. We also present an efficient implementation that integrates well-designed optimization techniques, allowing us to fully exploit the capabilities of low-precision computation on modern CPUs. We evaluate LoWino on two Intel Xeon Scalable Processor platforms using representative convolutional layers and neural network models. The experimental results demonstrate that our approach achieves average operator speedups of 1.84× and 1.91× over state-of-the-art implementations in the vendor library while keeping accuracy loss at a reasonable level.
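The abstract itself contains no code, but the core idea can be illustrated with a minimal sketch: the 1-D Winograd algorithm F(2,3), where quantization is applied to the tiles *after* they have been transformed into the Winograd domain, rather than to the spatial-domain tensors. The transform matrices below are the standard F(2,3) matrices; the symmetric int8 scheme and the function names are illustrative assumptions for this sketch, not LoWino's actual implementation (which targets optimized int8 kernels on CPUs).

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices (1-D, 2 outputs, 3-tap filter).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float32)   # input transform
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]], dtype=np.float32)  # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float32)    # output transform

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization; returns (q, scale)."""
    m = float(np.abs(x).max())
    scale = m / 127.0 if m > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def winograd_f23_in_side_quant(d, g):
    """F(2,3) correlation with quantization applied inside the Winograd domain."""
    V = BT @ d                   # transform the input tile in fp32
    U = G @ g                    # transform the filter in fp32
    Vq, sv = quantize_int8(V)    # quantize the *transformed* tiles, not the
    Uq, su = quantize_int8(U)    # spatial-domain tensors ("in-side" quantization)
    M = Vq.astype(np.int32) * Uq.astype(np.int32)   # int8 multiply, int32 accumulate
    return AT @ (M.astype(np.float32) * (sv * su))  # dequantize, transform back

d = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)   # one 4-element input tile
g = np.array([0.5, -1.0, 0.25], dtype=np.float32)      # one 3-tap filter
print(winograd_f23_in_side_quant(d, g))   # quantized Winograd result
print(np.correlate(d, g, mode="valid"))   # fp32 reference correlation
```

Quantizing before the transforms would compound two error sources, since the Winograd transforms amplify quantization noise; quantizing the already-transformed tiles, as above, is the precision-loss reduction the abstract refers to, while the element-wise products still run entirely in low-precision integer arithmetic.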
