Abstract

Convolution operations dominate the total execution time of deep convolutional neural networks (CNNs). In this paper, we aim to enhance the performance of the state-of-the-art convolution algorithm, Winograd convolution, on the GPU. Our work is based on two observations: (1) CNNs often have abundant zero weights, and (2) the performance benefit of Winograd convolution is limited mainly by the extra additions incurred during data transformation. To exploit the abundant zero weights, we propose a low-overhead, efficient hardware mechanism, called ZeroSkip, that skips multiplications whose results are always zero regardless of the input data. To leverage the second observation, we present a data-reuse optimization for the addition operations in Winograd convolution, called AddOpt, which improves the utilization of local registers and thereby reduces on-chip cache accesses. Our experiments with a real-world deep CNN, VGG-16, on GPGPU-Sim and Titan X show that the proposed methods, ZeroSkip and AddOpt, achieve 51.8% higher convolution performance than baseline Winograd convolution. Moreover, even without any hardware modification, AddOpt alone delivers 35.6% higher performance on real hardware (Titan X).
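
To make the two observations concrete, below is a minimal, illustrative sketch of the standard 1-D Winograd minimal filtering algorithm F(2,3) in Python. It is not the paper's GPU implementation, and the function and variable names are our own. The sketch shows how Winograd convolution trades multiplications (4 instead of 6 for two outputs of a 3-tap filter) for extra additions in the input, filter, and output transforms, and how a multiplication whose transformed filter element is zero yields zero for every input, which is the kind of operation a zero-skipping mechanism such as ZeroSkip could bypass.

# Illustrative sketch (not the paper's implementation): 1-D Winograd minimal
# filtering F(2,3), which computes two outputs of a 3-tap convolution with
# 4 multiplications instead of 6, at the cost of extra additions in the
# input/filter/output transforms. Names here are our own, for illustration.

def winograd_f2_3(d, g):
    """Compute y[0], y[1] of the 1-D convolution of input tile d (4 values)
    with filter g (3 taps) using the Winograd F(2,3) transforms."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g

    # Filter transform (precomputable once per filter): G @ g
    u = (g0,
         (g0 + g1 + g2) / 2.0,
         (g0 - g1 + g2) / 2.0,
         g2)

    # Input transform: B^T @ d -- these are the "extra additions"
    v = (d0 - d2,
         d1 + d2,
         d2 - d1,
         d1 - d3)

    # Element-wise multiplications: 4 instead of the direct method's 6.
    # If a transformed filter element u[i] is zero, m[i] is zero for any
    # input -- a multiplication a zero-skipping scheme could avoid.
    m = [ui * vi for ui, vi in zip(u, v)]

    # Output transform: A^T @ m -- more additions
    y0 = m[0] + m[1] + m[2]
    y1 = m[1] - m[2] - m[3]
    return y0, y1

# Sanity check against direct convolution
d = (1.0, 2.0, 3.0, 4.0)
g = (0.5, -1.0, 0.25)
direct = (d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
          d[1]*g[0] + d[2]*g[1] + d[3]*g[2])
assert all(abs(a - b) < 1e-9 for a, b in zip(winograd_f2_3(d, g), direct))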
