Abstract

The ever-increasing computation cost of Convolutional Neural Networks (CNNs) makes it imperative for real-world applications to accelerate the key steps, especially inference. In this work, we propose an efficient yet general scheme called the Sparse Prediction Layer (SPL), which predicts and skips the trivial elements in a CNN layer. Pruned weights are used to predict the locations of maximum values in max-pooling kernels and of positive values before Rectified Linear Units (ReLUs). The precise values of these predicted important elements are then calculated selectively, and the complete outputs are restored from them. Our experiments on the ImageNet Large Scale Visual Recognition Competition (ILSVRC) 2012 show that SPL can reduce Floating-point Operations (FLOPs) by 68.3%, 58.6% and 59.5% on AlexNet, VGG-16 and ResNet-50, respectively, with an accuracy loss of less than 1% and without retraining. The proposed SPL scheme can further accelerate networks already pruned by other pruning-based methods; for example, applying SPLs to a ResNet-50 previously pruned by Channel Pruning (CP) yields an additional FLOP reduction of 50.2%. A special matrix multiplication called Sparse Result Matrix Multiplication (SRMM) is proposed to support the implementation of SPL, and its acceleration effect is in line with expectations.
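
The prediction-then-selective-computation idea can be illustrated with a small sketch. The NumPy example below applies it to a fully connected layer followed by a ReLU: a pruned copy of the weights produces a cheap approximate output, only the positions predicted to be positive are recomputed exactly, and the rest are set to zero. The magnitude-pruning heuristic, keep ratio and function names are illustrative assumptions, not the implementation described in the paper.

```python
# Minimal sketch of the SPL idea for a fully connected layer + ReLU.
# The pruning heuristic, keep ratio and names are illustrative assumptions,
# not the authors' implementation.
import numpy as np

def prune_weights(W, keep_ratio=0.3):
    """Keep only the largest-magnitude entries of W (simple magnitude pruning)."""
    k = max(1, int(W.size * keep_ratio))
    thresh = np.partition(np.abs(W).ravel(), -k)[-k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

def spl_fc_relu(x, W, W_pruned):
    """Predict which outputs survive the ReLU with the cheap pruned weights,
    then compute exact values only at the predicted-positive positions."""
    y_pred = x @ W_pruned            # cheap prediction pass (sparse weights)
    keep = y_pred > 0                # positions predicted to survive the ReLU
    y = np.zeros_like(y_pred)
    y[keep] = x @ W[:, keep]         # exact computation restricted to `keep`
    return np.maximum(y, 0.0)        # restore the complete ReLU output

# Usage: compare against the dense reference on random data.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
W = rng.standard_normal((256, 512))
out = spl_fc_relu(x, W, prune_weights(W))
ref = np.maximum(x @ W, 0.0)
missed = np.count_nonzero((ref > 0) & (out == 0))
print(f"positive outputs missed by the prediction: {missed} / {np.count_nonzero(ref > 0)}")
```

In this sketch the masked product `x @ W[:, keep]` stands in for the role SRMM plays in the paper: only the result entries selected by the prediction are evaluated instead of the full output matrix.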

Highlights

  • Convolutional neural networks (CNNs) [1] have significantly advanced Artificial Intelligence (AI) capabilities in various fields in the past few years, including computer vision, robotics, security, biomedicine and healthcare.

  • We show that the Sparse Prediction Layer (SPL) lowers the computation cost of original networks significantly and further reduces the Floating-point Operations (FLOPs) of networks pruned by other acceleration methods.

  • To make a trade-off between accuracy and the FLOP reduction ratio, we propose four schemes that apply Algorithm 1 in different orders: the layer order scheme (LOS), reversed layer order scheme (RLOS), FLOPs descending order scheme (FDOS) and susceptibility ascending order scheme (SAOS).

Summary

INTRODUCTION

Convolutional neural networks (CNNs) [1] have significantly advanced Artificial Intelligence (AI) capabilities in various fields in the past few years, including computer vision, robotics, security, biomedicine and healthcare. Networks are getting deeper and bigger to achieve better performance, at the cost of much larger parameter sizes and numerous multiply-accumulate operations. Emerging applications such as autonomous vehicles and anomaly detection demand real-time processing. The calculation of the non-maximum values in max-pooling kernels is redundant. Based on this idea, we propose an efficient and general network acceleration scheme, the Sparse Prediction Layer (SPL). SPL achieves acceleration in a different way from pruning: it avoids some disadvantages of pruning-based methods and can be used as a complement to them. We show that SPL lowers the computation cost of original networks significantly and further reduces the FLOPs of networks pruned by other acceleration methods.
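
As a concrete illustration of this redundancy, the sketch below uses a pruned copy of a convolution filter to predict, for each 2x2 pooling window, which position holds the maximum, and then computes the exact pre-pooling value only at that predicted position. The im2col layout, pruning quantile and helper names are illustrative assumptions, not the paper's implementation.

```python
# Sketch: predict the argmax of each 2x2 max-pooling window with pruned
# weights, then compute the exact convolution output only at that position.
# The im2col layout, pruning quantile and names are illustrative assumptions.
import numpy as np

def predicted_maxpool(P, w, w_pruned, height, width, pool=2):
    """P: (height*width, k) im2col patches, w: exact filter, w_pruned: its sparse copy."""
    f_pred = (P @ w_pruned).reshape(height, width)   # cheap prediction pass
    out = np.empty((height // pool, width // pool))
    for i in range(height // pool):
        for j in range(width // pool):
            win = f_pred[i*pool:(i+1)*pool, j*pool:(j+1)*pool]
            di, dj = np.unravel_index(np.argmax(win), win.shape)
            r, c = i*pool + di, j*pool + dj          # predicted max location
            out[i, j] = P[r*width + c] @ w           # one exact dot product per
    return out                                       # window instead of pool*pool

# Usage: count how often the prediction picks the true maximum.
rng = np.random.default_rng(0)
H, W, k = 8, 8, 27                                   # e.g. 3x3x3 receptive fields
P = rng.standard_normal((H * W, k))
w = rng.standard_normal(k)
w_pruned = np.where(np.abs(w) >= np.quantile(np.abs(w), 0.7), w, 0.0)
exact = (P @ w).reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))
pred = predicted_maxpool(P, w, w_pruned, H, W)
print("windows where the prediction found the true max:",
      np.count_nonzero(np.isclose(pred, exact)), "of", exact.size)
```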

