Abstract

High bandwidth requirements in edge devices can be a bottleneck for many systems, especially when accelerating both training and inference of deep neural networks. In this paper, we analyze feature map sparsity for several popular convolutional neural networks. Considering run-time behavior, we find that many feature maps can be dynamically disabled: by counting the zero-valued activations within each feature map, we identify maps that can be pruned at runtime. This is particularly effective when a ReLU activation function is used. To take advantage of inactive feature maps, we present a novel method to dynamically prune feature maps at runtime, reducing bandwidth by up to 11.5% without loss of accuracy for image classification. We further extend this method to non-ReLU activation functions by treating activation outputs within an epsilon of zero as inactive. Additionally, we study how video streaming applications could benefit from this bandwidth reduction.
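
The core idea can be illustrated with a short sketch: a feature map whose activations all lie within an epsilon of zero (exactly zero in the ReLU case) carries no information and need not be written back to memory. The sketch below, in PyTorch-style Python, is not the authors' implementation; the function name `prune_inactive_feature_maps`, the `eps` parameter, and the masking logic are assumptions used only to convey the concept.

```python
import torch


def prune_inactive_feature_maps(activations: torch.Tensor, eps: float = 0.0) -> torch.Tensor:
    """Flag feature maps whose activations are all within `eps` of zero.

    `activations` is assumed to have shape (N, C, H, W). Maps flagged as
    inactive would not need to be transferred; here we simply zero them
    to show the effect on the tensor.
    """
    # Per-map maximum absolute activation, shape (N, C).
    max_abs = activations.abs().amax(dim=(2, 3))
    # A map is active only if at least one value exceeds eps in magnitude.
    active = max_abs > eps
    return activations * active[:, :, None, None].to(activations.dtype)


# Example: with ReLU outputs, eps = 0 catches fully zero feature maps.
x = torch.randn(1, 8, 16, 16)
y = torch.relu(x)
y[:, 3] = 0.0  # simulate an inactive feature map
pruned = prune_inactive_feature_maps(y, eps=0.0)
```

For non-ReLU activations, a small positive `eps` would play the role described in the abstract, treating near-zero outputs as prunable.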
