Abstract

The rising ubiquity of Convolutional Neural Networks (CNNs) for learning tasks has led to their use on a wide variety of devices. CNNs can run on small devices, such as phones or embedded systems; however, compute time is a critical enabling factor, and on these devices trading a small amount of accuracy for improved performance may be worthwhile. This has led to active research in high-level convolution optimizations. One successful class of optimizations is filter pruning, in which filters determined to have a small effect on the network's final output are deleted. In this work, we present a self-pruning convolution intended to accelerate inference on small devices. We call it an ALSH Convolution because it uses Asymmetric Locality Sensitive Hashing to select, for a given input, a subset of the convolution's filters that are likely to produce large outputs. Our method is accessible: it generalizes well to many architectures and is easy to use, functioning essentially as a regular layer. Experiments show that a network modified to use ALSH Convolutions stays within 5% of the baseline accuracy on CIFAR-10 and within 10% on CIFAR-100. Further, on small devices, a network built with our implementation can be 2x faster than the same network built with PyTorch's standard convolution.
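
To make the idea of a hash-based, self-pruning convolution concrete, the sketch below shows one way such a layer could be structured in PyTorch. This is not the authors' implementation: the class name ALSHConv2d, the use of sign-random-projection (SimHash) hashing in place of a true asymmetric MIPS transform, the mean-patch query summary, and all hyperparameters are illustrative assumptions.

    # Minimal conceptual sketch of an ALSH-style self-pruning convolution.
    # Not the paper's implementation; names and hash construction are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class ALSHConv2d(nn.Module):
        """Hashes its filters into buckets, hashes a summary of the input,
        and applies only the filters whose bucket matches the input's."""

        def __init__(self, in_channels, out_channels, kernel_size, num_bits=5):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                                  padding=kernel_size // 2)
            # Random hyperplanes for a sign-random-projection (SimHash) hash.
            dim = in_channels * kernel_size * kernel_size
            self.register_buffer("planes", torch.randn(num_bits, dim))
            self.num_bits = num_bits

        def _hash(self, vecs):
            # vecs: (N, dim) -> one integer bucket id per row via sign bits.
            bits = (vecs @ self.planes.t() > 0).long()            # (N, num_bits)
            powers = 2 ** torch.arange(self.num_bits, device=vecs.device)
            return (bits * powers).sum(dim=1)                     # (N,)

        def forward(self, x):
            w = self.conv.weight                                  # (out, in, k, k)
            filter_codes = self._hash(w.flatten(1))               # bucket per filter
            # Summarize the input with its mean patch: a crude stand-in for the
            # per-patch asymmetric query transform a real ALSH scheme would use.
            k = w.shape[-1]
            patches = F.unfold(x, kernel_size=k, padding=k // 2)  # (B, dim, L)
            query = patches.mean(dim=(0, 2))                      # (dim,)
            query_code = self._hash(query.unsqueeze(0))           # (1,)
            active = (filter_codes == query_code).nonzero(as_tuple=True)[0]
            if active.numel() == 0:                               # fall back to all filters
                return self.conv(x)
            # Convolve with the selected subset, scattering into the full output.
            out_subset = F.conv2d(x, w[active], self.conv.bias[active], padding=k // 2)
            out = x.new_zeros(x.shape[0], w.shape[0], *out_subset.shape[-2:])
            out[:, active] = out_subset
            return out

Because the layer keeps the full output shape and only skips inactive filters, it can be dropped into an existing architecture in place of a standard convolution, which is the "functions as a regular layer" property described above.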
