Abstract

Channel pruning is a widely used approach that can efficiently reduce inference time and memory footprint by removing unnecessary channels in convolutional neural networks. In previous studies, channel pruning based on sparsity training was performed by imposing ℓ1 regularization on the scaling factors in batch normalization and thereafter removing the channels/filters below a predefined threshold. However, because this form of sparsity training imposes the ℓ1 penalty on all scaling factors and trains with the resulting deformed gradient, an accuracy drop is inevitable. To address this problem, we propose a new sparsity training method referred to as adaptive gradient training (AGT). AGT can produce a compact network without performance degradation by using the original, unpenalized gradient to the extent possible. The proposed AGT reduces the FLOPs of MobileNetV1 by 71.7% on the CIFAR-10 dataset while improving accuracy by 0.04%. Consequently, the proposed method outperformed existing channel pruning methods on all evaluated datasets and models.
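To make the baseline concrete, the following is a minimal PyTorch-style sketch of the conventional sparsity-training step described above (an ℓ1 penalty on every batch-normalization scaling factor, followed by thresholding). It is an illustration under stated assumptions, not the paper's AGT method; names such as l1_lambda and prune_threshold are hypothetical.

    import torch
    import torch.nn as nn

    l1_lambda = 1e-4          # assumed strength of the L1 penalty on BN scaling factors
    prune_threshold = 1e-2    # assumed threshold; channels with |gamma| below it are pruned

    def add_l1_subgradient_to_bn(model):
        # Called after loss.backward(): adds the L1 subgradient (sign of gamma)
        # to each BatchNorm weight gradient. This is the "deformed gradient"
        # the abstract refers to, since it perturbs the original task gradient.
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.weight.grad.add_(l1_lambda * torch.sign(m.weight.data))

    def collect_prunable_channels(model):
        # Returns, per BN layer, the channel indices whose scaling factor has
        # been driven close to zero by the sparsity training.
        prunable = {}
        for name, m in model.named_modules():
            if isinstance(m, nn.BatchNorm2d):
                mask = m.weight.data.abs() < prune_threshold
                prunable[name] = mask.nonzero(as_tuple=True)[0].tolist()
        return prunable

In contrast, the proposed AGT avoids applying this penalty to all scaling factors and relies on the original gradient to the extent possible, which is what prevents the accuracy drop noted above.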
