Abstract

Channel pruning is a widely used approach that can efficiently reduce inference time and memory footprint by removing unnecessary channels in convolutional neural networks. In previous studies, channel pruning based on sparsity training was performed by imposing ℓ1 regularization on the scaling factors in batch normalization and thereafter removing the channels/filters below a predefined threshold. However, because this form of sparsity training imposes the ℓ1 penalty on all scaling factors and trains with the resulting deformed gradient, an accuracy drop is inevitable. To address this problem, we propose a new sparsity training method referred to as adaptive gradient training (AGT). AGT can produce a compact network without performance degradation by using the original, unpenalized gradient to the extent possible. The proposed AGT reduces the FLOPs of MobileNetV1 by 71.7% on the CIFAR-10 dataset while improving accuracy by 0.04%. Consequently, the proposed method outperformed existing channel pruning methods on all evaluated datasets and models.
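To make the baseline concrete, the following is a minimal PyTorch-style sketch of the conventional sparsity-training step described above (an ℓ1 penalty on every batch-normalization scaling factor, followed by thresholding). It is an illustration under stated assumptions, not the paper's AGT method; names such as l1_lambda and prune_threshold are hypothetical.

    import torch
    import torch.nn as nn

    l1_lambda = 1e-4          # assumed strength of the L1 penalty on BN scaling factors
    prune_threshold = 1e-2    # assumed threshold; channels with |gamma| below it are pruned

    def add_l1_subgradient_to_bn(model):
        # Called after loss.backward(): adds the L1 subgradient (sign of gamma)
        # to each BatchNorm weight gradient. This is the "deformed gradient"
        # the abstract refers to, since it perturbs the original task gradient.
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.weight.grad.add_(l1_lambda * torch.sign(m.weight.data))

    def collect_prunable_channels(model):
        # Returns, per BN layer, the channel indices whose scaling factor has
        # been driven close to zero by the sparsity training.
        prunable = {}
        for name, m in model.named_modules():
            if isinstance(m, nn.BatchNorm2d):
                mask = m.weight.data.abs() < prune_threshold
                prunable[name] = mask.nonzero(as_tuple=True)[0].tolist()
        return prunable

In contrast, the proposed AGT avoids applying this penalty to all scaling factors and relies on the original gradient to the extent possible, which is what prevents the accuracy drop noted above.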
