Abstract
Channel pruning is a widely used approach that efficiently reduces inference time and memory footprint by removing unnecessary channels from convolutional neural networks. In previous studies, channel pruning based on sparsity training was performed by imposing ℓ1 regularization on the scaling factors in batch normalization and then removing the channels/filters whose factors fall below a predefined threshold. However, because such sparsity training imposes the ℓ1 penalty on all scaling factors and trains with the resulting deformed gradient, an accuracy drop is inevitable. To address this problem, we propose a new sparsity training method referred to as adaptive gradient training (AGT). AGT creates a compact network without performance degradation by using the original gradient as much as possible and avoiding the ℓ1 penalty. The proposed AGT reduces the FLOPs of MobileNetV1 by 71.7% on the CIFAR-10 dataset while improving accuracy by 0.04%. Consequently, the proposed method outperformed existing channel pruning methods on all datasets and models.
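For context, the conventional baseline the abstract contrasts against works roughly as follows; this is a minimal PyTorch sketch of ℓ1 sparsity training on batch-normalization scaling factors followed by threshold-based channel selection, not the paper's AGT method. The hyperparameter values, function names, and threshold logic here are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters (assumed, not from the paper).
L1_LAMBDA = 1e-4        # strength of the L1 penalty on BN scaling factors
PRUNE_THRESHOLD = 1e-2  # channels whose |gamma| falls below this are removed


def add_l1_subgradient(model: nn.Module, lam: float = L1_LAMBDA) -> None:
    """Add the L1 subgradient lam * sign(gamma) to every BN scaling factor.

    Called after loss.backward() and before optimizer.step(). This shifts the
    task gradient of every scaling factor by the penalty term, which is the
    "deformed gradient" applied to all channels in conventional sparsity training.
    """
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            m.weight.grad.add_(lam * torch.sign(m.weight.data))


def select_channels_to_keep(model: nn.Module, thr: float = PRUNE_THRESHOLD):
    """Return a per-BN-layer boolean mask; True marks channels to keep."""
    masks = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            masks[name] = m.weight.data.abs() >= thr
    return masks
```

The abstract's proposed AGT differs from this baseline by avoiding the global ℓ1 penalty and preserving the original gradient to the extent possible; its exact update rule is not given in the abstract and is therefore not sketched here.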