Abstract
Convolutional neural networks (CNNs) have developed into powerful models for various computer vision tasks, ranging from object detection to semantic segmentation. However, most state-of-the-art CNNs cannot be deployed directly on edge devices such as smartphones and drones, which require low latency under limited power and memory bandwidth. One popular, straightforward approach to compressing CNNs is network slimming, which imposes $\ell_{1}$ regularization on the channel-associated scaling factors of the batch normalization layers during training. Network slimming thereby identifies insignificant channels that can be pruned for inference. In this paper, we propose replacing the $\ell_{1}$ penalty with an alternative nonconvex, sparsity-inducing penalty in order to yield a more compressed and/or more accurate CNN architecture. We investigate $\ell_{p}$ $(0 < p < 1)$, the transformed $\ell_{1}$ ($\text{T}\ell_{1}$), the minimax concave penalty (MCP), and the smoothly clipped absolute deviation (SCAD) penalty, due to their recent successes and popularity in solving sparse optimization problems such as compressed sensing and variable selection. We demonstrate the effectiveness of network slimming with nonconvex penalties on three neural network architectures – VGG-19, DenseNet-40, and ResNet-164 – on standard image classification datasets. Based on the numerical experiments, $\text{T}\ell_{1}$ preserves model accuracy against channel pruning; $\ell_{1/2}$ and $\ell_{3/4}$ yield more compressed models with retrained accuracies similar to those of $\ell_{1}$; and MCP and SCAD provide more accurate models after retraining with compression similar to that of $\ell_{1}$. Network slimming with $\text{T}\ell_{1}$ regularization also outperforms the latest Bayesian modification of network slimming in compressing a CNN architecture in terms of memory storage while preserving its model accuracy after channel pruning.
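For context, network slimming trains the weights $W$ and the batch-normalization scaling factors $\gamma$ jointly under a penalized objective of the form
$$\min_{W,\gamma}\ \sum_{(x,y)} \mathcal{L}\big(f(x; W, \gamma),\, y\big) \;+\; \sum_{j} P_{\lambda}(\gamma_j),$$
where $P_{\lambda}(t) = \lambda |t|$ recovers the original $\ell_{1}$ formulation. The nonconvex alternatives named above have the following standard forms; the parameters $a$ and $b$ and the exact scaling are the conventional ones and may differ from the paper's notation:
$$P^{\ell_p}_{\lambda}(t) = \lambda |t|^{p}\ (0 < p < 1), \qquad P^{\text{T}\ell_1}_{\lambda}(t) = \lambda\,\frac{(a+1)|t|}{a + |t|}\ (a > 0),$$
$$P^{\text{MCP}}_{\lambda}(t) = \begin{cases} \lambda |t| - \dfrac{t^{2}}{2b}, & |t| \le b\lambda, \\[4pt] \dfrac{b\lambda^{2}}{2}, & |t| > b\lambda, \end{cases} \qquad P^{\text{SCAD}}_{\lambda}(t) = \begin{cases} \lambda |t|, & |t| \le \lambda, \\[2pt] \dfrac{2a\lambda |t| - t^{2} - \lambda^{2}}{2(a-1)}, & \lambda < |t| \le a\lambda, \\[4pt] \dfrac{(a+1)\lambda^{2}}{2}, & |t| > a\lambda, \end{cases}$$
with $b > 1$ for MCP and $a > 2$ for SCAD.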
Highlights
In recent years, convolutional neural networks (CNNs) have evolved into superior models for various computer vision tasks, such as image classification [1]–[3], image segmentation [4]–[6], and object detection [7]–[9].
We apply the proposed methods to VGG-19, DenseNet-40, and ResNet-164 trained on CIFAR-10/100 and SVHN.
We observe that $\ell_{p}$ and transformed $\ell_{1}$ ($\text{T}\ell_{1}$) save more parameters and floating-point operations (FLOPs) than $\ell_{1}$, with a slight decrease in test accuracy.
Summary
Convolutional neural networks (CNNs) have evolved into superior models for various computer vision tasks, such as image classification [1]–[3], image segmentation [4]–[6], and object detection [7]–[9]. However, training a highly accurate CNN is computationally demanding. One remedy is pruning [20]–[23], which determines which weights, filters, and/or channels are unnecessary and removes them from the network. Another popular direction is to sparsify the CNN while training it [24]–[27]; sparsity can be imposed on various types of structures in a CNN, such as filters and channels [27].
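As an illustration of the slimming pipeline described above, the sketch below adds a sparsity penalty on the batch-normalization scaling factors to the training loss and then thresholds those factors to decide which channels to prune. It is a minimal sketch assuming PyTorch; the function names, the transformed-$\ell_{1}$ penalty used as the example, and the global-quantile threshold are illustrative choices, not the paper's exact implementation.

import torch
import torch.nn as nn

# Collect the scaling factors (gamma) of every BatchNorm2d layer in the model.
def bn_scaling_factors(model):
    return [m.weight for m in model.modules() if isinstance(m, nn.BatchNorm2d)]

# Transformed-L1 penalty (a + 1)|t| / (a + |t|), summed over all entries (illustrative choice).
def transformed_l1(gamma, a=1.0):
    t = gamma.abs()
    return ((a + 1.0) * t / (a + t)).sum()

# Classification loss plus a sparsity penalty on all BN scaling factors.
def slimming_loss(model, outputs, targets, lam=1e-4, penalty=transformed_l1):
    ce = nn.functional.cross_entropy(outputs, targets)
    reg = sum(penalty(g) for g in bn_scaling_factors(model))
    return ce + lam * reg

# After training: global threshold on |gamma|; channels below the threshold are pruned.
def channel_masks(model, prune_ratio=0.5):
    all_gamma = torch.cat([g.detach().abs().flatten() for g in bn_scaling_factors(model)])
    threshold = torch.quantile(all_gamma, prune_ratio)
    return [g.detach().abs() > threshold for g in bn_scaling_factors(model)]

During training, slimming_loss would replace the plain cross-entropy loss; after convergence, channel_masks returns, per BN layer, a boolean mask of channels to keep, and the pruned network is then retrained to recover accuracy, mirroring the train–prune–fine-tune procedure of network slimming.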