Abstract

Deep convolutional neural network compression has attracted considerable attention because accurate models must be deployed on resource-constrained edge devices. Existing techniques mostly focus on compressing networks for image-level classification, and it is not clear whether they generalize well to network architectures for more challenging pixel-level tasks, e.g., dense crowd counting or semantic segmentation. In this paper, we propose an adaptive correlation-driven sparsity learning (ACSL) framework for channel pruning that outperforms state-of-the-art methods on both image-level and pixel-level tasks. In our ACSL framework, we first quantify the data-dependent channel correlation information with a channel affinity matrix. Next, we leverage these inter-dependencies to induce sparsity in the channels, using an adaptive penalty strength that we introduce. After removing the redundant channels, we obtain compact and efficient models that have significantly fewer parameters while maintaining performance comparable to that of the original models. We demonstrate the advantages of our proposed approach on three popular vision tasks, i.e., dense crowd counting, semantic segmentation, and image-level classification. The experimental results demonstrate the superiority of our framework. In particular, for crowd counting on the Mall dataset, the proposed ACSL framework reduces parameters by up to 94% (VGG16-Decoder) and FLOPs by up to 84% (ResNet101), while matching, and at times outperforming, the performance of the original model.
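To make the two steps named in the abstract more concrete, the following is a minimal illustrative sketch, not the formulation used in the paper. It assumes the channel affinity matrix can be approximated by the absolute Pearson correlation between channel activations, and that the adaptive penalty strength grows with a channel's average correlation to the other channels. The function names (`channel_affinity`, `adaptive_penalty`, `sparsity_loss`), the `base_strength` parameter, and the penalty rule itself are all hypothetical and chosen only for illustration.

```python
import numpy as np


def channel_affinity(features):
    """Assumed affinity: absolute Pearson correlation between channels.

    features: array of shape (N, C, H, W) holding activations of one layer.
    Returns a symmetric (C, C) matrix with entries in [0, 1].
    """
    n, c, h, w = features.shape
    # Collect every activation of a channel into one row: (C, N*H*W).
    flat = features.transpose(1, 0, 2, 3).reshape(c, -1)
    # Center and L2-normalize each row so the dot product is a correlation.
    flat = flat - flat.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    flat = flat / np.maximum(norms, 1e-12)
    return np.abs(flat @ flat.T)


def adaptive_penalty(affinity, base_strength=1e-4):
    """Hypothetical rule: more redundant channels receive a larger penalty.

    Redundancy of a channel is its mean affinity to all other channels.
    """
    c = affinity.shape[0]
    off_diagonal_sum = affinity.sum(axis=1) - np.diag(affinity)
    redundancy = off_diagonal_sum / max(c - 1, 1)
    return base_strength * (1.0 + redundancy)


def sparsity_loss(scales, penalties):
    """Weighted L1 penalty on per-channel scaling factors."""
    return float(np.sum(penalties * np.abs(scales)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 3, 8, 8))
    # Make channel 1 an affine copy of channel 0, i.e. fully redundant.
    feats[:, 1] = 2.0 * feats[:, 0] + 1.0
    affinity = channel_affinity(feats)
    penalties = adaptive_penalty(affinity)
    print(affinity.round(3))
    print(penalties)
    print(sparsity_loss(np.ones(3), penalties))
```

In this sketch, a weighted L1 term of this kind would be added to the task loss during training, and channels whose scaling factors shrink toward zero would then be candidates for removal. How ACSL actually defines the affinity matrix and the penalty is specified in the full paper, not here.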
