Abstract

Convolutional neural networks (CNNs) have achieved tremendous success in computer vision. However, the growing number of parameters and the limits of hardware and software resources make model compression an important problem: we need to reduce the size of CNNs and improve their training and inference speed. This paper focuses on channel pruning, a model compression technique that evaluates the importance of the channels within a convolution layer and prunes away the less important ones. We propose a mutual information metric for pruning: by measuring the entropy of feature maps, we estimate how much information flows through each channel during label recognition and prune away the channels that carry the least. Specifically, we compute the mutual information between feature maps and labels, which is exactly the information relevant to label classification. We also propose a weighted mutual information metric that further improves accuracy. In our experiments, the weighted mutual information metric achieves better accuracy than the classic L1-norm metric [1] and the original entropy metric [2]. We also find that the classic L1-norm pruning metric can be improved by computing the L1-norm of the output filter weights (denoted output L1) instead of the input filter weights (denoted input L1). We test our channel pruning algorithms on the SVHN, CIFAR-10, and CIFAR-100 datasets using Simplenet [3]. When we prune away 70% of the parameters in every convolution layer, our weighted mutual information method achieves 1.52%, 13.24%, and 7.90% higher accuracy than the output L1 metric on these three datasets. In the global pruning experiment, our weighted mutual information metric is about 2% more accurate than the output L1 metric when we remove 55% of the parameters on SVHN, and 1.5% more accurate on CIFAR-100 when only 53% of the parameters remain. The one exception is CIFAR-10, where our metric is 5% less accurate than the output L1 metric when 40% of the parameters remain.
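To make the two metrics concrete, the sketch below scores the channels of one convolution layer by (a) the output L1-norm of its filter weights and (b) a histogram-based estimate of the mutual information between each channel's feature map and the class labels. This is a minimal illustration, not the authors' exact implementation: the function names, the spatial-averaging summary, and the binning scheme are assumptions made for this example, and the weighted variant is omitted because its weighting scheme is only detailed in the full paper.

```python
import numpy as np

def score_l1_output(weight):
    """Per-output-channel L1-norm for a conv weight of shape
    (out_channels, in_channels, kH, kW)."""
    return np.abs(weight).sum(axis=(1, 2, 3))

def _entropy(p):
    """Shannon entropy (in bits) of a probability vector, ignoring zeros."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def score_mutual_information(feature_maps, labels, n_bins=32):
    """Estimate I(channel activation; label) for each channel.

    feature_maps: (N, C, H, W) activations collected on a held-out batch.
    labels:       (N,) integer class labels.
    Each channel is summarized by its spatially averaged activation,
    discretized into n_bins, and scored with I(X;Y) = H(X)+H(Y)-H(X,Y).
    """
    labels = np.asarray(labels)
    n, c = feature_maps.shape[:2]
    summary = feature_maps.reshape(n, c, -1).mean(axis=2)   # (N, C)
    scores = np.empty(c)
    for ch in range(c):
        edges = np.histogram_bin_edges(summary[:, ch], bins=n_bins)
        x = np.digitize(summary[:, ch], edges)              # bins 0..n_bins+1
        joint = np.zeros((n_bins + 2, labels.max() + 1))
        for xi, yi in zip(x, labels):
            joint[xi, yi] += 1.0
        joint /= n                                          # joint P(X, Y)
        scores[ch] = (_entropy(joint.sum(axis=1))           # H(X)
                      + _entropy(joint.sum(axis=0))         # H(Y)
                      - _entropy(joint.ravel()))            # H(X, Y)
    return scores

# Under either metric, channels with the lowest scores are pruned first.
```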
