Abstract

As a practical approach to compressing convolutional neural networks (CNNs), network pruning has developed rapidly in recent years. Conventional methods permanently prune inactive filters from models to reduce the width of each layer and then train the pruned model to convergence. However, such methods have three limitations: (1) activation-based pruning criteria ignore the correlation between filters, attenuating the variety of learned features; (2) permanent filter removal fixes the model architecture for subsequent training, reducing the chance of learning new features; (3) single-width compression may produce narrow layers that block information flow, limiting the feature capacity of the following layers and hindering optimization. These limitations reduce feature diversity in the pruned model and thus lead to sub-optimal model quality. In this paper, a compression method named filter clustering is proposed to rectify the poor feature diversity of traditional pruning and achieve better model quality from three perspectives. Firstly, to maintain the variety of features after pruning, we treat model compression as a clustering task and merge filters with similar outputs, rather than removing inactive filters. Specifically, a handy estimation approach is designed to convert output similarity into filter similarity, which frees the measurement from sampling numerous images. Secondly, to increase the probability of learning more features during training, we propose a periodic training-and-clustering pipeline, which creates a larger optimization space by dynamically exploring different sub-model architectures. Finally, to prevent narrow layers from limiting feature capacity, we introduce a fusible anti-blocking branch to smoothly remove such layers. Extensive experiments demonstrate that the proposed method achieves compact models with better feature diversity and reduces 1%–15% more computation than previous methods while maintaining performance.
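The abstract does not spell out how output similarity is converted into filter similarity, so the following is only a minimal sketch of the general idea: using cosine similarity between flattened filter weights as a cheap proxy for output similarity, and averaging each group of similar filters into one. The function name `cluster_filters` and the `threshold` parameter are hypothetical, not from the paper.

```python
import numpy as np

def cluster_filters(weights, threshold=0.9):
    """Greedily merge filters whose flattened weights have cosine
    similarity above `threshold`; each group is replaced by its mean.

    weights: array of shape (num_filters, ...) holding one conv layer's
    filters. Returns a (possibly smaller) stack of merged filters.
    """
    flat = weights.reshape(weights.shape[0], -1)
    # Normalize so a dot product gives cosine similarity.
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)

    merged = []
    used = np.zeros(len(flat), dtype=bool)
    for i in range(len(flat)):
        if used[i]:
            continue
        sims = unit[i] @ unit.T               # similarity to every filter
        group = (~used) & (sims > threshold)  # unclaimed, similar filters
        used |= group
        merged.append(weights[group].mean(axis=0))
    return np.stack(merged)
```

For example, a layer whose first two filters are near-duplicates would come out one filter narrower, with the duplicate pair fused into their average; the paper's actual criterion and merge rule may differ.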
