A central challenge of channel pruning is designing efficient and effective criteria for selecting which channels to prune. A widely used criterion is minimal performance degradation, e.g., choosing channels whose removal changes the loss the least. Accurately evaluating the true performance degradation, however, requires retraining the surviving weights to convergence, which is prohibitively slow. Existing pruning methods therefore settle for evaluating the degradation with the pre-pruning weights, without retraining. We observe that the loss changes measured with and without retraining differ significantly. This motivates us to develop a technique that evaluates the true loss change without retraining, so that channels can be selected for pruning with greater reliability and confidence. We first derive a closed-form estimator of the true loss change per mask change, using influence functions, without any retraining. Influence functions are a classic technique from robust statistics that reveal the impact of a training sample on a model's predictions; we repurpose them to assess the impact of mask changes on the true loss. We then show how to assess the importance of all channels simultaneously and develop a novel global channel pruning algorithm accordingly. Extensive experiments verify the effectiveness of the proposed algorithm, which significantly outperforms competing channel pruning methods on both image classification and object detection tasks. To the best of our knowledge, we are the first to show that evaluating the true loss change of pruning without retraining is possible. This finding opens up opportunities for a series of new pruning paradigms that differ from existing methods.
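The idea of scoring channels by an estimated loss change per mask change can be illustrated with a minimal sketch. The snippet below (NumPy, with hypothetical names) uses a plain first-order approximation, delta_loss ≈ (∂loss/∂mask) · delta_mask, to rank active channels; the paper's influence-function estimator refines this kind of mask-perturbation view to approximate the loss change *after* retraining, which this sketch does not do.

```python
import numpy as np

def estimate_loss_changes(grad_mask: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """First-order estimate of the loss change from pruning each channel.

    Flipping an active channel's mask entry from 1 to 0 gives
    delta_mask = -mask, so the estimated loss change is
    grad_mask * (0 - mask). This is a generic Taylor-expansion proxy,
    not the paper's closed-form influence-function estimator.
    """
    delta_mask = -mask          # pruning flips active entries 1 -> 0
    return grad_mask * delta_mask

# Toy example: 4 channels, all currently active.
grad = np.array([0.5, -0.2, 0.1, -0.8])  # d(loss)/d(mask), e.g. from backprop
mask = np.ones(4)
scores = estimate_loss_changes(grad, mask)
# Channels with the smallest estimated loss change (most negative or
# near zero) are the safest to prune first.
order = np.argsort(scores)
```

Ranking all channels by one global score vector, as above, is what allows importance to be assessed simultaneously rather than layer by layer.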