Abstract
The huge storage and computation costs of convolutional neural networks (CNNs) make it challenging for them to meet the real-time inference requirements of many applications. Existing channel pruning methods mainly focus on removing unimportant channels from a CNN model based on rule-of-thumb designs, using the reduction in floating-point operations (FLOPs) and parameter count to measure pruning quality; the inference latency of the pruned model is often overlooked. In this paper, we propose a latency-aware automatic CNN channel pruning method (LACP), which aims to automatically search for pruned network structures that are both accurate and low-latency. We show that FLOPs and parameter count are inaccurate proxies for pruning quality, and instead use model inference latency as the direct optimization metric. To bridge model pruning and inference acceleration, we analyze the inference latency of convolutional layers on the GPU. The results show that, due to the GPU tail effect, the inference latency of a convolutional layer exhibits a staircase pattern as the number of channels grows. Based on this observation, we greatly shrink the search space of network structures. We then apply an evolutionary procedure to search for a computationally efficient pruned network structure, which reduces inference latency while maintaining model accuracy. Experiments and comparisons with state-of-the-art methods on three image classification datasets show that our method achieves better inference acceleration with less accuracy loss.
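The staircase latency pattern mentioned above can be observed with a simple timing experiment. The sketch below is a minimal PyTorch example, not the paper's benchmark setup; the layer shape, channel sweep range, and iteration counts are illustrative assumptions. It times a 3x3 convolution on the GPU while sweeping the output channel count, which typically reveals flat plateaus punctuated by sudden jumps rather than latency growing smoothly with FLOPs.

```python
# Hypothetical sketch: measure conv layer latency vs. output channel count on GPU.
# Shapes and timing parameters are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

def conv_latency_ms(out_channels, in_channels=256, hw=56, iters=50):
    """Average GPU latency (ms) of a 3x3 convolution with `out_channels` filters."""
    conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1).cuda().eval()
    x = torch.randn(1, in_channels, hw, hw, device="cuda")
    with torch.no_grad():
        for _ in range(10):          # warm-up so cuDNN algorithm selection settles
            conv(x)
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            conv(x)
        end.record()
        torch.cuda.synchronize()     # wait for all kernels before reading the timer
    return start.elapsed_time(end) / iters

if __name__ == "__main__":
    # Sweeping channel counts exposes the staircase pattern: latency stays nearly
    # constant across a range of channel counts, then jumps at hardware-dependent
    # boundaries (the GPU tail effect), so pruning within a plateau saves no time.
    for c in range(32, 513, 32):
        print(f"{c:4d} channels: {conv_latency_ms(c):.3f} ms")
```

Plateaus like these are what allow the search space to be shrunk: only channel counts at the right edge of each latency step are worth considering as pruning candidates.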