Abstract

Unstructured deep neural network (DNN) pruning has been widely studied. However, previous schemes have focused mainly on compressing the model's memory footprint, which leads to relatively low reduction ratios in computational workload. This study demonstrates that the main reason is the inconsistent distribution of memory footprint and workload across the layers of a DNN model. Based on this observation, we formulate the network pruning flow as a multi-objective optimization problem and design an improved genetic algorithm that can efficiently explore the whole pruning-structure space with both pruning goals equally constrained, finding a solution that strikes a judicious balance between the DNN's model size and workload. Experiments show that the proposed scheme achieves up to 34% further reduction in computational workload compared to state-of-the-art pruning schemes [11,33] for ResNet50 on the ILSVRC-2012 dataset. We have also deployed the pruned ResNet50 models on a dedicated DNN accelerator; the measured results show a considerable 6× reduction in inference time compared to an FPGA accelerator running the dense CNN model quantized in INT8 format, and a 2.27× improvement in power efficiency over a 2080Ti GPU-based implementation.
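To make the formulation concrete, the sketch below shows one way a multi-objective genetic search over per-layer pruning ratios could look. This is an illustrative toy, not the paper's actual algorithm: the per-layer parameter/FLOP costs are hypothetical, and the scalarized fitness with an imbalance penalty is one simple way to keep both pruning goals equally constrained.

```python
# Minimal sketch (assumptions labeled): evolve per-layer pruning ratios so
# that parameter (memory) reduction and FLOP (workload) reduction stay
# balanced. LAYERS, the fitness scalarization, and all hyperparameters are
# hypothetical stand-ins, not the paper's method.
import random

# Hypothetical per-layer costs: (parameter count, FLOPs). Note how the two
# are distributed inconsistently across layers -- the key observation that
# motivates optimizing both objectives jointly.
LAYERS = [(1_000, 50_000), (4_000, 40_000), (16_000, 20_000), (64_000, 5_000)]
TOTAL_PARAMS = sum(p for p, _ in LAYERS)
TOTAL_FLOPS = sum(f for _, f in LAYERS)

def evaluate(ratios):
    """Return (param reduction, FLOP reduction) for per-layer prune ratios."""
    params_kept = sum(p * (1 - r) for (p, _), r in zip(LAYERS, ratios))
    flops_kept = sum(f * (1 - r) for (_, f), r in zip(LAYERS, ratios))
    return 1 - params_kept / TOTAL_PARAMS, 1 - flops_kept / TOTAL_FLOPS

def fitness(ratios):
    # Reward both reductions equally and penalize imbalance, so the search
    # cannot satisfy one pruning goal while neglecting the other.
    mem_red, flop_red = evaluate(ratios)
    return mem_red + flop_red - abs(mem_red - flop_red)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(ratios, sigma=0.05):
    # Gaussian perturbation, clamped to a valid pruning-ratio range.
    return [min(0.95, max(0.0, r + random.gauss(0, sigma))) for r in ratios]

def genetic_search(pop_size=50, generations=100):
    pop = [[random.uniform(0.0, 0.9) for _ in LAYERS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]  # keep the best quarter as parents
        children = [
            mutate(crossover(random.choice(elite), random.choice(elite)))
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=fitness)

best = genetic_search()
print("per-layer ratios:", [round(r, 2) for r in best])
print("reductions (mem, flops):", tuple(round(x, 2) for x in evaluate(best)))
```

In a real pruning flow, accuracy after fine-tuning would also constrain the search, and a Pareto-based selection (e.g., NSGA-II-style nondominated sorting) could replace the scalarized fitness used here for brevity.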
