Abstract

This work presents the UnSparse-Opt framework for efficient unstructured pruning and quantisation of feedforward neural networks and for improving their efficiency on graphics processing units (GPUs) by using a direct sparse algorithm. The NVIDIA deep neural network library (cuDNN) is the most effective implementation of deep learning (DL) algorithms for GPUs. Among the most common techniques for improving the efficiency of convolutional neural network (CNN) models are weight pruning and quantisation. There are two main types of pruning: structured and unstructured. The former enables much easier acceleration on many types of accelerators, but it is difficult to achieve sparsity levels and accuracy as high as those obtained with unstructured pruning. Unstructured pruning with retraining can produce weight tensors with ∼90% or higher sparsity in some deep CNN models. This article presents a pruning algorithm that achieves high sparsity levels without a drop in accuracy. In the next stage, linear and non-linear quantisation are applied for further reductions in run time and memory footprint. Additionally, this work presents real CNN models pruned to high sparsity in which a subset of layers achieves efficiency comparable to or better than cuDNN by using a direct sparse method. Finally, it shows sparse, reduced-precision CNN architectures that can be more efficient than the cuDNN library.
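To make the two steps named in the abstract concrete, below is a minimal sketch (not the paper's UnSparse-Opt implementation) of magnitude-based unstructured pruning of a weight tensor followed by linear (uniform) quantisation of the surviving weights. The sparsity target, bit width, and function names are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: unstructured magnitude pruning + linear quantisation.
# The actual UnSparse-Opt algorithm and retraining schedule may differ.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so roughly `sparsity` of them become zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

def linear_quantise(weights: np.ndarray, bits: int = 8):
    """Uniform (linear) quantisation of the weights to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax if np.any(weights) else 1.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale  # dequantise as q * scale

# Example: prune a dense layer's weights to ~90% sparsity, then quantise to 8 bits.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 128)).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.9)
q, scale = linear_quantise(w_sparse, bits=8)
print("sparsity:", 1.0 - np.count_nonzero(w_sparse) / w_sparse.size)
```

For the direct sparse GPU execution the abstract refers to, the remaining non-zero weights would typically be stored in a compressed format (e.g. CSR) before being consumed by sparse kernels; that part is omitted here.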
