Abstract

Deep neural network compression is important and increasingly developed especially in resource-constrained environments, such as autonomous drones and wearable devices. Basically, we can easily and largely reduce the number of weights of a trained deep model by adopting a widely used model compression technique, e.g., pruning. In this way, two kinds of data are usually preserved for this compressed model, i.e., non-zero weights and meta-data, where meta-data is employed to help encode and decode these non-zero weights. Although we can obtain an ideally small number of non-zero weights through pruning, existing sparse matrix coding methods still need a much larger amount of meta-data (may several times larger than non-zero weights), which will be a severe bottleneck of the deploying of very deep models. To tackle this issue, we propose a layerwise sparse coding (LSC) method to maximize the compression ratio by extremely reducing the amount of meta-data. We first divide a sparse matrix into multiple small blocks and remove zero blocks, and then propose a novel signed relative index (SRI) algorithm to encode the remaining non-zero blocks (with much less meta-data). In addition, the proposed LSC performs parallel matrix multiplication without full decoding, while traditional methods cannot. Through extensive experiments, we demonstrate that LSC achieves substantial gains in pruned DNN compression (e.g., 51.03x compression ratio on ADMM-Lenet) and inference computation (i.e., time reduction and extremely less memory bandwidth), over state-of-the-art baselines.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.