Abstract

Deep convolutional neural networks have a large number of parameters and require a significant number of floating-point operations during computation, which limits their deployment in situations where the storage space is limited and computational resources are insufficient, such as in mobile phones and small robots. Many network compression methods have been proposed to address the aforementioned issues, including pruning, low-rank decomposition, quantization, etc. However, these methods typically fail to achieve a significant compression ratio in terms of the parameter count. Even when high compression rates are achieved, the network’s performance is often significantly deteriorated, making it difficult to perform tasks effectively. In this study, we propose a more compact representation for neural networks, named Quantized Low-Rank Tensor Decomposition (QLTD), to super compress deep convolutional neural networks. Firstly, we employed low-rank Tucker decomposition to compress the pre-trained weights. Subsequently, to further exploit redundancies within the core tensor and factor matrices obtained through Tucker decomposition, we employed vector quantization to partition and cluster the weights. Simultaneously, we introduced a self-attention module for each core tensor and factor matrix to enhance the training responsiveness in critical regions. The object identification results in the CIFAR10 experiment showed that QLTD achieved a compression ratio of 35.43×, with less than 1% loss in accuracy and a compression ratio of 90.61×, with less than a 2% loss in accuracy. QLTD was able to achieve a significant compression ratio in terms of the parameter count and realize a good balance between compressing parameters and maintaining identification accuracy.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call