Abstract

Deep neural networks have shown impressive performance in many areas, including computer vision and natural language processing. However, the millions of parameters in a deep neural network limit its deployment on low-end devices due to intensive memory requirements and high computational cost. In the literature, several network compression techniques based on tensor decompositions have been proposed to compress deep neural networks. Existing techniques operate on each network unit by approximating its linear response or kernel tensor with various tensor decomposition methods. Moreover, research has shown that significant redundancy exists between the different filters and feature channels of the kernel tensor in each convolutional layer. In this paper, we propose a new algorithm to compress deep neural networks by considering both the nonlinear response and a multilinear low-rank constraint on the kernel tensor. To overcome the resulting difficulty of nonconvex optimization, we propose a convex relaxation scheme so that the problem can be solved directly by the alternating direction method of multipliers (ADMM). Thus, the Tucker-2 rank and the factor matrices of the Tucker decomposition can be determined simultaneously. The effectiveness of the proposed method is evaluated on the CIFAR-10 and large-scale ILSVRC12 datasets for CNNs including ResNet-18, AlexNet and GoogLeNet. Our numerical results show that the proposed method achieves a large reduction in model size with only a small loss in accuracy, and its compression performance is better than that of existing methods.
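
To make the Tucker-2 idea concrete, the sketch below shows how a convolution kernel can be factored along its output-channel and input-channel modes. This is only an illustration and not the paper's method: the function name tucker2_compress, the (output channels, input channels, height, width) kernel layout, and the fixed ranks are assumptions, and the factors here come from a plain truncated HOSVD rather than the paper's ADMM-based convex relaxation, which selects the Tucker-2 rank and factor matrices jointly.

```python
import numpy as np

def tucker2_compress(W, r_out, r_in):
    """Approximate a conv kernel W of shape (T, S, d, d) with a Tucker-2 factorization.

    Returns (core, U, V) with core of shape (r_out, r_in, d, d),
    U of shape (T, r_out) and V of shape (S, r_in), so that
    W[t, s] ~= sum_{r, q} core[r, q] * U[t, r] * V[s, q].
    Ranks are truncated by HOSVD here; the paper determines them via ADMM.
    """
    T, S, d, _ = W.shape
    # Mode-0 (output-channel) unfolding and its leading left singular vectors.
    U, _, _ = np.linalg.svd(W.reshape(T, -1), full_matrices=False)
    U = U[:, :r_out]                                # (T, r_out)
    # Mode-1 (input-channel) unfolding and its leading left singular vectors.
    V, _, _ = np.linalg.svd(W.transpose(1, 0, 2, 3).reshape(S, -1),
                            full_matrices=False)
    V = V[:, :r_in]                                 # (S, r_in)
    # Core tensor: project W onto the two channel subspaces.
    core = np.einsum('tsij,tr,sq->rqij', W, U, V)   # (r_out, r_in, d, d)
    return core, U, V

# Reconstruction check on a random kernel (values are illustrative only).
W = np.random.randn(64, 32, 3, 3)
core, U, V = tucker2_compress(W, r_out=16, r_in=8)
W_hat = np.einsum('rqij,tr,sq->tsij', core, U, V)
print('relative approximation error:', np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

In a network, such a factorization corresponds to replacing one convolution with three: a 1x1 convolution given by V, a small d-by-d convolution given by the core tensor, and a 1x1 convolution given by U, which is where the reduction in parameters and computation comes from.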
