Abstract

Deep Neural Networks (DNNs) have been widely and successfully employed in artificial intelligence and machine learning applications such as image processing and natural language processing. As DNNs grow deeper and include more filters per layer, they incur high computational costs and require large amounts of memory to store their many parameters. Moreover, current processing platforms (e.g., CPU, GPU, and FPGA) lack sufficient internal memory, so external memory storage is needed. Deploying DNNs in mobile applications is therefore difficult, given the limited storage space, computational power, energy supply, and real-time processing requirements. In this work, a method based on tensor decomposition is used to compress the network parameters, thereby reducing accesses to external memory. This compression method decomposes each network layer's weight tensor into a limited number of principal vectors such that (i) almost all of the original parameters can be retrieved, (ii) the network structure is unchanged, and (iii) after the parameters are reproduced, the network's detection accuracy is nearly identical to that of the original network. To make the method efficient to realize on an FPGA, the tensor decomposition algorithm was modified without affecting its convergence, so that reproducing the network parameters on the FPGA is straightforward. The proposed algorithm reduces the parameters of ResNet50, VGG16, and VGG19 networks trained on CIFAR-10 and CIFAR-100 by almost a factor of 10.
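To illustrate the flavor of this approach, the sketch below compresses a layer's weight tensor into low-rank factors and then reproduces an approximation of the original weights. The abstract does not specify the paper's exact decomposition algorithm, so a truncated SVD of a flattened convolution weight tensor is used here purely as an illustrative stand-in; the layer shape, rank, and helper names are assumptions, not the authors' implementation.

```python
# Illustrative sketch only: truncated SVD as a stand-in for the paper's
# (unspecified) tensor decomposition. Storing the factors U_r (m x r) and
# V_r (r x n) instead of W (m x n) reduces the parameter count whenever
# r < m*n / (m + n).
import numpy as np

def compress_weights(W, rank):
    """Decompose a 2-D weight matrix W into rank-`rank` principal factors."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # absorb singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

def restore_weights(U_r, V_r):
    """Reproduce an approximation of the original weights from the factors."""
    return U_r @ V_r

# Hypothetical example: a 3x3 conv layer with 256 input and 512 output
# channels, flattened to shape (512, 256*3*3). rank=40 yields roughly the
# 10x parameter reduction reported in the abstract.
W = np.random.randn(512, 256 * 3 * 3).astype(np.float32)
U_r, V_r = compress_weights(W, rank=40)
W_hat = restore_weights(U_r, V_r)
ratio = W.size / (U_r.size + V_r.size)
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"compression ratio: {ratio:.1f}x, relative error: {rel_err:.3f}")
```

Note that reconstruction quality for a real trained layer depends on how quickly its singular values decay; the random matrix above only demonstrates the parameter-count arithmetic, not the accuracy preservation the paper reports.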
