The proliferation of deep learning algorithms has catalyzed their use in a multitude of real-world problems. Algorithms such as deep neural networks (DNNs) are compute- and power-intensive, which has spurred the development of hardware platforms such as DNN inference accelerators. However, executing inference for large DNNs in resource-constrained environments creates energy bottlenecks in these accelerators. Because large DNNs comprise hundreds of millions of trained parameters, fetching them from the accelerator memory incurs substantial energy. To address this challenge, we propose HardCompress, which, to the best of our knowledge, is the first low-power solution that applies traditional compression strategies to commercial DNN accelerators in resource-constrained IoT edge devices. The three-step approach involves hardware-based post-quantization trimming of weights, followed by their dictionary-based compression, and their subsequent decompression by a low-power hardware engine during inference in the accelerator. We evaluate the proposed solution on lightweight networks trained on the MNIST dataset, a compact model trained on the CIFAR-10 dataset, and large DNNs trained on the ImageNet dataset, and we analyze the performance of HardCompress at different quantization levels. Furthermore, to quantify the effectiveness of the proposed solution, we develop an energy framework that contrasts the DRAM energies of the original and HardCompressed models. Finally, we propose a fault-injection framework that compares the fault resilience of the original model with its HardCompressed counterpart. Our results show that HardCompress, without any performance degradation in large DNNs, furnishes a maximum compression of 99.27%, equivalent to a $137\times$ reduction in memory footprint, and a DRAM energy of 0.07 J for 8-bit quantization on the systolic array-based DNN accelerator. Furthermore, our proposed low-power decompression engine incurs an area overhead of only 0.02%, enabling HardCompress's use in resource-constrained environments.
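As a rough illustration of the three-step pipeline summarized above, the sketch below quantizes a weight tensor, builds a dictionary (codebook) of the distinct quantized values, and reconstructs the weights from indices, mimicking what the on-chip decompression engine would do at inference time. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the trimming rule, the dictionary format, or the index width, so all function names and parameters here are illustrative. The final comment checks the arithmetic relating the reported 99.27% compression to the $137\times$ footprint reduction.

```python
import numpy as np

def quantize(weights, bits=8):
    """Uniform post-training quantization to signed `bits`-bit integers.
    (Assumed scheme; the abstract does not specify the quantizer.)"""
    scale = np.abs(weights).max() / (2 ** (bits - 1) - 1)
    return np.round(weights / scale).astype(np.int32), scale

def compress(q):
    """Dictionary-based compression: store each distinct quantized value
    once (the codebook) plus a compact per-weight index.
    8-bit quantization yields at most 256 distinct values, so uint8 suffices."""
    codebook, indices = np.unique(q, return_inverse=True)
    return codebook, indices.astype(np.uint8)

def decompress(codebook, indices, shape):
    """Lookup performed by the low-power hardware engine during inference."""
    return codebook[indices].reshape(shape)

weights = np.random.randn(1000, 1000).astype(np.float32)
q, scale = quantize(weights, bits=8)
codebook, indices = compress(q)
restored = decompress(codebook, indices, q.shape)
assert np.array_equal(restored, q)  # lossless relative to the quantized model

# Sanity check on the reported figures: a 99.27% size reduction leaves
# 0.73% of the original footprint, i.e. 1 / 0.0073 ~= 137x smaller.
print(1 / (1 - 0.9927))  # ~136.99
```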