This work introduces a new training and compression pipeline for building nested sparse convolutional neural networks (ConvNets), a class of dynamic ConvNets suited to inference tasks deployed on resource-constrained devices at the edge of the Internet of Things. A nested sparse ConvNet consists of a single ConvNet architecture containing <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$N$ </tex-math></inline-formula> sparse subnetworks with nested weight subsets, like a Matryoshka doll, and can trade accuracy for latency at runtime, using model sparsity as a dynamic knob. To attain high accuracy at training time, we propose a gradient masking technique that optimally routes the learning signals across the nested weight subsets. To minimize the storage footprint and efficiently process the obtained models at inference time, we introduce a new sparse matrix compression format, with dedicated compute kernels, that exploits the characteristics of the nested weight subsets. Tested on image classification and object detection tasks on an off-the-shelf ARM-M7 microcontroller unit (MCU), nested sparse ConvNets outperform variable-latency solutions naively built by assembling single sparse models trained as stand-alone instances, achieving 1) comparable accuracy; 2) remarkable storage savings; and 3) high performance. Moreover, compared with state-of-the-art dynamic strategies, such as dynamic pruning and layer width scaling, nested sparse ConvNets are Pareto optimal in the accuracy versus latency space.
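The nesting and gradient-masking ideas above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's method: it builds nested supports by magnitude pruning (sparser masks are subsets of denser ones, since the top-<em>k</em> weights by magnitude are nested for decreasing <em>k</em>), and it masks each subnet's gradient to its own support before accumulation. The function names `nested_masks` and `masked_grad` are illustrative only.

```python
import numpy as np

def nested_masks(weights, sparsities):
    """Build nested binary masks by weight magnitude.

    The sparsest mask keeps only the largest-magnitude weights; each
    denser mask keeps a superset of them, so the supports are nested
    (illustrative stand-in for the paper's nested weight subsets).
    """
    order = np.argsort(np.abs(weights).ravel())  # ascending magnitude
    n = weights.size
    masks = []
    for s in sorted(sparsities, reverse=True):   # sparsest subnet first
        keep = n - int(round(s * n))             # number of surviving weights
        m = np.zeros(n, dtype=bool)
        m[order[-keep:]] = True                  # keep the top-`keep` weights
        masks.append(m.reshape(weights.shape))
    return masks                                 # masks[0] ⊆ masks[1] ⊆ ...

def masked_grad(subnet_grads, masks):
    """Route learning signals: each subnet's gradient only touches
    the weights inside that subnet's support (a simple form of
    gradient masking, assumed here for illustration)."""
    total = np.zeros_like(subnet_grads[0], dtype=float)
    for g, m in zip(subnet_grads, masks):
        total += g * m
    return total
```

Under this scheme, weights shared by all subnets accumulate gradients from every subnet's loss, while weights exclusive to the denser subnets are updated only by those subnets, which is the routing behavior the abstract alludes to.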