Network sparsification is an effective technique for accelerating Deep Neural Network (DNN) inference. However, existing sparsification techniques often rely on unstructured sparsity, which yields limited benefits: the numerous sparse storage formats introduce significant memory and computational overhead during address generation and gradient updates. Moreover, many of these solutions target only the inference phase and neglect the crucial training phase. In this paper, we introduce STCO, a novel Sparse Tensor Compilation Optimization technique that significantly improves training efficiency through structured sparse tensor compilation. Central to STCO is the Tensorization-aware Index Entity (TIE) format, which represents structured sparse tensors compactly by eliminating redundant indices and minimizing storage overhead. The TIE format underpins the Address-Carry flow (AC flow) pass, which optimizes data layout at the computational-graph level and yields more compact sparse tensor storage. A companion shape inference pass uses the AC flow to derive optimized tensor shapes, further improving the performance of sparse tensor operations. Moreover, the Address-Carry TIE flow dynamically tracks nonzero addresses, extending the benefits of sparse optimization to both forward and backward propagation; this integrates seamlessly into the training pipeline, enabling a smooth transition to sparse tensor compilation without significant modifications to existing codebases. To further boost training performance, we implement an operator-level AC flow optimization pass tailored to structured sparse tensors, which generates efficient addresses and keeps the computational overhead of sparse tensor operations minimal. STCO is flexible enough to be integrated into various frameworks and compilers, providing a robust solution for efficient training with structured sparse tensors. Experiments demonstrate that STCO achieves speedups of 3.64×, 5.43×, 4.89×, and 3.91× over state-of-the-art sparse formats on VGG16, ResNet-18, MobileNetV1, and MobileNetV2, respectively. These findings underscore the efficiency of our approach in leveraging structured sparsity to accelerate DNN training.
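To make concrete what "eliminating redundant indices" can mean for a structured-sparse layout, the following is a minimal illustrative sketch. It is not the paper's TIE implementation (which is not reproduced here); all function names and the block size are hypothetical. The point it demonstrates is the general principle the abstract describes: when sparsity follows a regular block structure, one index per block suffices, whereas element-wise formats such as COO store a full coordinate pair per nonzero.

```python
# Hypothetical sketch of a block-structured sparse layout (not the paper's
# TIE format). Illustrates how structure makes per-element indices redundant:
# only one compact block index is stored per kept block.
import numpy as np

def dense_to_block_sparse(w: np.ndarray, block: int = 4):
    """Compress a 2-D matrix whose sparsity is block-structured: each
    `block`-wide column group in a row is either fully kept or fully pruned.
    Returns (values, block_ids, shape) -- one index per kept block, not one
    (row, col) pair per nonzero as in COO."""
    rows, cols = w.shape
    assert cols % block == 0
    blocks = w.reshape(rows, cols // block, block)
    kept = np.any(blocks != 0, axis=2)                  # (rows, n_blocks) mask
    values = blocks[kept]                               # packed nonzero blocks
    block_ids = np.flatnonzero(kept).astype(np.int32)   # flat block indices
    return values, block_ids, w.shape

def block_sparse_to_dense(values, block_ids, shape, block: int = 4):
    """Reconstruct the dense matrix; dense addresses are derived on the fly
    from the compact block ids (an 'address generation' step)."""
    rows, cols = shape
    out = np.zeros((rows, cols // block, block), dtype=values.dtype)
    out.reshape(-1, block)[block_ids] = values
    return out.reshape(rows, cols)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 16)).astype(np.float32)
    # Impose block structure: prune random 4-wide column blocks per row.
    mask = rng.random((8, 4)) < 0.5
    w *= np.repeat(mask, 4, axis=1)
    vals, ids, shape = dense_to_block_sparse(w)
    assert np.allclose(block_sparse_to_dense(vals, ids, shape), w)
    # COO would store 2 indices per nonzero; here it is 1 index per block.
    print(f"nonzeros={int((w != 0).sum())}, stored indices={ids.size}")
```

Under these assumptions the index storage shrinks by roughly a factor of 2 × block relative to COO, which is the kind of metadata reduction a compact structured format can exploit when generating addresses in both forward and backward passes.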