Sparse Matrix Computations Research Articles

Sparse matrix computation is crucial in many modern applications, including large-scale graph analytics, deep learning, and recommender systems. The performance of sparse kernels varies greatly with the structure of the input matrix, which makes it difficult to gain a comprehensive understanding of sparse computation and of how it relates to inputs, algorithms, and the target machine architecture. Despite extensive research on certain sparse kernels, such as Sparse Matrix-Vector Multiplication (SpMV), the family of sparse algorithms has yet to be investigated as a whole. This paper introduces SpChar, a workload characterization methodology for general sparse computation. Starting from hardware- and input-related metrics gathered from Performance Monitoring Counters (PMCs) and from the input matrices, SpChar employs tree-based models to identify the most relevant hardware and input characteristics. Our analysis enables a characterization loop that facilitates the optimization of sparse computation by mapping the impact of architectural features to inputs and algorithmic choices. We apply SpChar to more than 600 matrices from the SuiteSparse Matrix Collection on three state-of-the-art Arm Central Processing Units (CPUs) to determine the critical hardware and software characteristics that affect sparse computation. We find that the biggest limiting factors for high-performance sparse computation are (1) the latency of the memory system, (2) the pipeline flush overhead resulting from branch misprediction, and (3) the poor reuse of cached elements. We also propose software and hardware optimizations that designers can implement to build a platform suited to sparse computation, and we evaluate them with the gem5 simulator, achieving a speedup of up to 2.63× over a CPU without these optimizations.
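
For readers less familiar with sparse kernels, a minimal sketch of SpMV over the Compressed Sparse Row (CSR) format can make the input dependence concrete. CSR is chosen here only for illustration; the abstract does not say which storage formats SpChar covers, and the function name spmv_csr and the toy matrix are hypothetical. Two features of the loop plausibly tie back to the limiting factors listed above: the load x[col_idx[k]] is an indirect access whose locality depends entirely on the sparsity pattern, and the inner loop's trip count changes from row to row, which makes its exit branch hard to predict.

```python
from typing import List, Sequence


def spmv_csr(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    values: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Return y = A @ x for a matrix A stored in CSR format (illustrative sketch).

    row_ptr has n_rows + 1 entries; the nonzeros of row i are stored at
    positions row_ptr[i] through row_ptr[i + 1] - 1 of col_idx and values.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The trip count equals the number of nonzeros in row i,
        # so it changes from row to row.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect load: which element of x is read depends on the
            # sparsity pattern, not on the loop index.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical toy 3x3 matrix with an empty middle row:
#   [[4, 0, 1],
#    [0, 0, 0],
#    [2, 3, 0]]
row_ptr = [0, 2, 2, 4]
col_idx = [0, 2, 0, 1]
values = [4.0, 1.0, 2.0, 3.0]
x = [1.0, 2.0, 3.0]
print(spmv_csr(row_ptr, col_idx, values, x))  # [7.0, 0.0, 8.0]
```

Row 1 of the toy matrix is empty, so its inner loop runs zero times while rows 0 and 2 each run twice; on real matrices this row-length variation, and the scatter of column indices, is far larger.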

Loop tiling and fusion are two essential transformations that optimizing compilers use to enhance the data locality of programs. Existing heuristics either perform loop tiling and fusion in a fixed order, missing some of their profitable compositions, or rely on ad-hoc implementations for domain-specific applications, which calls for a generalized and systematic solution in optimizing compilers. In this article, we present the basteln strategy (an abbreviation for backward slicing of tiled loop nests) in polyhedral compilation to better model the interplay between loop tiling and fusion. The basteln strategy first groups loop nests while preserving their parallelism/tilability and then applies rectangular/parallelogram tiling to the output groups, that is, the groups that produce data consumed outside the considered program fragment. The memory footprints required by each tile are then computed, and the upward-exposed data extracted from them determine the tile shapes of the remaining fusion groups. This tiling mechanism can construct complex tile shapes imposed by the dependences between these groups, which a post-tiling fusion algorithm then merges to enhance data locality without losing the parallelism/tilability of the output groups. The basteln strategy also takes into account the amount of redundant computation and the fusion of independent groups, which makes it generally applicable. We integrate the basteln strategy into two optimizing compilers: a general-purpose optimizer and a domain-specific compiler for deploying deep learning models. Experiments on CPU, GPU, and a deep learning accelerator demonstrate the effectiveness of the approach across a wide range of application domains, including deep learning, image processing, sparse matrix computation, and linear algebra. In particular, when used to optimize deep learning models, the basteln strategy achieves a mean speedup of 1.8× over cuBLAS/cuDNN and 1.1× over TVM on GPU; it also outperforms PPCG and TVM by 11% and 20%, respectively, when generating code for the deep learning accelerator.
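
The core idea of deriving a producer's tile from what a consumer tile reads can be illustrated on a tiny one-dimensional, two-stage stencil pipeline. This is a minimal sketch under simplifying assumptions, not the basteln algorithm: basteln works on affine relations over multidimensional loop nests inside a polyhedral compiler, whereas here the footprint is written out by hand. The names unfused and tiled_fused, the three-point stencils, and the tile sizes are hypothetical. The consumer loop (the output group) is tiled first; the range of b each tile reads is then computed and used as the producer's tile, so the two stages are fused tile by tile.

```python
from typing import List


def unfused(a: List[float]) -> List[float]:
    """Reference version: producer and consumer run as separate full passes."""
    n = len(a)
    b = [0.0] * n
    for i in range(1, n - 1):                  # producer stage
        b[i] = a[i - 1] + a[i] + a[i + 1]
    c = [0.0] * n
    for i in range(2, n - 2):                  # consumer (output) stage
        c[i] = b[i - 1] + b[i] + b[i + 1]
    return c


def tiled_fused(a: List[float], tile: int) -> List[float]:
    """Tile the consumer loop, then compute for each tile only the
    producer values that the tile reads (its backward footprint)."""
    n = len(a)
    c = [0.0] * n
    for lo in range(2, n - 2, tile):
        hi = min(lo + tile, n - 2)             # consumer tile: [lo, hi)
        # c[i] reads b[i-1..i+1], so this tile needs b[lo-1 .. hi].
        b_lo, b_hi = lo - 1, hi + 1            # producer tile: [b_lo, b_hi)
        b_local = [a[j - 1] + a[j] + a[j + 1] for j in range(b_lo, b_hi)]
        for i in range(lo, hi):
            k = i - b_lo                       # position of b[i] in b_local
            c[i] = b_local[k - 1] + b_local[k] + b_local[k + 1]
    return c


a = [float(v) for v in range(10)]
print(tiled_fused(a, 3))  # [0.0, 0.0, 18.0, 27.0, 36.0, 45.0, 54.0, 63.0, 0.0, 0.0]
assert tiled_fused(a, 3) == unfused(a)
```

Each tile keeps only hi - lo + 2 values of b in a local buffer instead of materializing the whole array, and neighboring tiles both compute the two b values around their shared boundary. That recomputation is the kind of redundant computation the abstract says the basteln strategy accounts for when shaping tiles.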

Articles published on Sparse Matrix Computations

Bandwidth of WK-recursive networks and its sparse matrix computation

SpChar: Characterizing the sparse puzzle via decision trees

Optimization of Sparse Matrix Computation for Algebraic Multigrid on GPUs

Numerical Solution of Two-Dimensional Nonlinear Unsteady Advection-Diffusion-Reaction Equations with Variable Coefficients

Advancing Finite Difference Solutions for Two‐Dimensional Incompressible Navier–Stokes Equations Using Artificial Compressibility Method and Sparse Matrix Computation

Modeling the Interplay between Loop Tiling and Fusion in Optimizing Compilers Using Affine Relations

A Tiny Accelerator for Mixed-Bit Sparse CNN Based on Efficient Fetch Method of SIMO SPad

Mapping tree‐shaped workflows on systems with different memory sizes and processor speeds

Editorial for the special issue on architecture, algorithms and applications of high performance sparse matrix computations

Longitudinal Modeling of Age-Dependent Latent Traits with Generalized Additive Latent and Mixed Models

Parallel Sparse Computation Toolkit

Inverting the discrete curl operator: A novel graph algorithm to find a vector potential of a given vector field

Validation of elastic wave arrival detection method based on use of sparse matrix computation

Scalable Predictions for Spatial Probit Linear Mixed Models Using Nearest Neighbor Gaussian Processes

Coarse-Grained Pruning of Neural Network Models Based on Blocky Sparse Structure

Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study

AMG based on compatible weighted matching for GPUs

The Sparse Polyhedral Framework: Composing Compiler-Generated Inspector-Executor Code

On Improving Computational Efficiency of Simplified Fluid Flow Models

Sparse matrix computation for air quality forecast data assimilation
