SpMV Kernel Research Articles

The Sparse Matrix-Vector Multiplication (SpMV) kernel dominates the computing cost in numerous scientific applications. Many implementations based on different sparse formats were proposed to improve this kernel on the recent GPU architectures. However, it has been widely observed that there is no “best-for-all” sparse format for the SpMV kernel on GPU. Indeed, serious performance degradation of an order of magnitude can be observed without a careful selection of the sparse format to use. To address this problem, we propose in this article BestSF (Best Sparse Format), a new learning-based sparse meta-format that automatically selects the most appropriate sparse format for a given input matrix. To do so, BestSF relies on a cost-sensitive classification system trained using Weighted Support Vector Machines (WSVMs) to predict the best sparse format for each input sparse matrix. Our experimental results on two different NVIDIA GPU architectures using a large number of real-world sparse matrices show that BestSF achieved a noticeable overall performance improvement over using a single sparse format. While BestSF is trained to select the best sparse format in terms of performance (GFLOPS), our further experimental investigations revealed that using BestSF also led, in most of the test cases, to the best energy efficiency (MFLOPS/W). To prove its practical effectiveness, we also evaluate the performance and energy efficiency improvement achieved when using BestSF as a building block in a GPU-based Preconditioned Conjugate Gradient (PCG) iterative solver.

Read full abstract

SummaryThe sparse matrix‐vector multiplication (SpMV) is of great importance in scientific computations. Graphics processing unit (GPU)‐accelerated SpMVs for large‐sized problems have attracted considerable attention recently. We observe that on a specific multi‐GPU platform, the SpMV performance can usually be greatly improved when a matrix is partitioned into several blocks according to a predetermined rule and each block is assigned to a GPU with an appropriate storage format. This motivates us to propose a novel multi‐GPU parallel SpMV optimization framework, which involves the following parts: (1) a simple rule is defined to divide any given matrix among multiple GPUs; (2) a performance model, which is independent of the problems and dependent on the resources of devices, is proposed to accurately predict the execution time of SpMV kernels; and (3) a selection algorithm is suggested to automatically select the most appropriate one from the storage formats that are involved in the framework for the matrix block that is assigned to each GPU on the basis of the performance model. The objective of our framework does not construct a new storage format or algorithm but automatically and rapidly generates an optimally parallel SpMV for any sparse matrix on a specific multi‐GPU platform by integrating the existing storage formats and their corresponding kernels. We take 5 popular storage formats, for example, to present the idea of constructing the framework. Theoretically, we validate the correctness of our proposed SpMV performance model. This model is constructed only once for each type of GPU. Moreover, this framework is general and easy to be extensible. For a storage format that is not included in our framework, once the performance model of its corresponding SpMV kernel is successfully constructed, it can be incorporated into our framework. The experiments validate the efficiency of our proposed framework.

Read full abstract

SpMV Kernel Research Articles

Articles published on SpMV Kernel

Characterizing Scalability of Sparse Matrix–Vector Multiplications on Phytium FT-2000+

Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms

BestSF

Developing a New Storage Format and a Warp-Based SpMV Kernel for Configuration Interaction Sparse Matrices on the GPU

AXC: A new format to perform the SpMV oriented to Intel Xeon Phi architecture in OpenCL

Computing the sparse matrix vector product using block-based kernels without zero padding on processors with AVX-512 instructions.

A multi-GPU parallel optimization model for the preconditioned conjugate gradient algorithm

LightSpMV: Faster CUDA-Compatible Sparse Matrix-Vector Multiplication Using Compressed Sparse Rows

A Novel Multi-GPU Parallel Optimization Model for The Sparse Matrix-Vector Multiplication

Adaptive Optimization Modeling of Preconditioned Conjugate Gradient on Multi-GPUs

A novel multi–graphics processing unit parallel optimization framework for the sparse matrix‐vector multiplication

Performance optimization of Sparse Matrix‐Vector Multiplication for multi‐component PDE‐based applications using GPUs

Optimization techniques for sparse matrix–vector multiplication on GPUs

A Performance Prediction and Analysis Integrated Framework for SpMV on GPUs

A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs

Accurate cross‒architecture performance modeling for sparse matrix‒vector multiplication (SpMV) on GPUs

Sparse matrix–vector multiplication on the Single-Chip Cloud Computer many-core processor

GPU-accelerated preconditioned iterative linear solvers

An Improved Implementation of Preconditioned Conjugate Gradient Method on GPU

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

SpMV Kernel Research Articles

Articles published on SpMV Kernel

Characterizing Scalability of Sparse Matrix–Vector Multiplications on Phytium FT-2000+

Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms

BestSF

Developing a New Storage Format and a Warp-Based SpMV Kernel for Configuration Interaction Sparse Matrices on the GPU

AXC: A new format to perform the SpMV oriented to Intel Xeon Phi architecture in OpenCL

Computing the sparse matrix vector product using block-based kernels without zero padding on processors with AVX-512 instructions.

A multi-GPU parallel optimization model for the preconditioned conjugate gradient algorithm

LightSpMV: Faster CUDA-Compatible Sparse Matrix-Vector Multiplication Using Compressed Sparse Rows

A Novel Multi-GPU Parallel Optimization Model for The Sparse Matrix-Vector Multiplication

Adaptive Optimization Modeling of Preconditioned Conjugate Gradient on Multi-GPUs

A novel multi–graphics processing unit parallel optimization framework for the sparse matrix‐vector multiplication

Performance optimization of Sparse Matrix‐Vector Multiplication for multi‐component PDE‐based applications using GPUs

Optimization techniques for sparse matrix–vector multiplication on GPUs

A Performance Prediction and Analysis Integrated Framework for SpMV on GPUs

A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs

Accurate cross‒architecture performance modeling for sparse matrix‒vector multiplication (SpMV) on GPUs

Sparse matrix–vector multiplication on the Single-Chip Cloud Computer many-core processor

GPU-accelerated preconditioned iterative linear solvers

An Improved Implementation of Preconditioned Conjugate Gradient Method on GPU