Computing the sparse matrix vector product using block-based kernels without zero padding on processors with AVX-512 instructions.

Bérenger Bramas, Pavel Kus

doi:10.7717/peerj-cs.151

Open Access

Journal: PeerJ Computer Science
Publication Date: Apr 30, 2018
Citations: 10
License type: CC BY 4.0

Affiliation: Max Planck Computing and Data Facility

Abstract

The sparse matrix-vector product (SpMV) is a fundamental operation in many scientific applications from various fields. The High Performance Computing (HPC) community has therefore invested considerable effort in providing an efficient SpMV kernel on modern CPU architectures. Although it has been shown that block-based kernels help to achieve high performance, they are difficult to use in practice because of the zero padding they require. In the current paper, we propose new kernels using the AVX-512 instruction set, which makes it possible to use a blocking scheme without any zero padding in the matrix memory storage. We describe mask-based sparse matrix formats and their corresponding SpMV kernels, which are highly optimized in assembly language. Because the optimal blocking size depends on the matrix, we also provide a method to predict the best kernel using a simple interpolation of results from previous executions. We compare the performance of our approach to that of the Intel MKL CSR kernel and the CSR5 open-source package on a set of standard benchmark matrices. We show that we can achieve significant improvements in many cases, both for sequential and for parallel executions. Finally, we provide the corresponding code in an open source library, called SPC5.
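To make the idea of a mask-based block format without zero padding more concrete, the following is a minimal scalar sketch in Python. It is not the SPC5 implementation or its memory layout: the function names, the 1 x c block shape, and the choice to start each block at the column of its first nonzero are assumptions made for illustration. The input is assumed to be a CSR matrix with column indices sorted within each row. Each block stores only a starting column and a c-bit mask, while the value array holds the nonzeros alone, so no zeros are ever stored.

```python
def csr_to_mask_blocks(indptr, indices, data, c=8):
    """Convert CSR arrays into 1 x c mask-based blocks (hypothetical layout).

    Assumes column indices are sorted within each row.
    Returns (block_rowptr, block_cols, block_masks, values).
    """
    block_rowptr = [0]
    block_cols = []   # column of the first slot of each block
    block_masks = []  # bit k set => column block_col + k holds a nonzero
    values = []       # nonzeros only, no zero padding
    for row in range(len(indptr) - 1):
        k, end = indptr[row], indptr[row + 1]
        while k < end:
            first_col = indices[k]
            mask = 0
            # Absorb every nonzero that falls inside the c-wide window.
            while k < end and indices[k] - first_col < c:
                mask |= 1 << (indices[k] - first_col)
                values.append(data[k])
                k += 1
            block_cols.append(first_col)
            block_masks.append(mask)
        block_rowptr.append(len(block_cols))
    return block_rowptr, block_cols, block_masks, values


def spmv_mask_blocks(block_rowptr, block_cols, block_masks, values, x, c=8):
    """Compute y = A @ x from the mask-based block arrays (scalar reference)."""
    y = [0.0] * (len(block_rowptr) - 1)
    v = 0  # running position in the packed value array
    for row in range(len(y)):
        acc = 0.0
        for b in range(block_rowptr[row], block_rowptr[row + 1]):
            col, mask = block_cols[b], block_masks[b]
            # Scalar stand-in for a vectorized masked operation on the block.
            for bit in range(c):
                if (mask >> bit) & 1:
                    acc += values[v] * x[col + bit]
                    v += 1
        y[row] = acc
    return y


# Small 3 x 12 example; row 1 is empty.
indptr = [0, 3, 3, 7]
indices = [0, 2, 9, 3, 4, 5, 11]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [float(i + 1) for i in range(12)]

blocks = csr_to_mask_blocks(indptr, indices, data)
y = spmv_mask_blocks(*blocks, x)
```

In this example the seven nonzeros are grouped into four blocks, and the value array keeps exactly seven entries. A padded block format with the same block width would instead store eight values per block.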

Highlights

The sparse matrix-vector product (SpMV) is an important operation in many applications, and it often needs to be performed multiple times in the course of an algorithm.
In many scientific applications, a large part of the CPU time is spent solving a linear system whose matrix is stored in a sparse format, so improving the efficiency of the SpMV on modern hardware could improve the performance of a wide range of codes.
In ‘SpMV Kernels for β(r,c) storages’, we described generic kernels, which can be used with any block size.

Summary

Introduction

The sparse matrix-vector product (SpMV) is an important operation in many applications, and it often needs to be performed multiple times in the course of an algorithm. With the effective halt in the growth of processor frequencies, most of the increase in computing power is achieved by increasing the number of cores and by enabling each core to apply an operation to a vector of a given length with a single instruction. This capability is called vectorization, or single instruction, multiple data (SIMD).

Similar Papers

Optimization of the Sparse Matrix-Vector Products of an IDR Krylov Iterative Solver in EMGeo for the Intel KNL Manycore Processor
Tareq Malas ... Thorsten Kurth
01 Jan 2015

Implementation and Evaluation of Parallel Sparse Matrix-Vector Products on Distributed Memory Parallel Computers
Rukhsana Shahnaz ... Imran Chughtai
01 Jan 2006

Optimizing Sparse Matrix–Vector Product Computations Using Unroll and Jam
John Mellor-Crummey ... John Garvin
The International Journal of High Performance Computing Applications | VOL. 18
01 May 2004

Taming irregular EDA applications on GPUs
Yangdong (Steve) Deng ... Bo David Wang
02 Nov 2009
