Multiple-precision sparse matrix–vector multiplication on GPUs

Konstantin Isupov

doi:10.1016/j.jocs.2022.101609

Abstract

Sparse matrix–vector multiplication (SpMV) appears in many application domains, and performance is the key consideration when implementing SpMV kernels. At the same time, accuracy is also important because rounding errors can drastically change the computed result. Multiple-precision arithmetic is a common approach to improve the accuracy of results. In this paper, we implement and evaluate multiple-precision SpMV kernels in the CSR, JAD, ELLPACK, and DIA matrix storage formats for graphics processing units. In the proposed implementation, the matrix is represented in double precision, while the input and output vectors are in multiple precision and internal computations are also performed in multiple precision. Our underlying floating-point arithmetic algorithms are based on the residue number system, which is attractive because of its carry-free nature and provides an arbitrary level of precision determined by the set of moduli that comprise the base of the system. In particular, we apply sets from 8 to 64 moduli, reaching precision values from 106 to 848 bits, and demonstrate how higher precision reduces the rounding errors in the SpMV kernel. We also conduct a thorough analysis of the CSR kernel in terms of roofline performance, occupancy and memory bandwidth, and evaluate the performance and memory consumption for all the proposed kernels. Numerical experiments on matrices from real-world applications show that ELLPACK offers the highest performance in many examples, while JAD is a good trade-off between performance and space requirements. Furthermore, our proposed kernels in general substantially improve the efficiency of sparse matrix–vector multiplication compared to implementations built on top of existing multiple-precision CUDA libraries. Finally, we integrate the multiple-precision SpMV into a preconditioned conjugate gradient linear solver and identify test cases where our implementation exhibits superior convergence and numerical robustness.
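
As a rough illustration of the mixed-precision scheme described in the abstract (matrix entries in double precision, vectors and internal accumulation in higher precision), the following is a minimal CPU-side sketch of a CSR SpMV. It is not the paper's GPU implementation and does not use a residue number system: Python's `decimal` module stands in for the multiple-precision arithmetic, and the function name `spmv_csr`, the test matrix, and the 32-digit precision (roughly comparable to 106 bits) are assumptions made for this example only.

```python
from decimal import Decimal, Context

def spmv_csr(row_ptr, col_idx, values, x, ctx=None):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values holds double-precision matrix entries. If ctx is None, the
    products and sums are done in double precision; otherwise they are
    done in the decimal precision given by ctx (a stand-in for
    multiple-precision arithmetic).
    """
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0 if ctx is None else Decimal(0)
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = values[k]
            xj = x[col_idx[k]]
            if ctx is None:
                acc += a * float(xj)
            else:
                # Decimal(a) converts the double exactly, so the matrix
                # entry itself carries no extra conversion error.
                acc = ctx.add(acc, ctx.multiply(Decimal(a), xj))
        y.append(acc)
    return y

# Hypothetical 2x3 test matrix with heavy cancellation in row 0:
#   [ 1e16  1.0  -1e16 ]
#   [ 0     2.0   0    ]
row_ptr = [0, 3, 4]
col_idx = [0, 1, 2, 1]
values = [1e16, 1.0, -1e16, 2.0]
x = [Decimal(1), Decimal(1), Decimal(1)]

y_double = spmv_csr(row_ptr, col_idx, values, x)
y_mp = spmv_csr(row_ptr, col_idx, values, x, ctx=Context(prec=32))

print(y_double)  # row 0 loses the small term to rounding
print(y_mp)      # row 0 keeps it
```

In double precision the sum 1e16 + 1 rounds back to 1e16, so the first row evaluates to 0 instead of the exact value 1. With the higher-precision accumulator the small term survives the cancellation. This is the kind of rounding-error reduction the abstract attributes to higher precision, shown here on a deliberately contrived input.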

Journal: Journal of Computational Science
Publication Date: Feb 25, 2022
Citations: 1