Abstract

We consider a parallel implementation of matrix-vector multiplication (GEMV, a Level 2 BLAS operation) for graphics processing units (GPUs) using multiple-precision arithmetic based on the residue number system. In our GEMV implementation, element-wise operations with multiple-precision vectors and matrices consist of several parts, each of which is calculated by a separate CUDA kernel. This feature eliminates branch divergence when performing the sequential parts of multiple-precision operations and allows full utilization of the GPU’s resources. An efficient data structure for storing arrays with multiple-precision entries provides a coalesced access pattern to the GPU global memory. We have performed a rounding error analysis and derived error bounds for the proposed GEMV implementation. Experimental results show the high efficiency of the proposed solution compared to existing high-precision packages deployed on GPUs.
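
As a rough illustration of such a data structure (the field names below are assumptions made for this sketch, not the exact layout used in the paper), a vector of n multiple-precision numbers can be stored in "struct of arrays" form, so that all RNS digits of the vector sit in one contiguous array:

```cuda
// Sketch of a "struct of arrays" layout for n multiple-precision numbers.
// Field names and the interval-evaluation fields are illustrative assumptions.
#define RNS_MODULI_SIZE 8            // assumed number of RNS moduli

typedef struct {
    int    *digits;    // n * RNS_MODULI_SIZE residues (RNS digits), stored contiguously
    int    *sign;      // n signs
    int    *exp;       // n exponents
    double *eval_low;  // n lower bounds of the interval evaluation of the fraction
    double *eval_up;   // n upper bounds of the interval evaluation of the fraction
} mp_array_t;          // illustrative name
```

With this layout, consecutive threads of a digit-parallel kernel read consecutive entries of digits, which keeps accesses to global memory coalesced.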

Highlights

  • A separate Compute Unified Device Architecture (CUDA) kernel performs each part of a multiple-precision operation with its own launch configuration; all digits of multiple-precision numbers are calculated in parallel (see the sketch after this list)

  • Our experiments show that, in many cases, MPRES-BLAS performs better than implementations based on existing high-precision packages for central processing units (CPUs) and GPUs

  • We have presented a parallel implementation of the multiple-precision GEMV operation for systems with CUDA-compatible GPUs
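
To make the first highlight concrete, here is a minimal sketch, under assumed kernel names and a simplified number format, of how one element-wise multiple-precision multiplication could be split into two CUDA kernels: a digit-parallel kernel with one thread per RNS residue and a per-element kernel for signs and exponents, each launched with its own configuration. This is an illustration of the idea, not the actual MPRES-BLAS kernels.

```cuda
// Illustrative decomposition of an element-wise multiple-precision multiply
// into separate kernels (simplified; overflow handling and the interval
// evaluation of fractions are omitted).
#define RNS_MODULI_SIZE 8   // assumed number of RNS moduli

// Part 1: digit-parallel kernel, one thread per RNS residue.
// Consecutive threads touch consecutive words of the digits arrays,
// so global memory accesses are coalesced and there is no branch divergence.
__global__ void mp_mul_digits(int *rd, const int *ad, const int *bd,
                              const int *moduli, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global residue index
    if (i < n * RNS_MODULI_SIZE) {
        int m = moduli[i % RNS_MODULI_SIZE];
        rd[i] = (ad[i] * bd[i]) % m;
    }
}

// Part 2: per-element kernel for the "sequential" parts (signs, exponents),
// launched with its own grid/block configuration.
__global__ void mp_mul_sign_exp(int *rs, int *re,
                                const int *as, const int *ae,
                                const int *bs, const int *be, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // element index
    if (i < n) {
        rs[i] = as[i] ^ bs[i];   // sign of the product
        re[i] = ae[i] + be[i];   // exponent of the product
    }
}
```

On the host side, each kernel would be launched with a grid sized to its own amount of parallelism, e.g. n * RNS_MODULI_SIZE threads for the digit kernel and n threads for the sign/exponent kernel.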

Summary

Introduction

Floating-point operations incur rounding errors directly during calculations. In our implementation, a separate CUDA kernel performs each part of a multiple-precision operation with its own configuration, and all digits of multiple-precision numbers are calculated in parallel. Although this approach increases the number of global memory accesses, it provides high performance and good scalability of high-precision computations on GPUs compared to the traditional paradigm, in which each multiple-precision arithmetic operation is performed by a single thread. To implement this approach, we use the residue number system (RNS) [12]. Conclusions and directions for further research are presented in the last section of the paper.
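
As a brief reminder of why RNS enables this digit-level parallelism, the host-side sketch below (with moduli chosen only for illustration) shows that with pairwise coprime moduli, multiplication acts independently on each residue, so all digits can be processed by parallel threads without carry propagation.

```cuda
// Host-side illustration of the residue number system (illustrative moduli).
// A non-negative integer x < M, where M = 13 * 15 * 16 * 17, is represented
// by its residues x mod m[i]; arithmetic is carried out digit by digit.
#define MODULI_COUNT 4
static const long MODULI[MODULI_COUNT] = {13, 15, 16, 17};   // pairwise coprime

// Convert a non-negative integer to its RNS representation.
void int_to_rns(long residues[MODULI_COUNT], long x) {
    for (int i = 0; i < MODULI_COUNT; i++)
        residues[i] = x % MODULI[i];
}

// Multiply two RNS numbers: each digit is independent of the others,
// which is what allows all digits to be computed in parallel on the GPU.
void rns_mul(long r[MODULI_COUNT], const long a[MODULI_COUNT],
             const long b[MODULI_COUNT]) {
    for (int i = 0; i < MODULI_COUNT; i++)
        r[i] = (a[i] * b[i]) % MODULI[i];
}
```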

  • High-Precision Computations and BLAS for GPU
  • Representation of arbitrary-length floating-point numbers using RNS
  • Data layout
  • Algorithms for implementing GEMV on GPUs
      • The case of a non-transposed matrix
      • The case of a transposed matrix
  • Accuracy evaluation
  • Performance results
      • Performance of individual CUDA kernels
      • Comparison with other implementations
  • Findings
  • Conclusion