Abstract

Basic Linear Algebra Subprograms (BLAS) are the building blocks for various numerical algorithms and are widely used in scientific computing. However, some linear algebra applications require more precision than the standard double precision available in most existing BLAS libraries. In this paper, we implement and evaluate multiple-precision scalar and vector BLAS functions on graphics processing units (GPUs). We use the residue number system (RNS) to represent arbitrary-length floating-point numbers. The non-positional nature of RNS enables parallelism in multiple-precision arithmetic and makes RNS well suited to high-performance computing applications. We first present new data-parallel algorithms for multiplying and adding RNS-based floating-point representations. Next, we propose algorithms for multiple-precision vector operations specially designed for parallel computation on GPUs. Using these algorithms, we develop and evaluate four GPU-accelerated multiple-precision BLAS functions: ASUM, DOT, SCAL, and AXPY. Experiments show that, in many cases, the implemented functions achieve significantly better performance than existing multiple-precision software for CPUs and GPUs.
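To illustrate the digit-level parallelism the abstract refers to: in RNS, a number is represented by its residues modulo a set of pairwise coprime moduli, and addition and multiplication act on each residue independently, with no carry propagation between digits. The sketch below is illustrative only, not the paper's floating-point algorithms; it multiplies two small integers under a hypothetical four-modulus set, with one CUDA thread per RNS digit.

```cuda
// Minimal sketch of digit-parallel RNS multiplication (toy integer example,
// hypothetical moduli; the paper's RNS floating-point arithmetic is more involved).
#include <cstdio>

#define RNS_DIGITS 4
__constant__ int MODULI[RNS_DIGITS] = {7, 11, 13, 17}; // pairwise coprime

__global__ void rns_mul(const int *x, const int *y, int *z) {
    int i = threadIdx.x;
    if (i < RNS_DIGITS) {
        // Each digit is independent: no carries cross modulus boundaries.
        z[i] = (x[i] * y[i]) % MODULI[i];
    }
}

int main() {
    // 9 and 5 in residue form; their product 45 = (3, 1, 6, 11).
    int hx[RNS_DIGITS] = {9 % 7, 9 % 11, 9 % 13, 9 % 17};
    int hy[RNS_DIGITS] = {5 % 7, 5 % 11, 5 % 13, 5 % 17};
    int hz[RNS_DIGITS];
    int *dx, *dy, *dz;
    cudaMalloc(&dx, sizeof(hx));
    cudaMalloc(&dy, sizeof(hy));
    cudaMalloc(&dz, sizeof(hz));
    cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, sizeof(hy), cudaMemcpyHostToDevice);
    rns_mul<<<1, RNS_DIGITS>>>(dx, dy, dz);
    cudaMemcpy(hz, dz, sizeof(hz), cudaMemcpyDeviceToHost);
    for (int i = 0; i < RNS_DIGITS; ++i) printf("%d ", hz[i]); // prints: 3 1 6 11
    cudaFree(dx);
    cudaFree(dy);
    cudaFree(dz);
    return 0;
}
```

Because every residue channel is independent, the same pattern scales to long vectors by assigning one thread per (element, digit) pair, which is what makes RNS attractive for GPU-based multiple-precision BLAS.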
