Abstract

Many large-scale problems require linear algebra operations with a precision exceeding that of the standard binary64 floating-point format. In this paper, we implement a multiple-precision scaled vector addition BLAS routine (WAXPBY) on graphics processing units. We use a residue number system (RNS) to represent the significands of floating-point values. In RNS, large numbers are replaced by their residues, and addition, subtraction and multiplication are performed on these residues in parallel and without carry propagation. Our parallel WAXPBY algorithm is divided into a number of steps, each of which is carried out by a separate GPU kernel. Experiments show that the developed routine clearly outperforms parallel CPU-based multiple-precision implementations.
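The carry-free property of RNS arithmetic can be illustrated with a minimal Python sketch. It is not the paper's GPU implementation: the moduli set, the function names and the integer-only `waxpby_int` helper are assumptions chosen for illustration, and the sketch ignores signs, exponents and rounding, which a full multiple-precision floating-point format also has to handle.

```python
from math import prod

# Illustrative pairwise-coprime moduli (an assumption, not the paper's set).
MODULI = (7, 11, 13, 15)
M = prod(MODULI)  # dynamic range: values in [0, M) are represented uniquely


def to_rns(x):
    """Replace an integer by its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Add digit by digit; each residue channel is independent (no carries)."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Multiply digit by digit; again no carries between channels."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(r):
    """Convert back to an integer with the Chinese remainder theorem."""
    total = 0
    for r_i, m in zip(r, MODULI):
        M_i = M // m
        # pow(M_i, -1, m) is the modular inverse of M_i modulo m (Python 3.8+).
        total += r_i * M_i * pow(M_i, -1, m)
    return total % M


def waxpby_int(alpha, x, beta, y):
    """Hypothetical integer-only analogue of w = alpha*x + beta*y in RNS."""
    a, b = to_rns(alpha), to_rns(beta)
    return [
        from_rns(rns_add(rns_mul(a, to_rns(xi)), rns_mul(b, to_rns(yi))))
        for xi, yi in zip(x, y)
    ]
```

Results are exact only while every intermediate value stays below the product of the moduli, which is why practical RNS formats use many more, and much larger, moduli than this toy set.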
