Abstract

The residue number system (RNS) is known for its parallel arithmetic and has been used in recent decades in various important applications, from digital signal processing and deep neural networks to cryptography and high-precision computation. However, comparison, sign identification, overflow detection, and division remain hard to implement in RNS. For such operations, most methods proposed in the literature only support small dynamic ranges (up to several tens of bits), so they are suitable only for low-precision applications. We recently proposed a method that supports arbitrary moduli sets with cryptographically sized dynamic ranges, up to several thousand bits. The practical advantage of our method over existing ones is that it relies only on very fast standard floating-point operations, so it is suitable for multiple-precision applications and can be efficiently implemented on many general-purpose platforms that support IEEE 754 arithmetic. In this paper, we further improve this method and demonstrate that it can successfully be applied to implement efficient data-parallel primitives operating in the RNS domain, namely finding the maximum element of an array of RNS numbers on graphics processing units. Our experimental results on an NVIDIA RTX 2080 GPU show that for random residues and a 128-moduli set with a 2048-bit dynamic range, the proposed implementation reduces the running time by a factor of 39 and the memory consumption by a factor of 13 compared to an implementation based on mixed-radix conversion.

Highlights

  • The emergence of new highly parallel architectures has increased interest in fast, carry-free, and energy-efficient computer arithmetic techniques

  • We present performance results for several approaches to finding the maximum element in an array of residue number system (RNS) numbers on graphics processing units (GPUs)

  • The proposed approach implements the MAX operation as described in Section 5, using floating-point interval evaluations to compare the magnitudes of RNS numbers
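The interval-based MAX operation highlighted above can be sketched roughly as follows. This is an illustrative Python stand-in, not the paper's CUDA implementation: `frac_interval`, the fixed error bound `eps`, and the omitted handling of overlapping intervals are all simplifying assumptions.

```python
from math import prod

def frac_interval(residues, moduli, eps=1e-12):
    """Crude lower/upper bound on X/M in [0, 1), assuming at most `eps`
    rounding error per term. The paper computes rigorous interval bounds;
    this is only an illustrative stand-in."""
    M = prod(moduli)
    f = 0.0
    for x, m in zip(residues, moduli):
        w = pow(M // m, -1, m)      # modular inverse of M/m_i modulo m_i
        f += (x * w % m) / m        # each term stays in [0, 1)
    f %= 1.0
    n = len(moduli)
    return max(f - n * eps, 0.0), min(f + n * eps, 1.0)

def rns_max(numbers, moduli):
    """Pick the largest RNS number by comparing interval evaluations.
    A real implementation needs a fallback (e.g. an exact comparison)
    when two intervals overlap; that case is omitted here."""
    best = numbers[0]
    best_iv = frac_interval(best, moduli)
    for cand in numbers[1:]:
        iv = frac_interval(cand, moduli)
        if iv[0] > best_iv[1]:      # candidate's lower bound beats best's upper bound
            best, best_iv = cand, iv
    return best
```

On a GPU, the same comparison would be applied inside a parallel reduction, with each thread block reducing a slice of the array.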


Summary

Introduction

The emergence of new highly parallel architectures has increased interest in fast, carry-free, and energy-efficient computer arithmetic techniques. In a recent paper [21], we presented a method for implementing difficult RNS operations by computing a finite-precision floating-point interval that localizes the fractional value of an RNS representation. The method leads to efficient software implementations on general-purpose computing platforms, since it only requires very fast standard (finite-precision) floating-point operations, and most computations can be performed in parallel. It is a fairly versatile method suitable for computing a wide range of fundamental operations that are problematic in RNS, including magnitude comparison, sign identification, overflow detection, and division.
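The fractional value underlying the method follows from the Chinese remainder theorem: with M the product of the moduli, M_i = M/m_i, and w_i = M_i^{-1} mod m_i, we have X/M ≡ Σ_i (x_i · w_i mod m_i)/m_i (mod 1). A minimal Python sketch, using a single rounded value rather than the rigorous interval the method actually computes:

```python
from math import prod

def frac_value(residues, moduli):
    """Approximate X/M in [0, 1) from the residues of X via the CRT:
    X/M ≡ sum_i (x_i * w_i mod m_i) / m_i (mod 1),
    where w_i = (M/m_i)^{-1} mod m_i. Finite-precision floats make this
    an approximation; the paper's method tracks a lower/upper interval
    instead of this single rounded value."""
    M = prod(moduli)
    f = 0.0
    for x, m in zip(residues, moduli):
        w = pow(M // m, -1, m)      # modular inverse, Python 3.8+
        f += (x * w % m) / m
    return f % 1.0

def rns_compare(a, b, moduli):
    """Compare two RNS numbers by their fractional values.
    Sketch only: a robust implementation must detect when the two
    values are too close for the float approximation to decide."""
    fa, fb = frac_value(a, moduli), frac_value(b, moduli)
    return (fa > fb) - (fa < fb)
```

For example, with moduli {3, 5, 7} (M = 105), the number 23 has residues (2, 3, 2) and fractional value close to 23/105 ≈ 0.219, so it compares as smaller than 50, whose fractional value is close to 50/105 ≈ 0.476.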

Residue Number System
Properties of the Interval Evaluation Algorithm
Description
Demonstration
Application
Approach
CUDA Implementation
Results and Discussion
Conclusions
