Abstract

This paper reports a constant-time CPU and GPU software implementation of the RSA exponentiation, using algorithms that offer a first-line defense against timing and cache attacks. On GPU platforms, the modular arithmetic layer was implemented using the Residue Number System (RNS) representation. We also present a CPU implementation of RNS-based arithmetic that takes advantage of the parallelism provided by the Advanced Vector Extensions 2 (AVX2) instructions. Moreover, we carefully analyze the performance of two popular RNS modular reduction algorithms when implemented on many- and multi-core platforms. On CPU platforms, we also report that a combination of the schoolbook and Karatsuba algorithms for integer multiplication, along with Montgomery reduction, yields our fastest modular multiplication procedure. In comparison with previous literature, our software library achieves faster timings for the computation of the RSA exponentiation using 1024-, 2048- and 3072-bit private keys.

Highlights

  • Public key cryptosystems play an important role in communication systems that require the exchange of sensitive information

  • We focus our attention on the efficient parallel computation of s1 and s2 in Graphics Processing Unit (GPU) and Central Processing Unit (CPU) software implementations

  • Vector instructions: to implement the Residue Number System (RNS) based arithmetic described in Section II-C efficiently, we took advantage of the Advanced Vector Extensions 2 (AVX2) instruction set introduced in the Intel Haswell micro-architecture [31]
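To make the appeal of RNS for vector and many-core hardware concrete, the following is a minimal sketch of RNS arithmetic in Python. The basis `BASIS` and the helper names `to_rns`, `rns_mul` and `from_rns` are hypothetical choices for illustration; they are not taken from the paper, whose moduli and reduction algorithms are not given in this summary. The point is only that a multiplication splits into independent per-modulus operations, which is the kind of work that SIMD lanes or GPU threads can carry out side by side.

```python
from math import prod

# Hypothetical toy basis of pairwise coprime moduli (illustrative only;
# a real RSA implementation would use many word-sized moduli).
BASIS = (251, 253, 255, 256)


def to_rns(x, basis=BASIS):
    """Represent the integer x by its residues modulo each basis element."""
    return tuple(x % m for m in basis)


def rns_mul(a, b, basis=BASIS):
    """Multiply two RNS values; each residue channel is independent."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, basis))


def from_rns(residues, basis=BASIS):
    """Recover the integer in [0, prod(basis)) via the Chinese Remainder Theorem."""
    big_m = prod(basis)
    x = 0
    for r, m in zip(residues, basis):
        m_i = big_m // m
        # pow(m_i, -1, m) is the modular inverse (Python 3.8+).
        x += r * m_i * pow(m_i, -1, m)
    return x % big_m


a, b = 12345, 6789
product = from_rns(rns_mul(to_rns(a), to_rns(b)))
```

The result is exact only while the true product stays below the product of the basis moduli. Handling larger intermediate values is the job of an RNS modular reduction algorithm, which this sketch does not attempt.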


Summary

INTRODUCTION

Public key cryptosystems play an important role in communication systems that require the exchange of sensitive information. We focus our attention on the efficient parallel computation of s1 and s2 in GPU and CPU software implementations. Thanks to their massive parallelism, Graphics Processing Unit (GPU) platforms have become an attractive option for speeding up highly demanding computational tasks, such as the computation of several public key cryptographic primitives. Our contributions: in this work, we present two constant-time RSA software implementations for 1024-, 2048-, and 3072-bit RSA keys. Our CPU software implementation of RSA uses a combination of integer arithmetic algorithms and Montgomery reduction that helped us exploit the fine-grained parallelism present in the latest Intel micro-architectures. The experimental results presented in this work outperform previously reported GPU RSA implementations [7]–[10] by factors of 1.24, 1.27, and 2.98 for RSA-1024, RSA-2048, and RSA-3072, respectively.
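This summary does not define s1 and s2. The sketch below assumes they are the two half-size exponentiations of the standard CRT form of the RSA private operation, s1 = m^dp mod p and s2 = m^dq mod q, which are independent and can therefore be computed in parallel. The function names `ladder_pow` and `rsa_crt_sign` are hypothetical, and the Montgomery ladder is used here only as a simple example of a regular exponentiation pattern; it is not necessarily the algorithm used in the paper. Python integers are not constant-time, so this illustrates the structure of the computation, not its side-channel resistance.

```python
def ladder_pow(base, exp, mod, bits):
    """Montgomery ladder: one multiplication and one squaring per bit,
    over a fixed number of iterations regardless of the exponent value."""
    r0, r1 = 1 % mod, base % mod
    for i in reversed(range(bits)):
        # A real constant-time implementation would replace this branch
        # with a branch-free conditional swap.
        if (exp >> i) & 1:
            r0, r1 = (r0 * r1) % mod, (r1 * r1) % mod
        else:
            r0, r1 = (r0 * r0) % mod, (r0 * r1) % mod
    return r0


def rsa_crt_sign(m, p, q, d):
    """RSA private operation via CRT (assumed meaning of s1 and s2)."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)  # modular inverse, Python 3.8+
    # The two half-size exponentiations do not depend on each other.
    s1 = ladder_pow(m, dp, p, p.bit_length())
    s2 = ladder_pow(m, dq, q, q.bit_length())
    # Garner recombination of s1 and s2 into the result modulo p * q.
    h = (q_inv * (s1 - s2)) % p
    return s2 + h * q


# Toy textbook parameters, far too small for real use.
p, q, e, d = 61, 53, 17, 2753
signature = rsa_crt_sign(65, p, q, d)
```

Because each exponentiation works modulo a prime of half the key size, the two calls to `ladder_pow` are natural units of work to hand to separate cores or thread groups.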

ARITHMETIC BACKGROUND
CONSTANT-TIME MODULAR EXPONENTIATION
MONTGOMERY MODULAR ARITHMETIC
RNS MODULAR ARITHMETIC
EFFICIENT IMPLEMENTATION OF RSA ON GPU PLATFORMS
CONCLUSION