Accelerating Multiple Precision Multiplication in GPU with Kepler Architecture

Boon-Chiao Chang, Wai-Kong Lee, Bok-Min Goi, Raphael C.-W. Phan

doi:10.1109/hpcc-smartcity-dss.2016.0122

Abstract

Multiple precision multiplication is widely used in scientific computing and cryptography. When the size of an integer grows beyond the machine word size (32-bit or 64-bit), the computational cost of multiplication becomes significant. In this paper, we propose a novel solution for implementing multiple precision multiplication on massively parallel GPUs with the Kepler architecture. Our implementation is based on the Chinese Remainder Theorem and the Number Theoretic Transform (NTT) with a 64-bit prime. We implemented three versions of multiple precision multiplication, which use global memory, shared memory and registers, respectively, to store the precomputed twiddle factors. The register version uses the warp shuffle instruction (available in GPUs with the Kepler architecture) to exchange data among threads within the same warp. This technique avoids the bank conflict issue in shared memory and allows faster computation on the GPU. To the best of our knowledge, this is the first implementation reported in the literature that uses the warp shuffle instruction to accelerate NTT computation. Our best implementation performs 1024-bit, 2048-bit, 4096-bit and 8192-bit multiplication in 0.095 ms, 0.169 ms, 0.444 ms and 1.113 ms, respectively.
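As a rough illustration of the NTT-based approach the abstract describes, the following is a minimal CPU sketch in Python, not the authors' GPU implementation. It assumes the prime 2^64 - 2^32 + 1 with generator 7 (the abstract says only "64-bit prime" and does not name it), splits operands into 16-bit limbs (also an assumption), and omits the Chinese Remainder Theorem step, the precomputed twiddle-factor tables and the warp shuffle exchange. The names `ntt`, `multiply`, `P`, `G` and `LIMB_BITS` are hypothetical.

```python
# Assumed parameters: the abstract does not specify the prime or the limb size.
P = 2**64 - 2**32 + 1   # a 64-bit NTT-friendly prime (P - 1 is divisible by 2**32)
G = 7                   # a generator of the multiplicative group modulo P
LIMB_BITS = 16          # operands are split into 16-bit limbs


def ntt(values, invert=False):
    """Iterative radix-2 NTT modulo P; len(values) must be a power of two."""
    a = list(values)
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # Primitive length-th root of unity (the "twiddle factor" base).
        w_len = pow(G, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % P
                a[start + k] = (u + v) % P
                a[start + k + half] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        a = [x * n_inv % P for x in a]
    return a


def multiply(x, y):
    """Multiply two non-negative integers via NTT-based convolution of limbs."""
    mask = (1 << LIMB_BITS) - 1

    def to_limbs(v):
        limbs = []
        while v:
            limbs.append(v & mask)
            v >>= LIMB_BITS
        return limbs or [0]

    a, b = to_limbs(x), to_limbs(y)
    n = 1
    while n < len(a) + len(b):
        n <<= 1
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    # Pointwise product in the transform domain equals convolution of limbs.
    conv = ntt([s * t % P for s, t in zip(fa, fb)], invert=True)
    # Recombine limbs; Python integers absorb the carries.
    return sum(c << (LIMB_BITS * i) for i, c in enumerate(conv))
```

With 16-bit limbs, every convolution coefficient is below n · 2^32, so it stays smaller than P for the operand sizes in the abstract and a single prime is enough for this sketch. On a GPU, the inner butterfly loop is the part that would be distributed across threads, which is where the choice of storage for twiddle factors and the method of exchanging data between threads matter.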

Similar Papers

CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme
Xiangren Chen ... Leibo Liu
IACR Transactions on Cryptographic Hardware and Embedded Systems
19 Nov 2021

Faster Bootstrapping via Modulus Raising and Composite NTT
Zhihao Li ... Ying Liu
IACR Transactions on Cryptographic Hardware and Embedded Systems | VOL. 2024
04 Dec 2023

An Area-Efficient and Configurable Number Theoretic Transform Accelerator for Homomorphic Encryption
Jingwen Huang ... Tao Su
Electronics | VOL. 13
26 Aug 2024

Accelerating Number Theoretic Transformations for Bootstrappable Homomorphic Encryption on GPUs
Sangpyo Kim ... Wonkyung Jung
01 Oct 2020
