Abstract

Montgomery multiplication is one of the cornerstones of public-key cryptography, with important applications in the RSA algorithm, in Elliptic-Curve Cryptography, and in the Digital Signature Standard. An efficient implementation of this long-word-length modular multiplication is crucial to the performance of public-key cryptography. Motivated by the ongoing shift from single-core to multicore systems, we present a parallel software implementation of Montgomery multiplication for multicore systems. Our analysis shows that the proposed scheme, pSHS, partitions the task in a balanced way, so that each core is assigned the same amount of work. We also analyze the impact of intercore communication overhead on the performance of pSHS. The analysis shows that pSHS achieves high performance, scales across different numbers of cores, and remains stable as the communication latency changes. It also indicates how to set the scheme's parameters for optimal performance. We implemented pSHS on a prototype multicore architecture configured in a Field Programmable Gate Array (FPGA). Compared with the sequential implementation, pSHS accelerates 2,048-bit Montgomery multiplication by factors of 1.97, 3.68, and 6.13 on two-core, four-core, and eight-core architectures, respectively, with a communication latency of 100 clock cycles.
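
For readers unfamiliar with the operation being parallelized, the following is a minimal sequential sketch of textbook Montgomery multiplication in Python. It is not the pSHS scheme and does not reflect the word-level partitioning the abstract refers to; the function names (`montgomery_setup`, `mont_mul`, `mod_mul`) and the choice of R = 2^k are assumptions made for illustration only.

```python
# Textbook Montgomery multiplication (sequential illustration only;
# NOT the paper's pSHS scheme). All names here are hypothetical.
# Requires Python 3.8+ for pow(n, -1, r).

def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n with R = 2**k > n."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # satisfies n * n_prime ≡ -1 (mod R)
    r2 = (r * r) % n                 # used to convert into Montgomery form
    return n_prime, r2

def mont_mul(a, b, n, k, n_prime):
    """Return a * b * R^(-1) mod n for 0 <= a, b < n, with R = 2**k."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask  # makes t + m*n divisible by R
    u = (t + m * n) >> k               # exact division by R
    return u - n if u >= n else u      # single conditional subtraction

def mod_mul(a, b, n, k):
    """Compute (a * b) mod n via Montgomery form, for comparison purposes."""
    n_prime, r2 = montgomery_setup(n, k)
    a_bar = mont_mul(a % n, r2, n, k, n_prime)     # a * R mod n
    b_bar = mont_mul(b % n, r2, n, k, n_prime)     # b * R mod n
    c_bar = mont_mul(a_bar, b_bar, n, k, n_prime)  # a * b * R mod n
    return mont_mul(c_bar, 1, n, k, n_prime)       # convert back: a * b mod n
```

The point of the technique is that the reduction uses only multiplications, masks, and shifts by the power of two R, with no division by the modulus. In a multi-precision setting the products above are computed word by word, which is the workload a parallel scheme would divide among cores.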
