Abstract

The modular multiplication (MM) is a key operation in cryptographic algorithms, such as RSA and elliptic-curve cryptography. Multicore processor is a suitable platform to implement MM because of its flexibility, high performance, and energy-efficiency. In this paper, we propose a block-level parallel algorithm for MM with quotient pipelining and optimally map it on a network-on-chip-based multicore platform equipped with broadcasting mechanism. Aiming at highest performance, a theoretical speedup model for parallel MM is also developed for parameter exploration that optimizes task partitioning. Experimental results based on a multicore prototype show that compared with the sequential MM on single core, the parallel implementation proposed in this paper maximizes the speedup ratio with regard to given intercore communication latency.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call