Abstract

One of the keys to successful high-performance computation on an FPGA is the efficient use of its DSP slices and block RAMs. This paper presents an FDFM (Few DSP slices and Few block RAMs) processor core approach for implementing RSA decryption. In our approach, we implement an efficient hardware algorithm for Chinese Remainder Theorem (CRT) based RSA decryption using the Montgomery multiplication algorithm. Our hardware algorithm, which supports up to 2048-bit RSA decryption, is designed to use one DSP slice, one block RAM and a few logic blocks in the Xilinx Virtex-6 FPGA. The implementation results show that our RSA core performs 1024-bit RSA decryption in 13.74 ms. Quite surprisingly, the multiplier in the DSP slice used to compute Montgomery multiplication is active in more than 95% of the clock cycles during processing. Hence, our implementation is close to optimal: it has less than 5% multiplication overhead, and no further improvement is possible as long as a CRT-based algorithm using Montgomery multiplication is applied. We have also succeeded in implementing 320 RSA decryption cores that work in parallel in one Xilinx Virtex-6 FPGA XC6VLX240T-1. These 320 parallel RSA cores achieve a throughput of 23.03 Mbits/s for 1024-bit RSA decryption.
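To make the arithmetic concrete, CRT-based RSA decryption with Montgomery multiplication can be sketched as a software reference model. This is a minimal sketch, not the FDFM hardware algorithm itself: the function names `mont_exp` and `rsa_decrypt_crt` are hypothetical, and the code works on whole Python integers rather than on the word-serial datapath of a DSP slice and block RAM.

```python
def mont_exp(base: int, exp: int, n: int) -> int:
    """Compute base**exp mod n with Montgomery multiplication (n must be odd)."""
    r_bits = n.bit_length()          # R = 2**r_bits > n; gcd(R, n) = 1 for odd n
    R = 1 << r_bits
    mask = R - 1
    n_prime = (-pow(n, -1, R)) % R   # n' = -n^{-1} mod R (Python 3.8+)

    def mont_mul(a: int, b: int) -> int:
        # Montgomery product a*b*R^{-1} mod n for 0 <= a, b < n
        t = a * b
        m = ((t & mask) * n_prime) & mask  # m = t*n' mod R
        u = (t + m * n) >> r_bits          # t + m*n is divisible by R
        return u - n if u >= n else u      # u < 2n, so one subtraction suffices

    x = R % n                  # Montgomery form of 1
    b = (base % n) * R % n     # Montgomery form of base
    for bit in bin(exp)[2:]:   # left-to-right binary exponentiation
        x = mont_mul(x, x)
        if bit == "1":
            x = mont_mul(x, b)
    return mont_mul(x, 1)      # convert out of Montgomery form


def rsa_decrypt_crt(c: int, p: int, q: int, d: int) -> int:
    """RSA decryption m = c^d mod pq using CRT (Garner recombination)."""
    dp = d % (p - 1)           # reduced exponents for each prime
    dq = d % (q - 1)
    q_inv = pow(q, -1, p)      # q^{-1} mod p
    mp = mont_exp(c, dp, p)    # half-size exponentiation modulo p
    mq = mont_exp(c, dq, q)    # half-size exponentiation modulo q
    h = (q_inv * (mp - mq)) % p
    return mq + h * q
```

For example, with the textbook parameters p = 61, q = 53, d = 2753, `rsa_decrypt_crt(2790, 61, 53, 2753)` recovers the plaintext 65. CRT replaces one exponentiation modulo n with two exponentiations modulo p and q, each about half the bit length of n, which is the standard reason CRT speeds up RSA decryption; the repeated `a * b` products inside `mont_mul` are the multiplications that a hardware core would map onto its multiplier.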
