Since Jao and De Feo introduced the Supersingular Isogeny Diffie–Hellman (SIDH) key exchange protocol in 2011, SIDH and its variant, Supersingular Isogeny Key Encapsulation (SIKE), have gained significant attention as promising candidates for post-quantum cryptography (PQC). With the advent of the IoT era, a large number of devices will communicate with application servers, so efficient server-side instances of SIKE are also important, and GPUs have long been considered promising cryptographic accelerators. Although SIKE offers much smaller key and ciphertext sizes than other NIST PQC candidates, its computational overhead is extremely high. A large body of research has therefore aimed at improving SIKE performance, both in software on typical CPUs and embedded MCUs and in hardware on ASICs and FPGAs. However, software optimization on parallel graphics processing units (GPUs) has not yet been considered. In this paper, we present an efficient implementation of the SIKE mechanism on GPUs. We target the SIKEp503 parameter set, which provides NIST security level 2 (at least as difficult to break as SHA256). For efficiency, we optimize the underlying field arithmetic, especially field multiplication and reduction over $p503=2^{250}3^{159}-1$, and take full advantage of the properties of the GPU architecture, including its memory hierarchy. The proposed GPU software, running on an RTX 2080 Ti, achieves about 36376.61 KeyGens/s, 25603.72 Encaps/s, and 22211.61 Decaps/s. These rates are about 140.64, 157.66, and 146.81 times higher, respectively, than those of the SIKE CPU software on an Intel i9-10900K CPU.
To the best of our knowledge, this is the first efficient GPU implementation of SIKE software.
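To illustrate the kind of field multiplication and reduction over $p503$ mentioned above, the following is a minimal Python sketch of Montgomery multiplication modulo $p503$. It is not the GPU code described here: the choice of Montgomery reduction, the radix $R = 2^{512}$, and all function names are assumptions made for illustration only. The sketch also checks two structural properties of the prime that word-level implementations commonly exploit.

```python
# Illustrative sketch only: Montgomery multiplication modulo p503.
# The radix R = 2^512 and all names below are assumptions for this
# example, not the paper's GPU implementation. Requires Python 3.8+.

P503 = 2**250 * 3**159 - 1      # prime from the abstract
R_BITS = 512                    # assumed: eight 64-bit words
R = 1 << R_BITS
R_MASK = R - 1

# Montgomery constant -p^{-1} mod R (p is odd, so the inverse exists).
P_NEG_INV = (-pow(P503, -1, R)) % R


def mont_reduce(t):
    """Return t * R^{-1} mod p503 for 0 <= t < R * p503."""
    m = (t * P_NEG_INV) & R_MASK       # m = -t / p mod R
    u = (t + m * P503) >> R_BITS       # exact division by R
    return u - P503 if u >= P503 else u


def to_mont(a):
    """Map a into the Montgomery domain: a * R mod p503."""
    return (a * R) % P503


def from_mont(a):
    """Map a back out of the Montgomery domain."""
    return mont_reduce(a)


def mont_mul(a, b):
    """Multiply two Montgomery-domain values."""
    return mont_reduce(a * b)


# Structural properties of p503:
# p503 + 1 is divisible by 2^250, so its low 64-bit words are zero,
# and -p^{-1} mod 2^64 equals 1, which simplifies word-wise reduction.
assert P503.bit_length() == 503
assert (P503 + 1) % (1 << 250) == 0
assert P_NEG_INV % (1 << 64) == 1

x, y = 123456789, 987654321
z = from_mont(mont_mul(to_mont(x), to_mont(y)))
assert z == (x * y) % P503
```

Python integers hide the multi-word arithmetic that a GPU kernel has to handle explicitly with fixed-width limbs, so the sketch shows only the arithmetic identity, not the performance-critical structure.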