Abstract

In this article, we present the first GPU implementations of FrodoKEM-976, NewHope-1024, and Kyber-1024. These algorithms belong to three different classes of post-quantum algorithms: Learning with Errors (LWE), Ring-LWE, and Module-LWE. We show the practical applicability of the algorithms in different scenarios using two different implementation approaches. Moreover, we achieve highly efficient realizations of computationally expensive operations such as the Number Theoretic Transform (NTT), matrix multiplication, and Keccak. Since these are the most common operations in lattice-based cryptographic algorithms, the techniques presented in this article will likely benefit other similar algorithms. Using an NVIDIA Quadro GV100 graphics card, we undertook a detailed experimental study. For NewHope and Kyber, we were able to perform approximately 504K and 473K key exchanges per second, speedups of about 53.1× and 51.05×, respectively, over the reference C implementation. Compared to the optimized AVX2 versions, we obtain speedups of 25.7× and 14.6×, respectively. Further, our implementation of FrodoKEM resulted in speedups of 50.6×, 44.2×, and 36.9× for the KeyGen, Encaps, and Decaps operations, respectively. Compared to its AVX2 counterpart, we achieved speedups of about 7.3×, 4.7×, and 4.9×, respectively. We also show that using multiple streams resulted in a further speedup of about 28–38 percent.
