Abstract

KyberKEM is one of the final-round key encapsulation mechanisms in the NIST post-quantum cryptography competition. The number theoretic transform (NTT), the computing bottleneck of KyberKEM, has been widely studied. The Discrete Galois Transformation (DGT) is a variant of the NTT that halves the transform length but, in theoretical analysis, requires more multiplication operations than the latest NTT algorithm. This paper proposes the split-radix DGT, a novel DGT variant that uses the split-radix method to reduce the computing complexity without compromising the transform length. Specifically, for a length-128 polynomial, the split-radix DGT algorithm saves at least 10% of the multiplication operations compared with the latest NTT algorithm in theoretical analysis. Furthermore, we propose a unified split-radix DGT processor with a dedicated stream permutation network for KyberKEM and implement it on a Xilinx Artix-7 FPGA. Compared with the state-of-the-art polynomial multipliers for KyberKEM with the same butterfly unit (BFU) setting on similar platforms, the processor achieves at least 49.4% faster transformation and 65.3% faster component-wise multiplication (CWM), while its LUT-NTT and LUT-CWM area-time products are at most 87% and 32% of theirs, respectively. Lastly, we design a highly efficient KyberKEM architecture using the proposed split-radix DGT processor. The implementation results on the Artix-7 FPGA show significant performance improvements over the state-of-the-art KyberKEM designs.
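
The claim that the DGT halves the transform length rests on a folding step, which the following minimal Python sketch illustrates. It is not the split-radix DGT proposed in the paper and it uses schoolbook multiplication rather than any fast transform. It only checks that a length-N polynomial modulo x^N + 1 can be folded into N/2 Gaussian integers and multiplied modulo x^(N/2) - i with the same result. The function names (`fold`, `unfold`, `folded_mul`, `negacyclic_mul`) are hypothetical, and Gaussian integers are represented as formal (real, imaginary) pairs modulo Kyber's modulus q = 3329.

```python
Q = 3329  # Kyber modulus


def negacyclic_mul(a, b, q=Q):
    """Schoolbook product of a and b modulo x^N + 1 (reference result)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # x^N = -1, so wrapped terms are subtracted
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


def fold(a):
    """Pack N coefficients into N/2 pairs (a[j], a[j + N/2]) = a[j] + i*a[j + N/2]."""
    h = len(a) // 2
    return [(a[j], a[j + h]) for j in range(h)]


def unfold(f):
    """Inverse of fold: real parts first, then imaginary parts."""
    return [re for re, _ in f] + [im for _, im in f]


def gauss_mul(x, y, q=Q):
    """Multiply two Gaussian integers given as (re, im) pairs, with i^2 = -1."""
    return ((x[0] * y[0] - x[1] * y[1]) % q, (x[0] * y[1] + x[1] * y[0]) % q)


def folded_mul(fa, fb, q=Q):
    """Schoolbook product of folded polynomials modulo x^(N/2) - i."""
    h = len(fa)
    c = [(0, 0)] * h
    for i in range(h):
        for j in range(h):
            p = gauss_mul(fa[i], fb[j], q)
            k = i + j
            if k >= h:
                # x^(N/2) = i, so wrapped terms are multiplied by i:
                # (re + i*im) * i = -im + i*re
                p = ((-p[1]) % q, p[0])
                k -= h
            c[k] = ((c[k][0] + p[0]) % q, (c[k][1] + p[1]) % q)
    return c


if __name__ == "__main__":
    a = [(7 * k + 3) % Q for k in range(16)]
    b = [(k * k + 11) % Q for k in range(16)]
    same = unfold(folded_mul(fold(a), fold(b))) == negacyclic_mul(a, b)
    print("folded product matches negacyclic product:", same)
```

The folded product has half as many coefficients, each a pair of residues, so a fast transform applied to it would run over length N/2 instead of N. The cost of that shorter transform is the quantity the abstract compares against the NTT.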
