NIST recently selected the first four quantum-resistant algorithms for post-quantum cryptography (PQC) standardization. CRYSTALS-Kyber (Kyber) is the only public-key encryption and key-establishment algorithm among them. In this paper, we propose a reconfigurable, high-speed, and area-efficient polynomial multiplication accelerator for Kyber to facilitate its practical deployment. The cornerstone of polynomial multiplication is the butterfly unit (BU), which is composed of modular addition, subtraction, and multiplication. For modular multiplication, we adopt Barrett reduction and, exploiting the form of the modulus, reduce the operand size through a novel formula transformation, which significantly decreases the computational complexity and increases the maximum clock frequency. On the hardware side, four BU modules form a binomial arithmetic core (Bi-Core), which serves as the basic reconfigurable unit. A memory access scheme tailored for parallel processing is developed using data-reuse and memory-grouping methods, and a compact control logic is devised. The complete polynomial multiplication architecture is written in Verilog and implemented on a Xilinx Artix-7 xc7a100t-3 device. Experimental results demonstrate that our implementations, in every configuration, outperform state-of-the-art designs in area efficiency, with up to a 39% improvement in area-time product (ATP). Moreover, the proposed design with four Bi-Cores achieves the fastest speed among existing designs.
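As a software illustration of the butterfly unit described above, the following is a minimal sketch of textbook Barrett reduction for the Kyber modulus q = 3329, combined with a Cooley-Tukey style butterfly. It does not reproduce the operand-size reduction or formula transformation proposed in this work. The shift width `K = 24`, the function names, and the choice of the Cooley-Tukey form are assumptions made for illustration only.

```python
Q = 3329                 # Kyber modulus
K = 24                   # assumed shift width: Q*Q < 2**24, so every product fits
M = (1 << K) // Q        # precomputed constant floor(2**K / Q)


def barrett_reduce(x):
    """Reduce 0 <= x < Q*Q modulo Q using one multiply, one shift, one correction."""
    t = (x * M) >> K     # estimate of x // Q, too small by at most 1
    r = x - t * Q        # therefore 0 <= r < 2*Q
    return r - Q if r >= Q else r


def mod_add(a, b):
    """Modular addition of two residues in [0, Q)."""
    s = a + b
    return s - Q if s >= Q else s


def mod_sub(a, b):
    """Modular subtraction of two residues in [0, Q)."""
    d = a - b
    return d + Q if d < 0 else d


def mod_mul(a, b):
    """Modular multiplication of two residues via Barrett reduction."""
    return barrett_reduce(a * b)


def ct_butterfly(a, b, w):
    """Cooley-Tukey butterfly (illustrative): returns (a + w*b, a - w*b) mod Q."""
    t = mod_mul(b, w)
    return mod_add(a, t), mod_sub(a, t)
```

Each helper maps to one of the three modular operations named in the abstract. A hardware BU would evaluate them in pipelined datapaths, whereas this sketch evaluates them sequentially.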