Abstract
Lattice-based cryptography has been growing in demand due to their quantum attack resiliency. Polynomial multiplication is a major computational bottleneck of lattice cryptosystems. To address the challenge, lattice-based cryptosystems use the Number Theoretic Transform (NTT). Although NTT reduces complexity, it is still a well-known computational bottleneck. At the same time, NTT arithmetic needs vary for different algorithms, motivating flexible solutions. Although there are prior hardware and software NTT designs, they do not simultaneously offer flexibility and efficiency. This work provides an efficient and flexible NTT solution through domain-specific architectural support on RISC-V. Rather than using instruction-set extensions with compiler modifications or loosely coupling a RISC-V core with an NTT co-processor, our proposal uses application-specific dynamic instruction scheduling, memory dependence prediction, and datapath optimizations. This allows achieving a direct translation of C code to optimized NTT executions. We demonstrate the flexibility of our approach by implementing the NTT used in several lattice-based cryptography protocols: NewHope, qTESLA, CRYSTALS-Kyber, CRYSTALS-Dilithium, and Falcon. The results on the FPGA technology show that the proposed design is respectively 6x, 40x, and 3x more efficient than the baseline solution, Berkeley Out-of-Order Machine, and a prior HW/SW co-design, while providing the needed flexibility.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have